Storage

1. Why this chapter exists

A running zcashd writes several kinds of state to disk, and a reader who confuses one for another will mis-diagnose every operational problem. The chainstate has different durability properties from the wallet, the wallet has different durability properties from the optional indexes, and mempool.dat is not the same thing as the mempool in memory. This chapter draws the map and shows the commands to inspect each store.

The split below mirrors the access pattern: protocol storage (must agree byte-for-byte with every other validating node), indexing storage (optional, local, off by default, re-buildable), wallet storage (single-user secrets), and operational storage (peers, banlist, debug log).

2. Definitions

Definition 8.1 (Datadir). The single directory that holds all node state, default ~/.zcash on Linux. Overridable with -datadir=<path> on the command line. Network-specific state lives under testnet3/ or regtest/ subdirectories.
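
As a minimal sketch of Definition 8.1, the helper below resolves the directory that holds state for a given network. The function name and the network labels ("main", "test", "regtest") are illustrative choices, not zcashd identifiers; only the default path and the subdirectory names come from the definition.

```python
from pathlib import Path
from typing import Optional

# Subdirectory per network, following Definition 8.1.
# The dictionary keys are illustrative labels, not zcashd option names.
NETWORK_SUBDIR = {"main": "", "test": "testnet3", "regtest": "regtest"}


def network_datadir(network: str = "main", datadir: Optional[str] = None) -> Path:
    """Return the directory that holds node state for `network`.

    `datadir` plays the role of -datadir=<path>; when omitted, the
    Linux default ~/.zcash is used.
    """
    base = Path(datadir) if datadir else Path.home() / ".zcash"
    # Joining with an empty segment leaves the base path unchanged.
    return base / NETWORK_SUBDIR[network]
```

For example, network_datadir("test", "/data/z") resolves to /data/z/testnet3, while the mainnet case returns the base directory itself.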

Definition 8.2 (Chainstate). The LevelDB database <datadir>/chainstate/ containing the UTXO set, per-pool nullifier sets, per-pool anchor sets, the per-block note commitment tree roots, and the best-block pointer. It is the authoritative consensus state of a synced node.

Definition 8.3 (Block index). The LevelDB database <datadir>/blocks/index/ mapping each block hash to a CBlockIndex record (height, file offset, status flags, cumulative work). Lets the node find a block on disk by hash without scanning every blk?????.dat file.

Definition 8.4 (Index databases). Optional secondary LevelDB indexes that map external queries (txid -> location, address -> transaction list, timestamp -> block) to chain data. The block index under <datadir>/blocks/index/ always exists; the others are off by default and are enabled flag by flag. Indexes are local and are not part of consensus; they can be rebuilt from chain data with -reindex.

Definition 8.5 (Wallet file). <datadir>/wallet.dat. A BerkeleyDB 6.2 database holding the user's keys, addresses, transaction history, witnesses, and HD seed. Per-user secret. Never shared across nodes.

3. The code and the on-disk layout

Datadir layout (mainnet)

~/.zcash/
  blocks/
    blk00000.dat       # raw serialised blocks, append-only
    blk00001.dat
    ...
    rev00000.dat       # undo data for block disconnects
    rev00001.dat
    ...
    index/             # LevelDB; block hash -> CBlockIndex
  chainstate/          # LevelDB; UTXO + nullifiers + anchors
  database/            # BDB log files for the wallet
  wallet.dat           # BDB; the legacy wallet
  banlist.dat          # serialised CBanEntry list
  peers.dat            # serialised addrman state
  mempool.dat          # mempool snapshot (loaded on startup)
  fee_estimates.dat    # serialised fee estimator state
  zcash.conf           # operator's config file (READ at startup)
  zcashd.pid           # PID of the running daemon
  debug.log            # main log
  .cookie              # RPC auth cookie (random per run, mode 0600)
  .lock                # filesystem lock so two daemons cannot share

Testnet adds a testnet3/ prefix; regtest uses regtest/. The shared params at ~/.zcash-params/ (Sprout-Groth16, Sapling proving and verifying keys) are SEPARATE from the datadir and shared across all networks; they are downloaded once by zcutil/fetch-params.sh.

Protocol storage

This is the storage the protocol requires every node to keep consistent. Lose it or corrupt it and the node cannot validate new blocks.

Raw block files: blocks/blk?????.dat

Append-only flat files holding serialised blocks in the order they were received, which is not necessarily chain order. Each file caps at ~128 MiB; once full, a new blkNNNNN.dat is started. The block index records the (file, offset) of each block so that random access is one seek.

src/main.cpp (FindBlockPos: block file allocation)

Companion rev?????.dat files hold undo data: the inputs that each block consumed, plus per-pool tree state needed to rewind on DisconnectBlock. They are essential for reorgs.
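
To make the blk?????.dat layout concrete, here is a hedged sketch of walking the records in one block file. It assumes the framing zcashd inherits from upstream Bitcoin Core (4 network magic bytes, then a 4-byte little-endian payload length, then the serialised block) and assumes the mainnet magic bytes are 24 e9 27 64; check both against src/main.cpp and src/chainparams.cpp before relying on them. The function name is illustrative.

```python
import struct
from typing import Iterator, Tuple

# Assumption: mainnet message-start bytes; confirm in src/chainparams.cpp.
MAINNET_MAGIC = bytes.fromhex("24e92764")


def iter_block_records(buf: bytes, magic: bytes = MAINNET_MAGIC) -> Iterator[Tuple[int, bytes]]:
    """Yield (payload_offset, payload) for each framed record in `buf`.

    Assumed framing: magic (4 bytes) | length (uint32, little-endian) | payload.
    Scanning stops at the first position that does not start with the magic,
    which covers zero padding at the end of a preallocated file.
    """
    pos = 0
    while pos + 8 <= len(buf):
        if buf[pos:pos + 4] != magic:
            break
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        start = pos + 8
        if start + size > len(buf):
            break  # truncated record; stop rather than return partial data
        yield start, buf[start:start + size]
        pos = start + size
```

Under the same assumption, the payload offset yielded here is the kind of value a block-index (file, offset) pair would carry, which is why a lookup by hash needs only one seek.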

Inspect:

ls -lh ~/.zcash/blocks/blk*.dat
# Raw bytes; not directly useful. Use RPC:
zcash-cli getblock <hash> 0 # raw hex
zcash-cli getblock <hash> 1 # parsed JSON
zcash-cli getblockcount
zcash-cli getbestblockhash

Block index: blocks/index/

LevelDB. Keys begin with b followed by the block hash; values are serialised CBlockIndex records. Read via CBlockTreeDB::ReadBlockIndex.

src/txdb.h (CBlockTreeDB, ReadBlockIndex)

The block index also stores: file information (f prefix), the "reindex needed" flag (R), the last block file number (l), and the optional indexes' enable flags.
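
The key layout for block records can be sketched as follows. The sketch assumes the hash is stored in its internal byte order, which is the reverse of the hex string printed by the RPC interface (the convention inherited from upstream Bitcoin Core). The function name is illustrative.

```python
def block_index_key(block_hash_hex: str) -> bytes:
    """Build the LevelDB key for a block record: 'b' followed by the hash.

    Assumption: the 32-byte hash is stored in internal (little-endian)
    order, i.e. byte-reversed relative to the hex shown by zcash-cli.
    """
    raw = bytes.fromhex(block_hash_hex)
    if len(raw) != 32:
        raise ValueError("block hash must be exactly 32 bytes")
    return b"b" + raw[::-1]
```

With the daemon stopped, a key built this way could be passed to a LevelDB reader opened on blocks/index/ (for example plyvel's DB.get) to fetch the serialised CBlockIndex record.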

Chainstate: chainstate/

LevelDB. The consensus-critical state. Keys, with byte prefixes:

'C' + COutPoint -> Coin (the transparent UTXO entry)
'B' -> uint256 (best block hash)

# Per-pool sets (Sprout, Sapling, Orchard):
's' + nullifier -> 1 (Sprout nullifier set)
'S' + nullifier -> 1 (Sapling nullifier set)
'O' + nullifier -> 1 (Orchard nullifier set)
'Z' + anchor -> SaplingMerkleTree (Sapling anchor -> stored tree state)
... etc

Exact prefix bytes are defined in src/txdb.cpp; look for the DB_* constants:

src/txdb.cpp (DB_* prefix constants)

The chainstate is read through CCoinsViewDB (the LevelDB backing) layered under CCoinsViewCache (the in-memory cache):

src/coins.h (CCoinsView, CCoinsViewCache, CCoinsViewDB)

Note commitment tree roots (see chapter 07) are stored alongside the UTXO entries: the chainstate is the single store responsible for the entire post-block consensus state.
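
The layering of an in-memory cache over the LevelDB-backed view can be illustrated with a toy model. The class and method names below are invented for the sketch and do not mirror the real CCoinsView interface; the point is only the read-through and flush behaviour.

```python
from typing import Dict, Optional, Set


class BackingView:
    """Stands in for the on-disk layer (the role CCoinsViewDB plays)."""

    def __init__(self) -> None:
        self.store: Dict[bytes, bytes] = {}
        self.reads = 0  # counts how often the slow layer is consulted

    def get(self, key: bytes) -> Optional[bytes]:
        self.reads += 1
        return self.store.get(key)

    def batch_write(self, changes: Dict[bytes, Optional[bytes]]) -> None:
        # A value of None means "entry was spent": delete it.
        for key, value in changes.items():
            if value is None:
                self.store.pop(key, None)
            else:
                self.store[key] = value


class CacheView:
    """Stands in for the in-memory layer (the role CCoinsViewCache plays)."""

    def __init__(self, base: BackingView) -> None:
        self.base = base
        self.cache: Dict[bytes, Optional[bytes]] = {}
        self.dirty: Set[bytes] = set()

    def get(self, key: bytes) -> Optional[bytes]:
        if key not in self.cache:  # read through on a miss only
            self.cache[key] = self.base.get(key)
        return self.cache[key]

    def put(self, key: bytes, value: bytes) -> None:
        self.cache[key] = value
        self.dirty.add(key)

    def spend(self, key: bytes) -> None:
        self.cache[key] = None
        self.dirty.add(key)

    def flush(self) -> None:
        # Push only modified entries down to the backing layer.
        self.base.batch_write({k: self.cache[k] for k in self.dirty})
        self.dirty.clear()
```

Until flush() runs, the backing store does not see new or spent entries. That window is why an interrupted flush is the moment at which an on-disk database can be left inconsistent.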

Inspect:

# High-level summary; iterates the entire UTXO set so SLOW on mainnet
zcash-cli gettxoutsetinfo

# Pool balances (Sprout / Sapling / Orchard)
zcash-cli getblockchaininfo | jq '.valuePools'

# Per-block roots
zcash-cli getblock <hash> 1 | jq '.finalsaplingroot, .finalorchardroot'

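# Rebuild the chainstate of an offline snapshot from its block files.
# This REWRITES chainstate/; it is not a read-only inspection, and it must
# never be pointed at a datadir that a running node is using.
zcashd -datadir=/path/to/snapshot -reindex-chainstate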

Direct LevelDB scans require the daemon to be stopped (the exclusive lock is held while running). The Python plyvel library plus the src/txdb.cpp prefix table lets you build a scanner; ZODL has not (yet) published a vetted tool.
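
A first step toward such a scanner is a histogram of key prefixes, sketched below. The function works on any iterable of (key, value) byte pairs, so it can be tried on in-memory data; the plyvel calls in the trailing comment show how it might be fed from a copied chainstate and are not executed here. Mapping each prefix byte to a meaning is left to the DB_* table in src/txdb.cpp.

```python
from collections import Counter
from typing import Iterable, Tuple


def prefix_histogram(entries: Iterable[Tuple[bytes, bytes]]) -> Counter:
    """Count database entries by the first byte of their key."""
    counts: Counter = Counter()
    for key, _value in entries:
        if key:  # skip empty keys defensively
            counts[key[:1]] += 1
    return counts


# Possible usage against a COPY of the chainstate, daemon stopped
# (requires the third-party plyvel package):
#
#   import plyvel
#   db = plyvel.DB("/path/to/snapshot/chainstate", create_if_missing=False)
#   print(prefix_histogram(db.iterator()))
#   db.close()
```

Running the scan against a copy rather than the live directory avoids both the exclusive lock and any risk of modifying consensus state.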

Sprout and Sapling parameters: ~/.zcash-params/

NOT in the datadir. Downloaded once by zcutil/fetch-params.sh:

~/.zcash-params/
  sprout-groth16.params (about 725 MB)
  sapling-spend.params (47 MB)
  sapling-output.params (3.4 MB)

Loaded at daemon startup by librustzcash_init_zksnark_params. A node that lacks them refuses to start.

Inspect:

sha256sum ~/.zcash-params/*
# Expected hashes are published; see zcutil/fetch-params.sh for the
# canonical list.
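
The same check can be scripted. The sketch below hashes a file in chunks (the parameter files are too large to read comfortably in one call) and compares the result with an expected digest that the caller supplies from zcutil/fetch-params.sh; no expected hashes are hard-coded here. Function names are illustrative.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_params(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest (case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()
```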

Note-commitment-tree state

The Sapling and Orchard frontiers (the rightmost-path representations described in chapter 07) are stored INSIDE the chainstate under their own key prefixes. They are not a separate file; they piggyback on chainstate/ because they evolve in lockstep with each block.

Indexing storage

Optional, off by default, never consensus. Indexes accelerate queries that map external identifiers to chain data. None of these are required for validation; rebuild with -reindex or -reindex-chainstate if lost.

Transaction index (-txindex)

When on, maintains a LevelDB index under blocks/index/ mapping txid -> (block file, offset, length). Without -txindex, the node can only retrieve transactions that are in the mempool or in the wallet.

# Enable in zcash.conf
txindex=1
# Then restart with -reindex so the index is built (slow first time)

# Required for:
zcash-cli getrawtransaction <txid> # by hash, any chain location

Programmatic write path:

src/txdb.cpp (CBlockTreeDB read/write methods)

Address index (-addressindex)

Maps transparent script hash -> list of (height, txid, vout, value). Required for the explorer-style RPCs getaddresstxids, getaddressbalance, getaddressutxos, getaddressdeltas. Off by default; large on mainnet (tens of GiB).

The on-disk types are declared at src/addressindex.h:

src/addressindex.h (CAddressIndexKey, CAddressIndexIteratorKey)

Spent index (-spentindex)

Maps an outpoint to the transaction that spent it. Required for getspentinfo. Same backing store as addressindex. Useful for forensic tools that need to walk "where did this coin go".

Timestamp index (-timestampindex)

Maps timestamps to block hashes for fast "block at time T" lookups. Used by some block explorers.

Insight-style indexes

Several of the above were imported from Bitcoin's Bitcore / Insight forks. Their on-disk format is documented in src/addressindex.h, src/spentindex.h, and src/timestampindex.h.

Reindex flow

zcashd -reindex # rebuild block index AND chainstate
zcashd -reindex-chainstate # rebuild only chainstate (faster)

-reindex reads every blk?????.dat from scratch; on mainnet expect several hours.

Wallet storage

The wallet is a separate persistence layer, owned by the user, on the same machine but conceptually independent.

wallet.dat (BerkeleyDB 6.2)

A typed key-value store. Each record is a tagged tuple:

"hdseed" -> the master HD seed (encrypted if wallet is encrypted)
"key" -> a transparent (private, public) keypair
"sapext" -> a Sapling extended spending key
"orchard_*" -> Orchard keys (varies)
"tx" -> a CWalletTx record
"name" -> a label for an address
"acentry" -> an account entry (legacy)
"flags" -> wallet feature flags (encrypted, HD, etc.)
"version" -> on-disk wallet version
"mkey" -> a wrapped key (when wallet is passphrase-encrypted)
"ckey" -> a crypted private key
"witnesscache_v3" -> serialised Sapling/Orchard witness cache
... and many more
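
As a rough illustration of how a tag is recovered from a raw record key, the sketch below assumes the serialisation inherited from upstream Bitcoin Core: the key begins with a one-byte length followed by the ASCII tag, and any tag-specific data (a public key, a txid, and so on) follows. Tags long enough to need a multi-byte length prefix are rejected rather than guessed at. The function name is illustrative.

```python
def record_tag(raw_key: bytes) -> str:
    """Extract the type tag from a raw wallet record key.

    Assumed layout: <length byte> <ASCII tag> <tag-specific data...>.
    """
    if not raw_key:
        raise ValueError("empty key")
    length = raw_key[0]
    if length >= 0xFD:
        raise ValueError("multi-byte length prefix not handled in this sketch")
    tag = raw_key[1:1 + length]
    if len(tag) != length:
        raise ValueError("truncated key")
    return tag.decode("ascii")
```

Grouping the keys of a wallet dump by record_tag gives a quick census of what a wallet file contains without interpreting any secret values.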

The full key tag list is in src/wallet/walletdb.cpp:

src/wallet/walletdb.cpp (CWalletDB record tags)

The serialised wallet types live in src/wallet/walletdb.h:

src/wallet/walletdb.h (class CWalletDB)

BDB on-disk realities

BerkeleyDB 6.2 maintains a <datadir>/database/ directory of log files alongside wallet.dat. The two must be preserved together: copying only wallet.dat while the daemon is running captures a partial state. Either stop the daemon first or use the backupwallet RPC, which in zcashd takes a bare destination filename and writes it into the directory set by -exportdir:

zcash-cli backupwallet walletbackup1

backupwallet performs a transactional checkpoint and writes a self-contained copy.

Encryption

A passphrase-encrypted wallet protects the secret records (spending keys, HD seed). Public material (addresses, viewing keys, transaction metadata) is plaintext on disk. This is by design: the node must be able to scan for incoming notes without the passphrase.

The encryption flow is in src/wallet/crypter.{h,cpp}: AES-256-CBC over a passphrase-derived key (iterated SHA-512).
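
The "iterated SHA-512" step can be sketched as below. This follows the derivation used by upstream Bitcoin Core's crypter (hash passphrase plus salt once, re-hash the digest for the remaining rounds, then split the result into a 32-byte key and a 16-byte IV); that zcashd matches it exactly is an assumption to confirm in src/wallet/crypter.cpp. The AES step itself is omitted, and the function name is illustrative.

```python
import hashlib
from typing import Tuple


def derive_key_iv(passphrase: bytes, salt: bytes, rounds: int) -> Tuple[bytes, bytes]:
    """Derive an AES-256 key and CBC IV by iterating SHA-512.

    Round 1 hashes passphrase || salt; each later round hashes the
    previous 64-byte digest. Key = bytes 0..31, IV = bytes 32..47.
    """
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = hashlib.sha512(passphrase + salt).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha512(digest).digest()
    return digest[:32], digest[32:48]
```

The derived key does not encrypt wallet records directly in this scheme; it unwraps the master key stored under the "mkey" record, which in turn protects the "ckey" entries.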

Witness data

For each unspent Sapling and Orchard note, the wallet stores an authentication path from the note's commitment to a recent anchor. The serialised form lives under the witnesscache_v3 key (and earlier v2, v1 for legacy data); the in-memory representation is in src/rust/src/wallet.rs.

Witnesses are derivable from chain data; a wallet that loses them can rebuild by rescanning (-rescan). They are stored explicitly because rescanning the chain is slow.
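
What "an authentication path to an anchor" means can be shown with a toy Merkle tree. The sketch uses SHA-256 as a stand-in hash purely for illustration; the real Sapling and Orchard trees use different hash functions and a fixed depth, so nothing here is byte-compatible with zcashd. Function names are illustrative.

```python
import hashlib
from typing import List


def combine(left: bytes, right: bytes) -> bytes:
    """Stand-in node hash (SHA-256); NOT the hash used by Sapling or Orchard."""
    return hashlib.sha256(left + right).digest()


def root_from_path(leaf: bytes, position: int, siblings: List[bytes]) -> bytes:
    """Recompute the tree root from a leaf, its index, and its sibling path.

    `siblings` is ordered from the leaf level upward. At each level the low
    bit of `position` says whether the current node is a right child.
    """
    node = leaf
    for sibling in siblings:
        if position & 1:
            node = combine(sibling, node)
        else:
            node = combine(node, sibling)
        position >>= 1
    return node


def witness_is_valid(leaf: bytes, position: int, siblings: List[bytes], anchor: bytes) -> bool:
    """A witness is valid when the recomputed root equals the anchor."""
    return root_from_path(leaf, position, siblings) == anchor
```

Every new commitment appended to the tree can change the siblings on a note's path, which is why the wallet keeps witnesses up to date block by block instead of recomputing them on demand.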

Inspect

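# dumpwallet and z_exportwallet take a bare filename and write it into the
# directory set by -exportdir
zcash-cli dumpwallet walletdump # human-readable secrets export
zcash-cli z_exportwallet zwalletdump # shielded keys included
zcashd-wallet-tool ... # interactive; confirms backup of the emergency recovery phrase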

For raw BDB inspection (daemon stopped):

db_dump -p wallet.dat | less
# Note: BDB on-disk format is opaque without zcashd-side knowledge.
# Prefer dumpwallet / z_exportwallet wherever possible.

Operational storage

peers.dat

Serialised CAddrMan state: the new/tried tables of known peer addresses. Written periodically and on shutdown by CAddrDB::Write:

src/addrdb.h (class CAddrDB)

Lose it and the node will rebootstrap from DNS seeds on next start.

banlist.dat

Serialised list of banned peers (with expiry times). Written by CAddrDB::DumpBanlist.

mempool.dat

Snapshot of the mempool at shutdown. Loaded on startup so that a restarted node does not lose its mempool. Format: serialised list of (timestamp, fee_delta, CTransaction). Saved every 15 minutes and on clean shutdown.

zcash-cli getmempoolinfo
zcash-cli savemempool # force a snapshot

fee_estimates.dat

Serialised state of the fee estimator (an EMA of confirmation times at various fee rates). Used by estimatefee. Rebuilt over time; not critical.

debug.log

The main log. Default location <datadir>/debug.log. Rotated by restart only (no built-in log rotation). Verbosity controlled by -debug=<category> flags; the Rust subsystems route through tracing and are bridged into the C++ logger in src/rust/src/tracing_ffi.rs.

.cookie

Per-run RPC auth cookie. Mode 0600. Used by zcash-cli to authenticate when no -rpcuser/-rpcpassword is set.
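
A client other than zcash-cli can use the cookie the same way. The sketch below assumes the file holds a single line of the form user:password (upstream Bitcoin Core writes the user as __cookie__) and turns it into an HTTP Basic Authorization header value. Function names are illustrative.

```python
import base64
from pathlib import Path
from typing import Tuple


def read_cookie(datadir: str) -> Tuple[str, str]:
    """Read <datadir>/.cookie and split it into (user, password).

    Assumed format: one line, "user:password".
    """
    text = Path(datadir, ".cookie").read_text().strip()
    user, sep, password = text.partition(":")
    if not sep or not password:
        raise ValueError("cookie file is not in user:password form")
    return user, password


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of an HTTP 'Authorization: Basic ...' header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"
```

Because the cookie is regenerated on every start, a long-lived client should re-read the file after the daemon restarts instead of caching the credentials.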

.lock

Empty file held by flock(). Prevents two zcashd processes from sharing one datadir.

A summary table

| Path | Layer | Purpose | Format | Survives across versions? |
|---|---|---|---|---|
| blocks/blk*.dat | Protocol | Raw serialised blocks | Custom binary | Yes |
| blocks/rev*.dat | Protocol | Undo data for reorgs | Custom binary | Yes |
| blocks/index/ | Protocol | Block-hash -> location index | LevelDB | Re-buildable with -reindex |
| chainstate/ | Protocol | UTXOs, nullifiers, anchors, tree roots | LevelDB | Re-buildable with -reindex-chainstate |
| ~/.zcash-params/ | Protocol | Sapling/Sprout proving + verifying keys | Custom binary (Groth16 keys) | Yes; tied to MPC ceremony output |
| blocks/index/ (with -txindex) | Indexing | txid -> location | LevelDB | Re-buildable |
| same backing (-addressindex) | Indexing | script -> txes | LevelDB | Re-buildable |
| same backing (-spentindex) | Indexing | outpoint -> spender | LevelDB | Re-buildable |
| same backing (-timestampindex) | Indexing | time -> blocks | LevelDB | Re-buildable |
| wallet.dat + database/ | Wallet | Keys, addresses, txes, witnesses | BerkeleyDB 6.2 | Yes; user-owned |
| peers.dat | Operational | Addrman state | Serialised CAddrMan | Re-buildable from DNS seeds |
| banlist.dat | Operational | Banned peers | Serialised CBanEntry | Re-buildable |
| mempool.dat | Operational | Mempool snapshot at shutdown | Serialised tx list | Re-buildable |
| fee_estimates.dat | Operational | Fee estimator EMA | Serialised | Re-buildable |
| .cookie | Operational | RPC auth | Random 32 bytes | No; per-run |
| debug.log | Operational | Main log | Plain text | No |
| .lock | Operational | Process lock | Empty | No |

4. Failure modes

  • Corrupting chainstate/ mid-write. A crash during a flush can leave the database inconsistent. zcashd attempts to detect this at startup and asks the operator to restart with -reindex-chainstate. Caught by: the on-startup checks (the block-index load in CBlockTreeDB::LoadBlockIndexGuts, followed by the CVerifyDB::VerifyDB pass governed by -checkblocks and -checklevel). No automated test in this workspace; caught operationally.
  • Backing up wallet.dat while the daemon is running. Partial BDB state. Caught by: BDB refusing to open or returning bogus data on restore. Always use backupwallet.
  • Mixing chainstate from different network upgrades. A chainstate built before NU5 cannot validate blocks after NU5 without a reindex (the Orchard anchor set does not exist in the old chainstate). Caught by: assertion failure on the first v5 transaction.
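  • Enabling -txindex after the fact without -reindex. The setting is recorded in the block index database, and zcashd refuses to start when the configured value differs from the stored one, reporting that the database must be rebuilt with -reindex. Caught by: the startup check on the stored txindex flag.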
  • Disk-full during block write. Block file rotation can fail; the node halts with AbortNode("Disk space is low!"). Caught by: an explicit disk-space check before each block write.
  • Losing peers.dat. Node falls back to DNS seeds. Slow reconnect. Not catastrophic.
  • Losing the wallet but keeping the seed. Recoverable for transparent keys (re-derive via BIP-32). Recoverable for shielded keys (re-derive via ZIP-32, then -rescan to reconstruct witnesses). The HD seed is the single load-bearing secret; protect it.

5. Spec pointers

The protocol specification is silent on storage formats: storage is an implementation concern. Cross-implementation interop happens at the wire level. Useful pointers:

6. Exercises

  1. Estimate datadir size. On a fully synced mainnet node, measure each subdirectory:

    du -sh ~/.zcash/blocks ~/.zcash/chainstate ~/.zcash/wallet.dat

    Roughly what fraction is each? Answer (typical 2026 mainnet): blocks/ dominates (raw blocks), chainstate/ is much smaller, wallet.dat depends on the user.

  2. Trace a block on disk. Pick a recent block hash via zcash-cli getbestblockhash. Use zcash-cli getblock <hash> 1 to find the height; cross-check the file-and-offset by reading the block-index entry via the debug RPC if your build supports it, otherwise via a custom LevelDB read against blocks/index/.

  3. UTXO set summary. Run zcash-cli gettxoutsetinfo on a synced node. Note the transactions, txouts, and total_amount fields. The last of these must equal the issued subsidy minus any value locked in shielded pools (which is reported separately in valuePools).

  4. Index toggle. Stop the daemon. Add txindex=1 to zcash.conf. Restart with -reindex and time the rebuild. Confirm getrawtransaction <historical_txid> works afterward.

  5. Wallet backup round-trip. Run backupwallet with a destination filename (zcashd writes the copy into the directory set by -exportdir). Stop the daemon. Move the original wallet.dat aside. Copy the backup into place as wallet.dat. Start. Verify getbalance returns the expected value and z_listaddresses returns the same set.

  6. Modification exercise. Add a getstoragelayout RPC that returns a JSON object enumerating each of the on-disk artefacts in section 3 with their current size. Pattern is in src/rpc/misc.cpp; the size lookup uses boost::filesystem::file_size.
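
Before writing the RPC in C++, the shape of its result can be prototyped outside the daemon. The sketch below is one possible model of the output, a mapping from artefact name to size in bytes; the artefact list is taken from the layout in section 3, and the function names and result shape are assumptions made for this exercise rather than anything zcashd defines.

```python
import os
from typing import Dict

# Artefacts from the section 3 layout, relative to the datadir.
ARTEFACTS = [
    "blocks", "chainstate", "database", "wallet.dat", "peers.dat",
    "banlist.dat", "mempool.dat", "fee_estimates.dat", "debug.log",
]


def path_size(path: str) -> int:
    """Size in bytes of a file, or the summed file sizes under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def storage_layout(datadir: str) -> Dict[str, int]:
    """Map each artefact that exists under `datadir` to its size in bytes."""
    layout: Dict[str, int] = {}
    for name in ARTEFACTS:
        full = os.path.join(datadir, name)
        if os.path.exists(full):
            layout[name] = path_size(full)
    return layout
```

Serialising the result with json.dumps gives a reference output to compare against the C++ implementation once it exists.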

7. Further reading

  • Bitcoin Core's doc/files.md for a more complete description of the upstream datadir conventions zcashd inherits.
  • The LevelDB implementation notes.
  • BerkeleyDB 6.2 reference for the wallet format (the legacy format will eventually be replaced; see chapter 09).