16 - Pedersen hash deep dive
1. Why this chapter exists
Pedersen hash is the single most-evaluated cryptographic operation
in Sapling: every note commitment, every Merkle-tree hash, every
value commitment runs through it. A contributor who cannot derive
the windowed encoding , name the per-segment
generator construction, and trace the in-circuit constraint count
will not be able to review a circuit-level PR, debug a hash
mismatch, or understand why
zcash_proofs/src/circuit/sprout/commitment.rs
uses SHA-256 (Sprout legacy) while Sapling uses Pedersen. The
chapter fills in the full construction; chapter 04 only sketched
it. The implementation lives in the external sapling-crypto
crate (sapling_crypto::pedersen_hash and
sapling_crypto::circuit::pedersen_hash).
2. Definitions
Definition 2.1 (The Jubjub setting). Let denote the prime-order subgroup of the twisted Edwards curve Jubjub over the BLS12-381 scalar field . Its order is with and the cofactor is . Throughout the chapter the Pedersen hash takes a bit string and a domain-separation tag and outputs a point of . We write for the -th generator associated with domain .
Definition 2.2 (Windowed encoding ). Given a 3-bit chunk , define
Explicitly:
This is the Booth-encoded representation of a 3-bit window with a sign bit. In-circuit, doubling the partial sum costs the same as adding it; the encoding minimises the number of base doublings needed.
Definition 2.3 (Segment scalar). Fix a segment length of chunks ( input bits). For chunks , the segment scalar is the integer
The spacing , rather than , is forced because each window contributes a value in which needs four bits to encode unambiguously. The segment scalar lies in the interval
which is comfortably contained in the residue system modulo .
Definition 2.4 (Per-segment generator). For domain and segment index , the generator is defined by the try-and-increment hash-to-curve construction
where on each attempt the candidate
is interpreted as a candidate -coordinate. If a valid exists with on the curve, the resulting point is multiplied by the cofactor to land in ; otherwise is incremented. The construction is deterministic and yields a point of uniform up to a negligible bias.
Definition 2.5 (Pedersen hash). Let be a domain tag and let be a bit string padded by zeros to a multiple of bits. Write and let denote the -th 3-bit chunk of the -th segment. The Pedersen hash is
Theorem 2.6 (Collision resistance reduces to DLP on ). Let be fixed and let the generators be sampled by Definition 2.4. Any algorithm that finds with and can be converted, with the same advantage and a polynomial overhead, into an algorithm that computes a non-trivial discrete-log relation among in .
Proof sketch. A collision yields integer differences $\Delta_j = \langle!\langle c_{j,\bullet}^{(1)} \rangle!\rangle
- \langle!\langle c_{j,\bullet}^{(2)} \rangle!\rangle|\Delta_j| < \ell_J\sum_j [\Delta_j] G_j^{(D)} = \mathcal{O}|\Delta_j| < \ell_J\ell_J\mathbb{G}_J$ recovers the relation; conversely the relation gives a non-trivial linear dependence among generators whose pairwise discrete logs are secret. See Hopwood et al., Sapling design notes, §5.4.1.7 for the full reduction.
3. The code
3.1 The Sapling Pedersen-hash flow lives in sapling-crypto
The Sapling implementation is in the external sapling-crypto
crate. librustzcash consumes it via re-exports:
sapling_crypto::pedersen_hash::pedersen_hash: out-of-circuit, used by the wallet for trial decryption and witness construction.sapling_crypto::circuit::pedersen_hash::pedersen_hash: the Bellman gadget used inside the Spend / Output Groth16 circuit.sapling_crypto::constants: precomputed generator sets per domain.sapling_crypto::primitives::NoteCommitment: high-level note commitment using Pedersen hash plus commitment randomness.sapling_crypto::primitives::MerkleHash: the Merkle-tree hash.
3.2 Comparison: Sprout commits with SHA-256, not Pedersen
For context, here is the Sprout note_comm gadget that lives in
this repo. It uses SHA-256 with a 4-bit type tag prefix
( = 1011_0000), not Pedersen hash:
loading...
The Sapling improvement is replacing this with the algebraic Pedersen hash, which reduces the in-circuit constraint count by about an order of magnitude for the same input length.
3.3 In-circuit constraint count
Inside the Sapling Spend/Output Groth16 circuit, computing a Pedersen hash of bits costs approximately
In sapling_crypto::circuit::pedersen_hash, the per-window
breakdown:
- 1 constraint to range-check the boolean bits.
- 2-3 constraints to compute the signed-window scalar from the bits.
- 5-7 constraints to perform conditional point addition with the per-segment generator (strongly-unified twisted-Edwards formulas).
Net: constraints per bit. For the Sapling note commitment (input bits before randomness), that is constraints. The Merkle-tree hash (input bits) is constraints; over a depth-32 tree, constraints.
This dominates the Spend circuit's budget ( of total). Optimising Pedersen-hash circuits has been a frequent area of contribution.
3.4 The " mixing" Pedersen hash
For the Sapling nullifier, is derived from the commitment and the position via a modified Pedersen hash called :
where is a fixed generator. is itself a Pedersen-hash output (a point), and we add (not concatenate) the position contribution. This keeps a single Jubjub point without re-hashing. The position is bounded by , so is a small-integer scalar mul; in- circuit much cheaper than another full Pedersen hash.
3.5 The NoteCommit formula
The Sapling note commitment is
with a fixed Jubjub generator and the -coordinate as a 255-bit little-endian string. Inputs total bits.
Why -coordinates instead of full point encodings? The -coordinate uniquely identifies a Jubjub point up to sign, and the negation of a valid Sapling diversifier is also a valid encoding. The sign disambiguation does not affect commitment security (we only need the point modulo sign to be unique).
3.6 The MerkleHash formula
Layer-aware Merkle hash:
with the 6-bit layer index, the 255-bit children. The result is the -coordinate of the Pedersen-hash output, an element. Including prevents an attacker who sees from claiming it is also at a different layer; the generators differ per layer.
3.7 Constant-time considerations
Pedersen hash outside the circuit (wallet trial-decryption
pipeline) takes secret input (, ) and must be
constant-time. sapling_crypto::pedersen_hash::pedersen_hash
uses constant-time scalar mul via the jubjub crate; bit
decomposition and chunking work on byte-aligned data; looping is
over public-size inputs. The Merkle-tree hash inside scanning is
not secret-dependent beyond the position of the wallet's note,
which the wallet already knows; no constant-time concern.
3.8 Sapling Pedersen hash vs Sinsemilla
| Aspect | Sapling Pedersen | Orchard Sinsemilla |
|---|---|---|
| Curve | Jubjub | Pallas |
| Window | 3 bits | 10 bits |
| Per-chunk in-circuit cost | constraints | constraints |
| Uses lookups? | No (Groth16 R1CS) | Yes (Halo 2 lookups) |
| Pre-images per segment | entries per fragment |
Sinsemilla is an evolved Pedersen hash that exploits Halo 2's lookup tables, which Groth16 lacks.
3.9 Output type and coordinate extraction
sapling_crypto::pedersen_hash::pedersen_hash returns a
jubjub::SubgroupPoint. Callers that need a scalar (the Merkle
hash) extract the -coordinate via to_affine().get_u(). The
conversion from (the Jubjub base field, equal to
the BLS12-381 scalar field) to a scalar is
direct.
3.10 Test vectors
Sapling has extensive Pedersen-hash test vectors in
sapling-crypto:
sapling-crypto/src/test_vectors/pedersen_hash_vectors.rs: bytes in, bytes out.sapling-crypto/src/test_vectors/note_encryption_vectors.rs: end-to-end with all derived values.
When debugging a Pedersen-hash variation, the test vectors locate corrupt generators or window-encoding bugs in a single mismatched hash.
4. Failure modes
- Reversed window bit ordering. assumes
in little-endian. Reversing endianness
silently produces wrong hashes whose only symptom is a
mismatch against the test vectors.
Caught by: vector-based tests in the external
sapling-cryptocrate (sapling_crypto::pedersen_hashagainstsapling-crypto/src/test_vectors/pedersen_hash_vectors.rs). No automated test in this workspace; the Sapling Pedersen implementation is upstream oflibrustzcash. - Reused personalisation tag. Every distinct use of
Pedersen hash has its own personalisation. Adding a new use
and reusing an existing tag (e.g. recycling the
NoteCommitmenttag for aMerkleTreeuse) lets an adversary produce two pre-images of the same hash from different domains.No automated test in this workspace. Caught by audit only.
- BLAKE2b vs BLAKE2s for generator hashing. The generator
construction uses BLAKE2s with personalisation
"Zcash_PH". Substituting BLAKE2b changes every generator; the bug only manifests when the test vectors mismatch.Caught upstream by the generator-vector tests in
sapling-crypto::constants. No automated test in this workspace. - Cofactor not multiplied in
HashToCurve. Omitting the final step puts the generator outside ; chapter 13 covers the small-subgroup consequences.No automated test in this workspace. Caught by audit only.
- Layer index width. 6 bits is enough for depth-32 trees;
a future protocol with depth-64 trees would need a wider
encoding.
Caught by:
zcash_primitives::merkle_treeround-trip tests inzcash_primitives/src/merkle_tree.rs, which fix tree depth at at the type level. - Missing range constraint on . If the
prover can insert a window value outside
, the segment-scalar bound
argument breaks. Range constraints are enforced via boolean
checks in the circuit; removing them silently weakens
collision resistance.
Caught upstream by
sapling-crypto::circuit::pedersen_hashconstraint-system tests usingbellman'sTestConstraintSystem. No automated test in this workspace. - Non-constant-time scalar mul in
pedersen_hash. Leaks or to a timing observer. Use thejubjub::SubgroupPoint::mulAPI, not a hand-rolled loop.No automated test in this workspace. Caught by audit only.
5. Spec pointers
- Zcash Protocol Specification, section 5.4.1.7 (Pedersen Hash Function): the formal definition of , segments, and the per-domain generator construction.
- Zcash Protocol Specification, section 5.4.1.8 (Mixing Pedersen Hash): the -mixing variant cited in section 3.4.
- Zcash Protocol Specification, section 5.4.9 (Group Hash into Jubjub): the try-and-increment hash-to-curve used for the generators.
- Hopwood et al., Sapling design notes: the rationale for the 3-bit window and the segment size of 63.
- Pedersen, Non-Interactive and Information-Theoretic Secure Verifiable Secret Sharing (CRYPTO 1991): the original Pedersen commitment whose collision-resistance reduction underlies the construction.
6. Exercises
- Verify the encoding by hand. Compute for and . Confirm against the table in section 2. Then compute the segment scalar for , , all other chunks . (Hint: only the first two terms in the sum are non-trivial; the rest contribute .)
- Trace the Sprout vs Sapling commitment. Compare
zcash_proofs/src/circuit/sprout/commitment.rs(SHA-256 with a 4-bit type tag prefix) against the SaplingNoteCommitformula in section 3.5. State which input length each accepts and why the Sapling version saves roughly an order of magnitude in constraints. - Compute the Spend-circuit Pedersen budget. Using the constraints-per-bit estimate from section 3.3, work out the total Pedersen-hash constraint count for a Spend circuit at tree depth 32. Compare to the empirical figure. Identify where the missing constraints come from (range checks on , the spend-auth key derivation, the value commitment).
- Modify and test (
sapling-crypto). In a checkout ofsapling-crypto, write a unit test that hashes the empty bit string under domain and asserts the output matches the Merkle-tree hash of two zero children at layer 0. The test should pass when the layer-index encoding is correct and fail if you flip the layer-index width from 6 to 7 bits.
Answers in the code
- Sprout commitment gadget (SHA-256 + 4-bit type tag):
zcash_proofs/src/circuit/sprout/commitment.rs#L1-L39. - The Pedersen hash itself lives in the external
sapling- cryptocrate; entry points arepedersen_hash::pedersen_hash(out-of-circuit) andcircuit::pedersen_hash::pedersen_hash(gadget). - Test vectors are in
sapling-crypto/src/test_vectors/.
7. Further reading
- Chapter 04: the sketch of Pedersen hash this chapter expands.
- Chapter 17: the Sinsemilla construction that supersedes Pedersen hash on Pallas.
- Chapter 13: the cofactor-clearing step at the end of the generator construction.
- Chapter 24: the clause-by-clause walk of the Spend / Output circuit with exact constraint counts.