16 - Pedersen hash deep dive
1. Why this chapter exists
Pedersen hash is the single most-evaluated cryptographic operation
in Sapling: every note commitment, every Merkle-tree hash, every
value commitment runs through it. A contributor who cannot derive
the windowed encoding , name the per-segment
generator construction, and trace the in-circuit constraint count
will not be able to review a circuit-level PR, debug a hash
mismatch, or understand why
zcash_proofs/src/circuit/sprout/commitment.rs
uses SHA-256 (Sprout legacy) while Sapling uses Pedersen. The
chapter fills in the full construction; chapter 04 only sketched
it. The implementation lives in the external sapling-crypto
crate (sapling_crypto::pedersen_hash and
sapling_crypto::circuit::pedersen_hash).
2. Definitions
The setting
We work in the prime-order subgroup of order . A Pedersen hash takes a bit string and a domain-separation tag and outputs a curve point. We write for the -th generator associated with domain .
Definition (, the windowed encoding)
Group into 3-bit chunks Each chunk is mapped to a signed integer
Explicitly:
This is the booth-encoded representation of a 3-bit window with a sign bit. In-circuit, doubling the partial sum costs the same as adding it; the encoding minimises the number of base doublings needed.
Definition (segment)
Combine 63 consecutive chunks into one segment of bits. The segment scalar is
The spacing (not ) is chosen because each window contributes a value in which needs 4 bits to encode unambiguously. A segment scalar lies in approximately
comfortably less than .
Definition (per-segment generator)
Each segment has its own generator derived deterministically from and :
The construction is try-and-increment:
- Compute .
- Interpret as a candidate -coordinate. Recover from the curve equation; if no valid exists, increment the counter and retry.
- Multiply by the cofactor to land in .
Deterministic, so anyone can recompute the generator set; uniform in modulo a negligible bias. The discrete log of any relative to any other is unknown modulo DLP on Jubjub.
Definition (Pedersen hash)
Let be the number of segments after padding to a multiple of 189 bits. Then
Invariant (collision resistance reduces to DLP)
Finding with is at least as hard as solving DLP on Jubjub. A collision gives for a non-zero integer vector , which is a non-trivial linear relation among generators whose pairwise discrete logs are unknown.
3. The code
3.1 The Sapling Pedersen-hash flow lives in sapling-crypto
The Sapling implementation is in the external sapling-crypto
crate. librustzcash consumes it via re-exports:
sapling_crypto::pedersen_hash::pedersen_hash: out-of-circuit, used by the wallet for trial decryption and witness construction.sapling_crypto::circuit::pedersen_hash::pedersen_hash: the Bellman gadget used inside the Spend / Output Groth16 circuit.sapling_crypto::constants: precomputed generator sets per domain.sapling_crypto::primitives::NoteCommitment: high-level note commitment using Pedersen hash plus commitment randomness.sapling_crypto::primitives::MerkleHash: the Merkle-tree hash.
3.2 Comparison: Sprout commits with SHA-256, not Pedersen
For context, here is the Sprout note_comm gadget that lives in
this repo. It uses SHA-256 with a 4-bit type tag prefix
( = 1011_0000), not Pedersen hash:
loading...
The Sapling improvement is replacing this with the algebraic Pedersen hash, which reduces the in-circuit constraint count by about an order of magnitude for the same input length.
3.3 In-circuit constraint count
Inside the Sapling Spend/Output Groth16 circuit, computing a Pedersen hash of bits costs approximately
In sapling_crypto::circuit::pedersen_hash, the per-window
breakdown:
- 1 constraint to range-check the boolean bits.
- 2-3 constraints to compute the signed-window scalar from the bits.
- 5-7 constraints to perform conditional point addition with the per-segment generator (strongly-unified twisted-Edwards formulas).
Net: constraints per bit. For the Sapling note commitment (input bits before randomness), that is constraints. The Merkle-tree hash (input bits) is constraints; over a depth-32 tree, constraints.
This dominates the Spend circuit's budget ( of total). Optimising Pedersen-hash circuits has been a frequent area of contribution.
3.4 The " mixing" Pedersen hash
For the Sapling nullifier, is derived from the commitment and the position via a modified Pedersen hash called :
where is a fixed generator. is itself a Pedersen-hash output (a point), and we add (not concatenate) the position contribution. This keeps a single Jubjub point without re-hashing. The position is bounded by , so is a small-integer scalar mul; in- circuit much cheaper than another full Pedersen hash.
3.5 The NoteCommit formula
The Sapling note commitment is
with a fixed Jubjub generator and the -coordinate as a 255-bit little-endian string. Inputs total bits.
Why -coordinates instead of full point encodings? The -coordinate uniquely identifies a Jubjub point up to sign, and the negation of a valid Sapling diversifier is also a valid encoding. The sign disambiguation does not affect commitment security (we only need the point modulo sign to be unique).
3.6 The MerkleHash formula
Layer-aware Merkle hash:
with the 6-bit layer index, the 255-bit children. The result is the -coordinate of the Pedersen-hash output, an element. Including prevents an attacker who sees from claiming it is also at a different layer; the generators differ per layer.
3.7 Constant-time considerations
Pedersen hash outside the circuit (wallet trial-decryption
pipeline) takes secret input (, ) and must be
constant-time. sapling_crypto::pedersen_hash::pedersen_hash
uses constant-time scalar mul via the jubjub crate; bit
decomposition and chunking work on byte-aligned data; looping is
over public-size inputs. The Merkle-tree hash inside scanning is
not secret-dependent beyond the position of the wallet's note,
which the wallet already knows; no constant-time concern.
3.8 Sapling Pedersen hash vs Sinsemilla
| Aspect | Sapling Pedersen | Orchard Sinsemilla |
|---|---|---|
| Curve | Jubjub | Pallas |
| Window | 3 bits | 10 bits |
| Per-chunk in-circuit cost | constraints | constraints |
| Uses lookups? | No (Groth16 R1CS) | Yes (Halo 2 lookups) |
| Pre-images per segment | entries per fragment |
Sinsemilla is an evolved Pedersen hash that exploits Halo 2's lookup tables, which Groth16 lacks.
3.9 Output type and coordinate extraction
sapling_crypto::pedersen_hash::pedersen_hash returns a
jubjub::SubgroupPoint. Callers that need a scalar (the Merkle
hash) extract the -coordinate via to_affine().get_u(). The
conversion from (the Jubjub base field, equal to
the BLS12-381 scalar field) to a scalar is
direct.
3.10 Test vectors
Sapling has extensive Pedersen-hash test vectors in
sapling-crypto:
sapling-crypto/src/test_vectors/pedersen_hash_vectors.rs: bytes in, bytes out.sapling-crypto/src/test_vectors/note_encryption_vectors.rs: end-to-end with all derived values.
When debugging a Pedersen-hash variation, the test vectors locate corrupt generators or window-encoding bugs in a single mismatched hash.
4. Failure modes
- Reversed window bit ordering. assumes in little-endian. Reversing endianness silently produces wrong hashes whose only symptom is a mismatch against the test vectors.
- Reused personalisation tag. Every distinct use of
Pedersen hash has its own personalisation. Adding a new use
and reusing an existing tag (e.g. recycling the
NoteCommitmenttag for aMerkleTreeuse) lets an adversary produce two pre-images of the same hash from different domains. - BLAKE2b vs BLAKE2s for generator hashing. The generator
construction uses BLAKE2s with personalisation
"Zcash_PH". Substituting BLAKE2b changes every generator; the bug only manifests when the test vectors mismatch. - Cofactor not multiplied in
HashToCurve. Omitting the final step puts the generator outside ; chapter 13 covers the small-subgroup consequences. - Layer index width. 6 bits is enough for depth-32 trees; a future protocol with depth-64 trees would need a wider encoding.
- Missing range constraint on
enc_3. If the prover can insert a window value outside , the segment-scalar bound argument breaks. Range constraints are enforced via boolean checks in the circuit; removing them silently weakens collision resistance. - Non-constant-time scalar mul in
pedersen_hash. Leaks or to a timing observer. Use thejubjub::SubgroupPoint::mulAPI, not a hand-rolled loop.
5. Spec pointers
- Zcash Protocol Specification, section 5.4.1.7 (Pedersen Hash Function): the formal definition of , segments, and the per-domain generator construction.
- Zcash Protocol Specification, section 5.4.1.8 (Mixing Pedersen Hash): the -mixing variant cited in section 3.4.
- Zcash Protocol Specification, section 5.4.9 (Group Hash into Jubjub): the try-and-increment hash-to-curve used for the generators.
- Hopwood et al., Sapling design notes: the rationale for the 3-bit window and the segment size of 63.
- Pedersen, Non-Interactive and Information-Theoretic Secure Verifiable Secret Sharing (CRYPTO 1991): the original Pedersen commitment whose collision-resistance reduction underlies the construction.
6. Exercises
- Verify the encoding by hand. Compute for and . Confirm against the table in section 2. Then compute the segment scalar for , , all other chunks . (Hint: only the first two terms in the sum are non-trivial; the rest contribute .)
- Trace the Sprout vs Sapling commitment. Compare
zcash_proofs/src/circuit/sprout/commitment.rs(SHA-256 with a 4-bit type tag prefix) against the SaplingNoteCommitformula in section 3.5. State which input length each accepts and why the Sapling version saves roughly an order of magnitude in constraints. - Compute the Spend-circuit Pedersen budget. Using the constraints-per-bit estimate from section 3.3, work out the total Pedersen-hash constraint count for a Spend circuit at tree depth 32. Compare to the empirical figure. Identify where the missing constraints come from (range checks on , the spend-auth key derivation, the value commitment).
- Modify and test (
sapling-crypto). In a checkout ofsapling-crypto, write a unit test that hashes the empty bit string under domain and asserts the output matches the Merkle-tree hash of two zero children at layer 0. The test should pass when the layer-index encoding is correct and fail if you flip the layer-index width from 6 to 7 bits.
Answers in the code
- Sprout commitment gadget (SHA-256 + 4-bit type tag):
zcash_proofs/src/circuit/sprout/commitment.rs#L1-L39. - The Pedersen hash itself lives in the external
sapling- cryptocrate; entry points arepedersen_hash::pedersen_hash(out-of-circuit) andcircuit::pedersen_hash::pedersen_hash(gadget). - Test vectors are in
sapling-crypto/src/test_vectors/.
7. Further reading
- Chapter 04: the sketch of Pedersen hash this chapter expands.
- Chapter 17: the Sinsemilla construction that supersedes Pedersen hash on Pallas.
- Chapter 13: the cofactor-clearing step at the end of the generator construction.
- Chapter 24: the clause-by-clause walk of the Spend / Output circuit with exact constraint counts.