Skip to main content

16 - Pedersen hash deep dive

1. Why this chapter exists

Pedersen hash is the single most-evaluated cryptographic operation in Sapling: every note commitment, every Merkle-tree hash, every value commitment runs through it. A contributor who cannot derive the windowed encoding enc3\text{enc}_3, name the per-segment generator construction, and trace the in-circuit constraint count will not be able to review a circuit-level PR, debug a hash mismatch, or understand why zcash_proofs/src/circuit/sprout/commitment.rs uses SHA-256 (Sprout legacy) while Sapling uses Pedersen. The chapter fills in the full construction; chapter 04 only sketched it. The implementation lives in the external sapling-crypto crate (sapling_crypto::pedersen_hash and sapling_crypto::circuit::pedersen_hash).

2. Definitions

Definition 2.1 (The Jubjub setting). Let GJEJubjub(Fr)\mathbb{G}_J \subset E_{\text{Jubjub}}(\mathbb{F}_r) denote the prime-order subgroup of the twisted Edwards curve Jubjub over the BLS12-381 scalar field Fr\mathbb{F}_r. Its order is J\ell_J with J2252\ell_J \approx 2^{252} and the cofactor is hJ=8h_J = 8. Throughout the chapter the Pedersen hash takes a bit string m{0,1}km \in \{0,1\}^k and a domain-separation tag D{0,1}64D \in \{0,1\}^{64} and outputs a point of GJ\mathbb{G}_J. We write Gj(D)GJG_j^{(D)} \in \mathbb{G}_J for the jj-th generator associated with domain DD.

Definition 2.2 (Windowed encoding enc3\mathsf{enc}_3). Given a 3-bit chunk c=(b0,b1,b2){0,1}3c = (b_0, b_1, b_2) \in \{0,1\}^3, define

enc3:{0,1}3{4,3,2,1,1,2,3,4},enc3(c)  =  (1+b0+2b1)(12b2).\mathsf{enc}_3 : \{0,1\}^3 \rightarrow \{-4, -3, -2, -1, 1, 2, 3, 4\}, \qquad \mathsf{enc}_3(c) \;=\; (1 + b_0 + 2 b_1)(1 - 2 b_2).

Explicitly:

c=(b0,b1,b2)c = (b_0, b_1, b_2)enc3(c)\mathsf{enc}_3(c)
(0,0,0)(0, 0, 0)11
(1,0,0)(1, 0, 0)22
(0,1,0)(0, 1, 0)33
(1,1,0)(1, 1, 0)44
(0,0,1)(0, 0, 1)1-1
(1,0,1)(1, 0, 1)2-2
(0,1,1)(0, 1, 1)3-3
(1,1,1)(1, 1, 1)4-4

This is the Booth-encoded representation of a 3-bit window with a sign bit. In-circuit, doubling the partial sum costs the same as adding it; the encoding minimises the number of base doublings needed.

Definition 2.3 (Segment scalar). Fix a segment length of 6363 chunks (3×63=1893 \times 63 = 189 input bits). For chunks c0,,c62{0,1}3c_0, \ldots, c_{62} \in \{0,1\}^3, the segment scalar is the integer

 ⁣c0,,c62 ⁣  =  i=062enc3(ci)24iZ.\langle\!\langle c_0, \ldots, c_{62} \rangle\!\rangle \;=\; \sum_{i=0}^{62} \mathsf{enc}_3(c_i) \cdot 2^{4i} \in \mathbb{Z}.

The spacing 24i2^{4i}, rather than 23i2^{3i}, is forced because each window contributes a value in {4,,1,1,,4}\{-4,\ldots,-1,1,\ldots,4\} which needs four bits to encode unambiguously. The segment scalar lies in the interval

(4152252,  4152252)Z,\Bigl(-\tfrac{4}{15} \cdot 2^{252},\; \tfrac{4}{15} \cdot 2^{252}\Bigr) \subset \mathbb{Z},

which is comfortably contained in the residue system modulo J\ell_J.

Definition 2.4 (Per-segment generator). For domain D{0,1}64D \in \{0,1\}^{64} and segment index jNj \in \mathbb{N}, the generator Gj(D)GJG_j^{(D)} \in \mathbb{G}_J is defined by the try-and-increment hash-to-curve construction

Gj(D)  =  HashToCurveJubjub ⁣(DjLE),G_j^{(D)} \;=\; \mathsf{HashToCurve}_{\text{Jubjub}}\!\bigl(D \,\mathbin{\|}\, j_{\text{LE}}\bigr),

where on each attempt tNt \in \mathbb{N} the candidate

ht  =  BLAKE2s-256 ⁣(“Zcash_PH”,  DjLEtLE)h_t \;=\; \mathsf{BLAKE2s\text{-}256}\!\bigl( \text{``Zcash\_PH''},\; D \,\mathbin{\|}\, j_{\text{LE}} \,\mathbin{\|}\, t_{\text{LE}}\bigr)

is interpreted as a candidate vv-coordinate. If a valid uu exists with (u,ht)(u, h_t) on the curve, the resulting point is multiplied by the cofactor hJ=8h_J = 8 to land in GJ\mathbb{G}_J; otherwise tt is incremented. The construction is deterministic and yields a point of GJ\mathbb{G}_J uniform up to a negligible bias.

Definition 2.5 (Pedersen hash). Let DD be a domain tag and let m{0,1}m \in \{0,1\}^{\ast} be a bit string padded by zeros to a multiple of 189189 bits. Write S(m)=m/189S(m) = \lceil |m| / 189 \rceil and let cj,ic_{j,i} denote the ii-th 3-bit chunk of the jj-th segment. The Pedersen hash is

PHD:{0,1}GJ,PHD(m)  =  j=0S(m)1[ ⁣cj,0,,cj,62 ⁣]Gj(D).\mathsf{PH}_D : \{0,1\}^{\ast} \rightarrow \mathbb{G}_J, \qquad \mathsf{PH}_D(m) \;=\; \sum_{j=0}^{S(m)-1} \bigl[\,\langle\!\langle c_{j,0}, \ldots, c_{j, 62} \rangle\!\rangle\,\bigr] G_j^{(D)}.

Theorem 2.6 (Collision resistance reduces to DLP on GJ\mathbb{G}_J). Let D{0,1}64D \in \{0,1\}^{64} be fixed and let the generators {Gj(D)}j0\{G_j^{(D)}\}_{j \geq 0} be sampled by Definition 2.4. Any algorithm A\mathcal{A} that finds m1,m2{0,1}m_1, m_2 \in \{0,1\}^{\ast} with m1m2m_1 \neq m_2 and PHD(m1)=PHD(m2)\mathsf{PH}_D(m_1) = \mathsf{PH}_D(m_2) can be converted, with the same advantage and a polynomial overhead, into an algorithm that computes a non-trivial discrete-log relation among {Gj(D)}\{G_j^{(D)}\} in GJ\mathbb{G}_J.

Proof sketch. A collision yields integer differences $\Delta_j = \langle!\langle c_{j,\bullet}^{(1)} \rangle!\rangle

  • \langle!\langle c_{j,\bullet}^{(2)} \rangle!\ranglewithwith|\Delta_j| < \ell_J,notallzero,satisfying, not all zero, satisfying \sum_j [\Delta_j] G_j^{(D)} = \mathcal{O}.Thechunkencodingboundonsegmentscalars(Definition2.3)ensures. The chunk-encoding bound on segment scalars (Definition 2.3) ensures |\Delta_j| < \ell_J,sotherelationisnontrivialmodulo, so the relation is non-trivial modulo \ell_J.Solvingdiscretelogon. Solving discrete log on \mathbb{G}_J$ recovers the relation; conversely the relation gives a non-trivial linear dependence among generators whose pairwise discrete logs are secret. See Hopwood et al., Sapling design notes, §5.4.1.7 for the full reduction.

3. The code

3.1 The Sapling Pedersen-hash flow lives in sapling-crypto

The Sapling implementation is in the external sapling-crypto crate. librustzcash consumes it via re-exports:

  • sapling_crypto::pedersen_hash::pedersen_hash: out-of-circuit, used by the wallet for trial decryption and witness construction.
  • sapling_crypto::circuit::pedersen_hash::pedersen_hash: the Bellman gadget used inside the Spend / Output Groth16 circuit.
  • sapling_crypto::constants: precomputed generator sets per domain.
  • sapling_crypto::primitives::NoteCommitment: high-level note commitment using Pedersen hash plus commitment randomness.
  • sapling_crypto::primitives::MerkleHash: the Merkle-tree hash.

3.2 Comparison: Sprout commits with SHA-256, not Pedersen

For context, here is the Sprout note_comm gadget that lives in this repo. It uses SHA-256 with a 4-bit type tag prefix (0xb00\text{xb0} = 1011_0000), not Pedersen hash:

zcash_proofs/src/circuit/sprout/commitment.rs
loading...

The Sapling improvement is replacing this with the algebraic Pedersen hash, which reduces the in-circuit constraint count by about an order of magnitude for the same input length.

3.3 In-circuit constraint count

Inside the Sapling Spend/Output Groth16 circuit, computing a Pedersen hash of kk bits costs approximately

constraints    k3(constraints per window).\text{constraints} \;\approx\; \tfrac{k}{3} \cdot (\text{constraints per window}).

In sapling_crypto::circuit::pedersen_hash, the per-window breakdown:

  • 1 constraint to range-check the boolean bits.
  • 2-3 constraints to compute the signed-window scalar from the bits.
  • 5-7 constraints to perform conditional point addition with the per-segment generator (strongly-unified twisted-Edwards formulas).

Net: 6\sim 6 constraints per bit. For the Sapling note commitment (input 8+88+256+256=608\approx 8 + 88 + 256 + 256 = 608 bits before randomness), that is 3600\sim 3600 constraints. The Merkle-tree hash (input 512\approx 512 bits) is 3000\sim 3000 constraints; over a depth-32 tree, 100,000\sim 100{,}000 constraints.

This dominates the Spend circuit's budget (100,000\sim 100{,}000 of 150,000\sim 150{,}000 total). Optimising Pedersen-hash circuits has been a frequent area of contribution.

3.4 The "ρ\rho mixing" Pedersen hash

For the Sapling nullifier, ρ\rho is derived from the commitment and the position via a modified Pedersen hash called MixingPedersenHash\mathsf{MixingPedersenHash}:

ρ  =  MixingPedersenHash(cm,pos)  =  cm  +  [pos]Gρ,\rho \;=\; \mathsf{MixingPedersenHash}(\mathsf{cm}, \mathsf{pos}) \;=\; \mathsf{cm} \;+\; [\mathsf{pos}] G_{\rho},

where GρG_{\rho} is a fixed generator. cm\mathsf{cm} is itself a Pedersen-hash output (a point), and we add (not concatenate) the position contribution. This keeps ρ\rho a single Jubjub point without re-hashing. The position is bounded by 2322^{32}, so [pos]Gρ[\mathsf{pos}] G_{\rho} is a small-integer scalar mul; in- circuit much cheaper than another full Pedersen hash.

3.5 The NoteCommit formula

The Sapling note commitment is

NoteCommit(rcm,v,gd,pkd)=PHDnc ⁣(vLE,64u(gd)LE,255u(pkd)LE,255)  +  [rcm]Rnc,\mathsf{NoteCommit}(\mathsf{rcm}, v, g_d, \mathsf{pk}_d) = \mathsf{PH}_{D_{\text{nc}}}\!\bigl(\, v_{\text{LE},64} \,\|\, u(g_d)_{\text{LE},255} \,\|\, u(\mathsf{pk}_d)_{\text{LE},255} \,\bigr) \;+\; [\mathsf{rcm}] R_{\text{nc}},

with RncR_{\text{nc}} a fixed Jubjub generator and u()u(\cdot) the uu-coordinate as a 255-bit little-endian string. Inputs total 64+255+255=57464 + 255 + 255 = 574 bits.

Why uu-coordinates instead of full point encodings? The uu-coordinate uniquely identifies a Jubjub point up to sign, and the negation of a valid Sapling diversifier is also a valid encoding. The sign disambiguation does not affect commitment security (we only need the point modulo sign to be unique).

3.6 The MerkleHash formula

Layer-aware Merkle hash:

MerkleHash(x,y)=ExtractJubjub ⁣(PHDMH ⁣(LE,6xLE,255yLE,255)),\mathsf{MerkleHash}_\ell(x, y) = \mathsf{ExtractJubjub}\!\Bigl( \mathsf{PH}_{D_{\text{MH}}}\!\bigl(\, \ell_{\text{LE},6} \,\|\, x_{\text{LE},255} \,\|\, y_{\text{LE},255} \,\bigr) \Bigr),

with \ell the 6-bit layer index, x,yx, y the 255-bit children. The result is the uu-coordinate of the Pedersen-hash output, an Fr\mathbb{F}_r element. Including \ell prevents an attacker who sees MerkleHash(a,b)\mathsf{MerkleHash}_\ell(a, b) from claiming it is also MerkleHash(a,b)\mathsf{MerkleHash}_{\ell'}(a, b) at a different layer; the generators differ per layer.

3.7 Constant-time considerations

Pedersen hash outside the circuit (wallet trial-decryption pipeline) takes secret input (rcm\mathsf{rcm}, vv) and must be constant-time. sapling_crypto::pedersen_hash::pedersen_hash uses constant-time scalar mul via the jubjub crate; bit decomposition and chunking work on byte-aligned data; looping is over public-size inputs. The Merkle-tree hash inside scanning is not secret-dependent beyond the position of the wallet's note, which the wallet already knows; no constant-time concern.

3.8 Sapling Pedersen hash vs Sinsemilla

AspectSapling PedersenOrchard Sinsemilla
CurveJubjubPallas
Window3 bits10 bits
Per-chunk in-circuit cost6\sim 6 constraints3\sim 3 constraints
Uses lookups?No (Groth16 R1CS)Yes (Halo 2 lookups)
Pre-images per segment2189\sim 2^{189}210\sim 2^{10} entries per fragment

Sinsemilla is an evolved Pedersen hash that exploits Halo 2's lookup tables, which Groth16 lacks.

3.9 Output type and coordinate extraction

sapling_crypto::pedersen_hash::pedersen_hash returns a jubjub::SubgroupPoint. Callers that need a scalar (the Merkle hash) extract the uu-coordinate via to_affine().get_u(). The conversion from Fr\mathbb{F}_r (the Jubjub base field, equal to the BLS12-381 scalar field) to a Fr\mathbb{F}_r scalar is direct.

3.10 Test vectors

Sapling has extensive Pedersen-hash test vectors in sapling-crypto:

  • sapling-crypto/src/test_vectors/pedersen_hash_vectors.rs: bytes in, bytes out.
  • sapling-crypto/src/test_vectors/note_encryption_vectors.rs: end-to-end with all derived values.

When debugging a Pedersen-hash variation, the test vectors locate corrupt generators or window-encoding bugs in a single mismatched hash.

4. Failure modes

  • Reversed window bit ordering. enc3\mathsf{enc}_3 assumes (b0,b1,b2)(b_0, b_1, b_2) in little-endian. Reversing endianness silently produces wrong hashes whose only symptom is a mismatch against the test vectors.

    Caught by: vector-based tests in the external sapling-crypto crate (sapling_crypto::pedersen_hash against sapling-crypto/src/test_vectors/pedersen_hash_vectors.rs). No automated test in this workspace; the Sapling Pedersen implementation is upstream of librustzcash.

  • Reused personalisation tag. Every distinct use of Pedersen hash has its own personalisation. Adding a new use and reusing an existing tag (e.g. recycling the NoteCommitment tag for a MerkleTree use) lets an adversary produce two pre-images of the same hash from different domains.

    No automated test in this workspace. Caught by audit only.

  • BLAKE2b vs BLAKE2s for generator hashing. The generator construction uses BLAKE2s with personalisation "Zcash_PH". Substituting BLAKE2b changes every generator; the bug only manifests when the test vectors mismatch.

    Caught upstream by the generator-vector tests in sapling-crypto::constants. No automated test in this workspace.

  • Cofactor not multiplied in HashToCurve. Omitting the final [hJ][h_J] \cdot step puts the generator outside GJ\mathbb{G}_J; chapter 13 covers the small-subgroup consequences.

    No automated test in this workspace. Caught by audit only.

  • Layer index width. 6 bits is enough for depth-32 trees; a future protocol with depth-64 trees would need a wider encoding.

    Caught by: zcash_primitives::merkle_tree round-trip tests in zcash_primitives/src/merkle_tree.rs, which fix tree depth at 3232 at the type level.

  • Missing range constraint on enc3\mathsf{enc}_3. If the prover can insert a window value outside {4,,1,1,,4}\{-4, \ldots, -1, 1, \ldots, 4\}, the segment-scalar bound argument breaks. Range constraints are enforced via boolean checks in the circuit; removing them silently weakens collision resistance.

    Caught upstream by sapling-crypto::circuit::pedersen_hash constraint-system tests using bellman's TestConstraintSystem. No automated test in this workspace.

  • Non-constant-time scalar mul in pedersen_hash. Leaks rcm\mathsf{rcm} or vv to a timing observer. Use the jubjub::SubgroupPoint::mul API, not a hand-rolled loop.

    No automated test in this workspace. Caught by audit only.

5. Spec pointers

6. Exercises

  1. Verify the encoding by hand. Compute enc3(c)\text{enc}_3(c) for c=(1,1,1)c = (1, 1, 1) and c=(0,1,0)c = (0, 1, 0). Confirm against the table in section 2. Then compute the segment scalar for c0=(1,0,0)c_0 = (1, 0, 0), c1=(0,1,0)c_1 = (0, 1, 0), all other chunks (0,0,0)(0,0,0). (Hint: only the first two terms in the sum are non-trivial; the rest contribute enc3(0,0,0)=1\text{enc}_3(0,0,0) = 1.)
  2. Trace the Sprout vs Sapling commitment. Compare zcash_proofs/src/circuit/sprout/commitment.rs (SHA-256 with a 4-bit type tag prefix) against the Sapling NoteCommit formula in section 3.5. State which input length each accepts and why the Sapling version saves roughly an order of magnitude in constraints.
  3. Compute the Spend-circuit Pedersen budget. Using the 6\sim 6 constraints-per-bit estimate from section 3.3, work out the total Pedersen-hash constraint count for a Spend circuit at tree depth 32. Compare to the empirical 100,000\sim 100{,}000 figure. Identify where the missing constraints come from (range checks on vv, the spend-auth key derivation, the value commitment).
  4. Modify and test (sapling-crypto). In a checkout of sapling-crypto, write a unit test that hashes the empty bit string under domain DMHD_{\text{MH}} and asserts the output matches the Merkle-tree hash of two zero children at layer 0. The test should pass when the layer-index encoding is correct and fail if you flip the layer-index width from 6 to 7 bits.

Answers in the code

  • Sprout commitment gadget (SHA-256 + 4-bit type tag): zcash_proofs/src/circuit/sprout/commitment.rs#L1-L39.
  • The Pedersen hash itself lives in the external sapling- crypto crate; entry points are pedersen_hash::pedersen_hash (out-of-circuit) and circuit::pedersen_hash::pedersen_hash (gadget).
  • Test vectors are in sapling-crypto/src/test_vectors/.

7. Further reading

  • Chapter 04: the sketch of Pedersen hash this chapter expands.
  • Chapter 17: the Sinsemilla construction that supersedes Pedersen hash on Pallas.
  • Chapter 13: the cofactor-clearing step at the end of the generator construction.
  • Chapter 24: the clause-by-clause walk of the Spend / Output circuit with exact constraint counts.