Group Hash and Pedersen Hash

1. Why this chapter exists

Sapling's note commitment, value commitment, and Merkle tree all bottom out in two primitives: a hash-to-curve operation called the group hash (group_hash in code) and a Pedersen hash built on top of seven fixed generators. Both are fully defined inside this crate; both have been the subject of side-channel and collision considerations that you should know before you touch pedersen_hash.rs, constants.rs, or circuit/pedersen_hash.rs.

By the end you should know exactly which file holds which generator, what each personalisation tag is for, and why the in-circuit and out-of-circuit Pedersen hashes are deliberately separate.

2. Definitions

Definition 4.1 (Sapling group hash). Given a tag $t \in \{0,1\}^*$ and a personalisation $p \in \{0,1\}^{64}$ ,

\mathsf{GroupHash}_p(t) = \mathsf{clear\_cofactor}\big( \mathsf{abst}_{\mathbb{J}}( \mathsf{BLAKE2s}_{32, p}( \mathsf{GH\_FIRST\_BLOCK} \mathbin{\|} t ) ) \big),

returning a Option<SubgroupPoint>. The function returns None when the BLAKE2s output does not decode to a valid Jubjub point, or when clearing the cofactor produces the identity. The 64-byte GH_FIRST_BLOCK (source) is a deliberately arbitrary ASCII hex string that domain-separates group-hash inputs from anything else BLAKE2s might be used for.

Definition 4.2 (Pedersen hash). Given the seven fixed generators $G_0, \ldots, G_6 \in \mathbb{J}^{(r)}$ defined in PEDERSEN_HASH_GENERATORS, a personalisation prefix $p \in \{0,1\}^6$ (six bits from Personalization::get_bits), and a bit-string $m \in \{0,1\}^*$ , the Sapling Pedersen hash is

\mathsf{PedersenHash}(p, m) = \sum_{i=0}^{\lceil L / S \rceil - 1} \big[\mathrm{enc}_i(p, m)\big] G_i,

where $L = |p \mathbin{\|} m|$ , $S = 3 \cdot 63 = 189$ bits per generator (windows of 3 bits, up to PEDERSEN_HASH_CHUNKS_PER_GENERATOR = 63 chunks per segment), and $\mathsf{enc}_i$ is the signed-window encoding of segment $i$ (source). The output lives in $\mathbb{J}^{(r)}$ .

Definition 4.3 (Personalization). A two-variant enum that selects the 6-bit prefix prepended to the message: Personalization::NoteCommitment yields 0b111111, and Personalization::MerkleTree(d) yields the 6-bit binary encoding of the depth $d \in \{0, \ldots, 62\}$ . The prefix domain-separates note commitments from Merkle inner nodes and between Merkle levels.

Lemma 4.4 (collision resistance, informal). Finding two distinct inputs $m \neq m'$ with $\mathsf{PedersenHash}(p, m) = \mathsf{PedersenHash}(p, m')$ implies finding a non-trivial discrete log relation among $G_0, \ldots, G_6$ . Since the generators are derived from independent group-hash outputs, no such relation is known. Caveat: this is contingent on no linear relation existing between the seven points; see Failure modes.

Invariant 4.5 (no algebraic relation among generators). The seven Pedersen-hash generators are jointly non-degenerate: none is the identity, no two are equal, no one is the inverse of another, and no one is the sum of any two others. These are checked at build/test time, not at runtime:

src/constants.rs (generator consistency tests)
loading...

View on GitHub

3. The code

3.1 The group hash

The hash-to-curve primitive is one function. Read it in full once:

src/group_hash.rs
loading...

View on GitHub

Three things to note:

The personalisation must be exactly 8 bytes (line 16).
The 64-byte GH_FIRST_BLOCK is constant across all calls (line 25); only the tag varies.
The cofactor is cleared on line 33. This is the step that turns a potentially small-order point into a guaranteed prime-order element (or the identity, which is then rejected on line 35).

3.2 The Pedersen hash, out of circuit

The out-of-circuit Pedersen hash uses a pre-computed exponentiation table (PEDERSEN_HASH_EXP_TABLE) that converts the segment scalars into points via window-based table lookups instead of generic double-and-add. The table is built lazily by generate_pedersen_hash_exp_table at first access via lazy_static!.

src/pedersen_hash.rs
loading...

View on GitHub

Two structural facts to internalize:

The 3-bit window is signed: the negation step on lines 65-67 handles the sign bit. This matters in the circuit, where the negation has to be expressed as a constraint, not a conditional branch.
The outer loop iterates over segments of 189 bits each (PEDERSEN_HASH_CHUNKS_PER_GENERATOR = 63 chunks of 3 bits). An input longer than $7 \times 189 = 1323$ bits would exhaust the generator pool and panic on generators.next().expect(...). The note commitment input is 576 bits (64 value + 256 g_d + 256 pk_d), well under the cap.

3.3 The Pedersen hash, in circuit

The in-circuit gadget is a separate file:

src/circuit/pedersen_hash.rs (header)
loading...

View on GitHub

The two implementations have to agree byte-for-byte on every test vector. The agreement is tested in circuit::test_input_circuit_with_bls12_381_external_test_vectors and in the standalone pedersen_hash::test::test_pedersen_hash_points, which loads the 716-line test vector table:

src/pedersen_hash/test_vectors.rs (head)
loading...

View on GitHub

4. The fixed generators

The seven Pedersen bases and the six application-specific generators are all fixed constants in constants.rs. Each comes paired with a unit test that asserts it equals $\mathsf{GroupHash}$ of a small seed:

src/constants.rs (generator tests)
loading...

View on GitHub

The tests serve two purposes. First, they prevent a refactor from silently swapping a generator. Second, they document the seed used to derive the generator; the seed is needed to re-derive the generator in another implementation (e.g. a C++ port).

5. Failure modes

A new generator that is a linear combination of existing ones. This would break Pedersen-hash collision resistance. The invariant is enforced by the test pedersen_hash_generators_consistency. Caught by: that test.
The in-circuit and out-of-circuit Pedersen hashes drift. If a PR modifies one but not the other, every Spend or Output proof in existence becomes unverifiable against the new code. Caught by: the test vectors at pedersen_hash::test::test_pedersen_hash_points (out-of-circuit) and the constraint hash assertion in circuit::test_input_circuit_with_bls12_381 (assert_eq!(cs.hash(), "d37c738e83df5d9b0bb6495ac96abf21bcb2697477e2c15c2c7916ff7a3b6a89")). Both will fail loudly.
Personalisation reuse. Two different uses of BLAKE2s with the same personalisation collide trivially across the two contexts. Caught by: code review. No automated test in this workspace; the personalisations live in constants.rs and the contract is "do not collide them" rather than "we test that they do not collide."
An input exceeds 1323 bits. The Pedersen-hash main loop panics with "we don't have enough generators". The only inputs the protocol passes are bounded by construction, so this is a guarded invariant rather than a runtime check. Caught by: nothing automatic; a contributor who extends the Spend circuit must check the bound by hand.

6. Spec pointers

Zcash Protocol Specification, §5.4.1.7 (Pedersen hash function) is the authority. Cross-reference the encoding equation in §5.4.1.7 against the loop body in src/pedersen_hash.rs.
Zcash Protocol Specification, §5.4.9.6 (Sapling Group Hash) defines GroupHash_URS^Sapling, which is what group_hash::group_hash implements.

7. Exercises

Re-derive NOTE_COMMITMENT_RANDOMNESS_GENERATOR. Run the existing test note_commitment_randomness_generator. Read its find_group_hash helper. Identify the tag and the personalisation. Implement the same loop in a scratch program and confirm you recover the same point.
Compute one Pedersen hash by hand. Take the input bits [true, false, false, true] (4 bits), the personalisation Personalization::NoteCommitment, and step through the loop in src/pedersen_hash.rs. Show that exactly one segment is used (the first generator $G_0$ ), and that the scalar produced is a signed 3-bit number. Verify your computation against the function output.
Add a regression test for a refactor. Modify pedersen_hash::pedersen_hash to use a for loop instead of the loop { ... break; } pattern, preserving behaviour. Add a new unit test that hashes a fixed input and asserts a fixed expected output (use one of the existing test vectors). Run cargo test --all-features -- pedersen_hash. The existing test_pedersen_hash_points test still has to pass; your new test serves as a tighter regression anchor.

Answers in the code. For exercise 1, the seed for NOTE_COMMITMENT_RANDOMNESS_GENERATOR is the byte b"r" plus PEDERSEN_HASH_GENERATORS_PERSONALIZATION (source). For exercise 2, the first 4 bits encode prefix 0b111111 (NoteCommitment) followed by your 4 bits, but the first segment only consumes the first 3 bits of the combined string and ignores the rest; chase through the code carefully.

1. Why this chapter exists​

2. Definitions​

3. The code​

3.1 The group hash​

3.2 The Pedersen hash, out of circuit​

3.3 The Pedersen hash, in circuit​

4. The fixed generators​

5. Failure modes​

6. Spec pointers​

7. Exercises​

1. Why this chapter exists

2. Definitions

3. The code

3.1 The group hash

3.2 The Pedersen hash, out of circuit

3.3 The Pedersen hash, in circuit

4. The fixed generators

5. Failure modes

6. Spec pointers

7. Exercises