Group Hash and Pedersen Hash
1. Why this chapter exists
Sapling's note commitment, value commitment, and Merkle tree all bottom out in
two primitives: a hash-to-curve operation called the group hash
(group_hash in code) and a Pedersen hash built on top of
seven fixed generators. Both are fully defined inside this crate; both have been
the subject of side-channel and collision considerations that you should know
before you touch pedersen_hash.rs, constants.rs, or
circuit/pedersen_hash.rs.
By the end you should know exactly which file holds which generator, what each personalisation tag is for, and why the in-circuit and out-of-circuit Pedersen hashes are deliberately separate.
2. Definitions
Definition 4.1 (Sapling group hash). Given a tag and a personalisation ,
returning a Option<SubgroupPoint>. The function returns None when the
BLAKE2s output does not decode to a valid Jubjub point, or when clearing the
cofactor produces the identity. The 64-byte GH_FIRST_BLOCK
(source)
is a deliberately arbitrary ASCII hex string that domain-separates group-hash
inputs from anything else BLAKE2s might be used for.
Definition 4.2 (Pedersen hash). Given the seven fixed generators
defined in
PEDERSEN_HASH_GENERATORS,
a personalisation prefix (six bits from
Personalization::get_bits), and a bit-string
, the Sapling Pedersen hash is
where , bits per generator
(windows of 3 bits, up to
PEDERSEN_HASH_CHUNKS_PER_GENERATOR
= 63 chunks per segment), and is the signed-window encoding
of segment
(source).
The output lives in .
Definition 4.3 (Personalization). A two-variant enum that selects the 6-bit
prefix prepended to the message:
Personalization::NoteCommitment
yields 0b111111, and Personalization::MerkleTree(d) yields the 6-bit binary
encoding of the depth . The prefix domain-separates
note commitments from Merkle inner nodes and between Merkle levels.
Lemma 4.4 (collision resistance, informal). Finding two distinct inputs with implies finding a non-trivial discrete log relation among . Since the generators are derived from independent group-hash outputs, no such relation is known. Caveat: this is contingent on no linear relation existing between the seven points; see Failure modes.
Invariant 4.5 (no algebraic relation among generators). The seven Pedersen-hash generators are jointly non-degenerate: none is the identity, no two are equal, no one is the inverse of another, and no one is the sum of any two others. These are checked at build/test time, not at runtime:
loading...
3. The code
3.1 The group hash
The hash-to-curve primitive is one function. Read it in full once:
loading...
Three things to note:
- The personalisation must be exactly 8 bytes (line 16).
- The 64-byte
GH_FIRST_BLOCKis constant across all calls (line 25); only the tag varies. - The cofactor is cleared on line 33. This is the step that turns a potentially small-order point into a guaranteed prime-order element (or the identity, which is then rejected on line 35).
3.2 The Pedersen hash, out of circuit
The out-of-circuit Pedersen hash uses a pre-computed exponentiation table
(PEDERSEN_HASH_EXP_TABLE) that converts the segment scalars into points via
window-based table lookups instead of generic double-and-add. The table is built
lazily by
generate_pedersen_hash_exp_table
at first access via lazy_static!.
loading...
Two structural facts to internalize:
- The 3-bit window is signed: the negation step on lines 65-67 handles the sign bit. This matters in the circuit, where the negation has to be expressed as a constraint, not a conditional branch.
- The outer loop iterates over segments of 189 bits each
(
PEDERSEN_HASH_CHUNKS_PER_GENERATOR= 63chunks of 3 bits). An input longer than bits would exhaust the generator pool and panic ongenerators.next().expect(...). The note commitment input is 576 bits (64 value + 256 g_d + 256 pk_d), well under the cap.
3.3 The Pedersen hash, in circuit
The in-circuit gadget is a separate file:
loading...
The two implementations have to agree byte-for-byte on every test vector. The
agreement is tested in
circuit::test_input_circuit_with_bls12_381_external_test_vectors
and in the standalone
pedersen_hash::test::test_pedersen_hash_points,
which loads the 716-line test vector table:
loading...
4. The fixed generators
The seven Pedersen bases and the six application-specific generators are all
fixed constants in
constants.rs.
Each comes paired with a unit test that asserts it equals
of a small seed:
loading...
The tests serve two purposes. First, they prevent a refactor from silently swapping a generator. Second, they document the seed used to derive the generator; the seed is needed to re-derive the generator in another implementation (e.g. a C++ port).
5. Failure modes
- A new generator that is a linear combination of existing ones. This would
break Pedersen-hash collision resistance. The invariant is enforced by the
test
pedersen_hash_generators_consistency. Caught by: that test. - The in-circuit and out-of-circuit Pedersen hashes drift. If a PR modifies
one but not the other, every Spend or Output proof in existence becomes
unverifiable against the new code. Caught by: the test vectors at
pedersen_hash::test::test_pedersen_hash_points(out-of-circuit) and the constraint hash assertion incircuit::test_input_circuit_with_bls12_381(assert_eq!(cs.hash(), "d37c738e83df5d9b0bb6495ac96abf21bcb2697477e2c15c2c7916ff7a3b6a89")). Both will fail loudly. - Personalisation reuse. Two different uses of BLAKE2s with the same
personalisation collide trivially across the two contexts. Caught by: code
review. No automated test in this workspace; the personalisations live in
constants.rsand the contract is "do not collide them" rather than "we test that they do not collide." - An input exceeds 1323 bits. The Pedersen-hash main loop panics with
"we don't have enough generators". The only inputs the protocol passes are bounded by construction, so this is a guarded invariant rather than a runtime check. Caught by: nothing automatic; a contributor who extends the Spend circuit must check the bound by hand.
6. Spec pointers
- Zcash Protocol Specification, §5.4.1.7 (Pedersen hash function)
is the authority. Cross-reference the encoding equation in §5.4.1.7 against
the loop body in
src/pedersen_hash.rs. - Zcash Protocol Specification, §5.4.9.6 (Sapling Group Hash)
defines
GroupHash_URS^Sapling, which is whatgroup_hash::group_hashimplements.
7. Exercises
- Re-derive
NOTE_COMMITMENT_RANDOMNESS_GENERATOR. Run the existing testnote_commitment_randomness_generator. Read itsfind_group_hashhelper. Identify the tag and the personalisation. Implement the same loop in a scratch program and confirm you recover the same point. - Compute one Pedersen hash by hand. Take the input bits
[true, false, false, true](4 bits), the personalisationPersonalization::NoteCommitment, and step through the loop insrc/pedersen_hash.rs. Show that exactly one segment is used (the first generator ), and that the scalar produced is a signed 3-bit number. Verify your computation against the function output. - Add a regression test for a refactor. Modify
pedersen_hash::pedersen_hashto use aforloop instead of theloop { ... break; }pattern, preserving behaviour. Add a new unit test that hashes a fixed input and asserts a fixed expected output (use one of the existing test vectors). Runcargo test --all-features -- pedersen_hash. The existing test_pedersen_hash_points test still has to pass; your new test serves as a tighter regression anchor.
Answers in the code. For exercise 1, the seed for
NOTE_COMMITMENT_RANDOMNESS_GENERATOR is the byte b"r" plus
PEDERSEN_HASH_GENERATORS_PERSONALIZATION
(source).
For exercise 2, the first 4 bits encode prefix 0b111111 (NoteCommitment)
followed by your 4 bits, but the first segment only consumes the first 3 bits of
the combined string and ignores the rest; chase through the code carefully.