Skip to main content

Field Arithmetic, FFT, and MSM

1. Why this chapter exists

Everything halo2 does, from polynomial commitment to verifier challenges, eventually bottoms out in one of three primitives:

  • field arithmetic on Fp\mathbb{F}_p (the Pallas / Vesta scalar field),
  • the radix-2 number-theoretic transform (NTT) that we will simply call FFT,
  • multi-scalar multiplication on the corresponding curve, ikiGi\sum_i k_i G_i.

If you change any of them and the constants do not match the formal spec, every test in the suite passes and the verifier still accepts forged proofs. That is the failure mode the maintainers worry about, and it is the failure mode CI cannot catch on its own. This chapter tells you which file is the law for each primitive.

2. Definitions

Definition (Pasta fields). Pallas and Vesta are a 2-cycle of prime-order curves defined in pasta_curves. Let pp and qq be the two primes, with p2255p \approx 2^{255} and q2255q \approx 2^{255}. The Pallas base field Fp\mathbb{F}_p is the Vesta scalar field, and vice versa. This 2-cycle is what makes recursive proof composition practical. Both primes have F×|\mathbb{F}^{\times}| divisible by 2322^{32}, so F\mathbb{F} contains primitive 2322^{32}-th roots of unity; the FFT supports any n=2kn = 2^k for k32k \leq 32.

Definition (Multiplicative subgroup of order nn). Let n=2kn = 2^k and let ωFp×\omega \in \mathbb{F}_p^\times be a primitive nn-th root of unity. The set H={1,ω,ω2,,ωn1}H = \{1, \omega, \omega^2, \dots, \omega^{n-1}\} is the multiplicative subgroup of order nn. The PLONKish circuit lives on HH: row ii corresponds to the point ωi\omega^i.

Definition (FFT). Given coefficients (a0,,an1)(a_0, \dots, a_{n-1}) of a polynomial a(X)=iaiXia(X) = \sum_i a_i X^i, the FFT computes the evaluations (a(ω0),,a(ωn1))(a(\omega^0), \dots, a(\omega^{n-1})) in O(nlogn)O(n \log n) field operations. The radix-2 Cooley-Tukey FFT used here recurses on n/2n / 2 and combines with the butterfly (a(ωi),a(ωi+n/2))=(e(ω2i)+ωio(ω2i),e(ω2i)ωio(ω2i))\bigl(a(\omega^i), a(\omega^{i+n/2})\bigr) = \bigl(e(\omega^{2i}) + \omega^i o(\omega^{2i}), e(\omega^{2i}) - \omega^i o(\omega^{2i})\bigr) where ee and oo are the even-index and odd-index polynomials.

Definition (MSM, Pippenger's algorithm). Given scalars k1,,kmFpk_1, \dots, k_m \in \mathbb{F}_p and points G1,,GmGG_1, \dots, G_m \in \mathbb{G}, the multi-scalar multiplication problem is to compute i[ki]Gi\sum_i [k_i] G_i. Pippenger's algorithm windows the scalars into ww-bit chunks, accumulates each chunk's contribution into 2w2^w "buckets", reduces the buckets, and combines across windows. The cost is roughly m/logmm / \log m point operations plus O(λ)O(\lambda) for finalization, where λ\lambda is the bit length of the scalars.

3. The code

3.1 The module surface

halo2_proofs/src/arithmetic.rs
//! This module provides common utilities, traits and structures for group,
//! field and polynomial arithmetic.

pub use ff::Field;
use group::{
ff::{BatchInvert, PrimeField},
Group as _, GroupOpsOwned, ScalarMulOwned,
};
use maybe_rayon::prelude::*;
pub use pasta_curves::arithmetic::*;

use crate::multicore::{self, TheBestReduce};

/// This represents an element of a group with basic operations that can be
/// performed. This allows an FFT implementation (for example) to operate
/// generically over either a field or elliptic curve group.
pub trait FftGroup<Scalar: Field>:
Copy + Send + Sync + 'static + GroupOpsOwned + ScalarMulOwned<Scalar>
{
}

impl<T, Scalar> FftGroup<Scalar> for T
where
Scalar: Field,
T: Copy + Send + Sync + 'static + GroupOpsOwned + ScalarMulOwned<Scalar>,
{
}

pub use pasta_curves::arithmetic::*; re-exports the curve traits (CurveAffine, CurveExt, Group, Coordinates, etc.) so the rest of the crate can refer to them without naming pasta_curves directly.

3.2 Multi-scalar multiplication

The fast path uses Pippenger:

halo2_proofs/src/arithmetic.rs
///
/// This will use multithreading if beneficial.
pub fn best_multiexp<C: CurveAffine>(coeffs: &[C::Scalar], bases: &[C]) -> C::Curve {
assert_eq!(coeffs.len(), bases.len());

let c = if bases.len() < 4 {
1
} else if bases.len() < 32 {
3
} else {
(f64::from(bases.len() as u32)).ln().ceil() as usize
};

let mut multi_buckets: Vec<Buckets<C>> = vec![Buckets::new(c); (256 / c) + 1];
let num_threads = multicore::current_num_threads();
if coeffs.len() > num_threads {
multi_buckets
.par_iter_mut()
.enumerate()
.rev()
.map(|(i, buckets)| {
let mut acc = buckets.sum(coeffs, bases, i);
(0..c * i).for_each(|_| acc = acc.double());
acc
})
.the_best_reduce(C::Curve::identity, |a, b| a + b)
.expect("multi_buckets always contains at least 1 bucket")
} else {
multi_buckets
.iter_mut()
.enumerate()
.rev()
.map(|(i, buckets)| buckets.sum(coeffs, bases, i))
.fold(C::Curve::identity(), |mut sum, bucket| {
// restore original evaluation point
(0..c).for_each(|_| sum = sum.double());
sum + bucket
})
}
}

A few invariants worth knowing:

  • The window size ww is chosen as a function of mm. The code uses a heuristic of wlog2m3w \approx \log_2 m - 3 for m32m \geq 32, falling back to w=3w = 3 for tiny inputs.
  • coeffs and bases must have equal length; the function does not check this on the hot path. Length mismatch is a programmer bug.
  • For very small inputs the dispatch falls through to small_multiexp, a naive double-and-add:
halo2_proofs/src/arithmetic.rs
/// Performs a small multi-exponentiation operation.
/// Uses the double-and-add algorithm with doublings shared across points.
pub fn small_multiexp<C: CurveAffine>(coeffs: &[C::Scalar], bases: &[C]) -> C::Curve {
let coeffs: Vec<_> = coeffs.iter().map(|a| a.to_repr()).collect();
let mut acc = C::Curve::identity();

// for byte idx
for byte_idx in (0..32).rev() {
// for bit idx
for bit_idx in (0..8).rev() {
acc = acc.double();
// for each coeff
for coeff_idx in 0..coeffs.len() {
let byte = coeffs[coeff_idx].as_ref()[byte_idx];
if ((byte >> bit_idx) & 1) != 0 {
acc += bases[coeff_idx];
}
}
}
}

acc
}

The 0.3.2 changelog notes "performance has been improved" for best_multiexp; PR #796 is the source of that improvement.

3.3 The FFT

halo2_proofs/src/arithmetic.rs
/// $\omega^{-1}$ in place of $\omega$ and dividing each resulting field element
/// by $n$.
///
/// This will use multithreading if beneficial.
pub fn best_fft<Scalar: Field, G: FftGroup<Scalar>>(a: &mut [G], omega: Scalar, log_n: u32) {
fn bitreverse(mut n: usize, l: usize) -> usize {
let mut r = 0;
for _ in 0..l {
r = (r << 1) | (n & 1);
n >>= 1;
}
r
}

let threads = multicore::current_num_threads();
let log_threads = log2_floor(threads);
let n = a.len();
assert_eq!(n, 1 << log_n);

for k in 0..n {
let rk = bitreverse(k, log_n as usize);
if k < rk {
a.swap(rk, k);
}
}

// precompute twiddle factors
let twiddles: Vec<_> = (0..(n / 2))
.scan(Scalar::ONE, |w, _| {
let tw = *w;
*w *= &omega;
Some(tw)
})
.collect();

if log_n <= log_threads {
let mut chunk = 2_usize;
let mut twiddle_chunk = n / 2;
for _ in 0..log_n {
a.chunks_mut(chunk).for_each(|coeffs| {
let (left, right) = coeffs.split_at_mut(chunk / 2);

// case when twiddle factor is one
let (a, left) = left.split_at_mut(1);
let (b, right) = right.split_at_mut(1);
let t = b[0];
b[0] = a[0];
a[0] += &t;
b[0] -= &t;

left.iter_mut()
.zip(right.iter_mut())
.enumerate()
.for_each(|(i, (a, b))| {
let mut t = *b;
t *= &twiddles[(i + 1) * twiddle_chunk];
*b = *a;
*a += &t;
*b -= &t;
});
});
chunk *= 2;
twiddle_chunk /= 2;
}
} else {
recursive_butterfly_arithmetic(a, n, 1, &twiddles)
}
}

The dispatcher best_fft chooses between a serial recursive implementation and a parallel one based on log_n and the number of rayon threads. The recursion is in recursive_butterfly_arithmetic:

halo2_proofs/src/arithmetic.rs

/// This perform recursive butterfly arithmetic
pub fn recursive_butterfly_arithmetic<Scalar: Field, G: FftGroup<Scalar>>(
a: &mut [G],
n: usize,
twiddle_chunk: usize,
twiddles: &[Scalar],
) {
if n == 2 {
let t = a[1];
a[1] = a[0];
a[0] += &t;
a[1] -= &t;
} else {
let (left, right) = a.split_at_mut(n / 2);
multicore::join(
|| recursive_butterfly_arithmetic(left, n / 2, twiddle_chunk * 2, twiddles),
|| recursive_butterfly_arithmetic(right, n / 2, twiddle_chunk * 2, twiddles),
);

// case when twiddle factor is one
let (a, left) = left.split_at_mut(1);
let (b, right) = right.split_at_mut(1);
let t = b[0];
b[0] = a[0];
a[0] += &t;
b[0] -= &t;

left.iter_mut()
.zip(right.iter_mut())
.enumerate()
.for_each(|(i, (a, b))| {
let mut t = *b;
t *= &twiddles[(i + 1) * twiddle_chunk];
*b = *a;
*a += &t;
*b -= &t;
});
}
}

The slice a is mutated in place; on entry it holds bit-reversed coefficients, on exit it holds evaluations in standard order.

3.4 Parallelization helper

halo2_proofs/src/arithmetic.rs
}

/// This simple utility function will parallelize an operation that is to be
/// performed over a mutable slice.
pub fn parallelize<T: Send, F: Fn(&mut [T], usize) + Send + Sync + Clone>(v: &mut [T], f: F) {
let n = v.len();
let num_threads = multicore::current_num_threads();
let mut chunk = n / num_threads;
if chunk < num_threads {
chunk = n;
}

multicore::scope(|scope| {
for (chunk_num, v) in v.chunks_mut(chunk).enumerate() {
let f = f.clone();
scope.spawn(move |_| {
let start = chunk_num * chunk;
f(v, start);
});
}
});
}

The closure receives a mutable subslice and the global index of its first element. This is the workhorse for almost every parallel loop in the crate. Always prefer it to chunks_mut plus rayon::join because it respects the multicore feature flag.

3.5 Other small but load-bearing helpers

  • eval_polynomial: Horner's method for evaluating iaixi\sum_i a_i x^i.
  • compute_inner_product: iaibi\sum_i a_i b_i, used in the IPA prover.
  • kate_division: divide a polynomial by XbX - b, used inside the multiopen prover.
  • lagrange_interpolate: given (xi,yi)(x_i, y_i) pairs, recover the polynomial. Used in tests and the IPA setup.

4. Failure modes

  • Wrong root of unity. If a caller passes the wrong ω\omega to best_fft, the polynomial in/out conversion silently corrupts and every downstream commitment is wrong. The EvaluationDomain type (chapter 04) exists exactly to avoid this; never call best_fft directly outside poly/domain.rs unless you absolutely know what you are doing.
  • Edge case in best_multiexp for m=0m = 0. The function returns the curve identity. The change in #796 required adding the test test_create_proof precisely to lock that behaviour in.
  • Forgetting --features multicore. Without it, every parallel helper degrades to serial. The CI default has multicore enabled; benchmarks without it can be 8 to 16x slower.
  • Bit-reversal mismatch. The recursive FFT expects bit-reversed input; if you ever call it from a new code path, you must bit-reverse the input first (see how EvaluationDomain does it).

5. Spec pointers

  • The Pasta cycle and the choice of primes are documented in the pasta_curves README and in Daira Hopwood's explainer.
  • The Cooley-Tukey FFT is described in any algorithms textbook; for the field-theoretic NTT view, see chapter 30 of CLRS.
  • Pippenger's algorithm: Pippenger, "On the evaluation of powers and monomials", SIAM J. Comput. (1976). For a modern exposition in the SNARK setting, see Bernstein et al., eprint 2020/1145.

6. Exercises

  1. Open halo2_proofs/src/arithmetic.rs and identify the function that chooses the Pippenger window size. Write down the formula it uses as a function of m.

  2. Find the test at the bottom of halo2_proofs/src/arithmetic.rs that checks best_multiexp against small_multiexp and run it:

    cargo test --release -p halo2_proofs arithmetic
  3. Add an extra #[test] to halo2_proofs/src/arithmetic.rs verifying that best_multiexp on an empty input returns the curve identity. Run the test and confirm it passes.

  4. Run the FFT benchmark:

    cargo bench -p halo2_proofs --bench fft

    Observe how the per-operation cost scales with log_n; the slope should be roughly linear (cost is O(logn)O(\log n) per element).

Answers in the code

  • Exercise 1: the window size selection is at the top of best_multiexp in halo2_proofs/src/arithmetic.rs.
  • Exercise 2: the matching test is named test_best_multiexp (or similar) inside the #[cfg(test)] mod at the bottom of the same file.

7. Further reading

  • The pasta_curves source for the curve implementations themselves.
  • The group and ff traits, which are the abstract interfaces halo2 codes against.