The Rijndael Zerocheck
Here, we explain our optimized prover implementation of our univariate skip variant. Essentially, the problem is to combine Gruen [Gru24, § 5.1] and both Dao–Thaler works [DT24a, DT24b] in a way that lets the prover do as much work as possible in the small Rijndael field. We also use a lookup-based NTT in the style of Hu et al. [Hu+25].
As we've already seen, we use deterministic Rijndael elements as the verifier's first three challenges. These "saturate" the Rijndael field. We further use an "uneven" variant of [DT24b], in which the inner loop—of size just 8—operates entirely over the Rijndael field. Every 8 cube points, the prover embeds the inner accumulator into the large field and accumulates.
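For concreteness, multiplication in the Rijndael field is byte multiplication modulo the AES reduction polynomial x⁸ + x⁴ + x³ + x + 1 (0x11B). The following is a minimal sketch of that operation (a standard fact about the Rijndael field, not code taken from this implementation):

```python
def rijndael_mul(a: int, b: int) -> int:
    """Multiply two bytes in the Rijndael field GF(2^8):
    carry-less multiply, then reduce modulo x^8 + x^4 + x^3 + x + 1."""
    acc = 0
    for i in range(8):                 # carry-less (XOR) schoolbook multiply
        if (b >> i) & 1:
            acc ^= a << i
    for i in range(15, 7, -1):         # fold degree-14..8 terms back down
        if (acc >> i) & 1:
            acc ^= 0x11B << (i - 8)
    return acc
```

For instance, 0x53 and 0xCA are multiplicative inverses in this field, a pair familiar from the AES specification.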
The Problem
We recall that the prover's goal is to evaluate the polynomial
on at least points outside of .
In this page, we describe a highly optimized implementation, which strategically uses the Rijndael field "where possible". We call our algorithm the Rijndael zerocheck.
The Underlying Arrays
Our prover algorithm below makes use of the constraint arrays , and . As we have seen, there is a straightforward way that the prover can compute these, on input his witness and the constraint system.
Efficient Lookup-Based Extrapolation
Thus far, we have let be an arbitrary -dimensional subspace. Now, we enforce that . We write for a further linear subspace, now of dimension 7 over . We require that too, so we have a chain of inclusions
This condition is not hard to set up (we can construct both subspaces in , and embed them).
Finally, we prepare one more step. Below, we will need the mapping
that extrapolates a bitstring from to . On the other hand, we want to work in the Rijndael field , so we need to inverse-embed each element of the result.
On input a 64-bit word , interprets that word as the values that some univariate polynomial , of degree less than 64, takes on . (I.e., this happens to take -valued values on .) This data specifies uniquely, as a polynomial in . Finally, maps:
Since and is bit-valued on , 's values on themselves live in . I.e., the evaluations for live in the image of , and is well-defined. This means that each output is an element of .
By the way, it would have been equivalent if we had defined as the function that extrapolates a bit-vector from to in . Showing the equivalence of that definition with ours is a short exercise. The way we did it turns out to be easier to work with below.
Since is -linear, we can implement it in the following efficient way, using precomputation. We break the input into 8 length-8 chunks. For each chunk, we precompute a length-256 table containing all 256 evaluations of on input strings that are nonzero just on that chunk. Each table entry occupies 64 bytes, so the total data size is 8 · 256 · 64 bytes = 131,072 bytes, or 128 KiB. To evaluate on some input , we must do 8 lookups and 7 length-64 bytewise XORs.
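The chunked-lookup evaluation can be sketched as follows. For illustration we apply the trick to an arbitrary F₂-linear map on 64-bit words with 64-bit outputs; in the actual prover, each table entry would instead hold the 64 Rijndael bytes described above, combined by the same bytewise XORs. The function names here are ours, purely illustrative:

```python
def build_tables(linear_map):
    """Precompute, for each of the 8 byte positions, a 256-entry table of
    linear_map applied to words that are nonzero only in that byte.
    linear_map may be any F2-linear function on 64-bit words."""
    return [
        [linear_map(byte << (8 * chunk)) for byte in range(256)]
        for chunk in range(8)
    ]

def apply_via_tables(tables, x):
    """Evaluate the linear map on x using 8 lookups and 7 XORs."""
    acc = 0
    for chunk in range(8):
        acc ^= tables[chunk][(x >> (8 * chunk)) & 0xFF]
    return acc
```

By F₂-linearity, the map of a word is the XOR of the maps of its byte chunks, which is exactly what the lookup evaluation computes.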
The Algorithm
We have already seen that the prover must evaluate on points outside of . Thus it's enough for the prover to evaluate on .
We proceed with the prover's algorithm. The prover's inputs are the constraint arrays , and and the verifier's partially-deterministic challenge , as well as the fixed Rijndael field elements , , , and the precomputed mapping above.
- initialize an empty accumulator array of -elements, say .
- tensor-expand ; i.e., compute the array .
- for each do:
- initialize an all-zero accumulator of 64 -elements (i.e. bytes), say .
- for each do:
- let .
- define , and .
- for each , update .
- for each , update .
- return .
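The two-level accumulation pattern above can be sketched as follows. To keep the snippet self-contained, we collapse the 64-entry accumulator to a single evaluation point and take the embedding to be the identity (so the "large field" is again the Rijndael field); the real prover embeds into a larger extension field and keeps 64 such accumulators in parallel. The names gmul and zerocheck_accumulate are hypothetical:

```python
def gmul(a, b):
    """GF(2^8) multiply modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    acc = 0
    for i in range(8):
        if (b >> i) & 1:
            acc ^= a << i
    for i in range(15, 7, -1):
        if (acc >> i) & 1:
            acc ^= 0x11B << (i - 8)
    return acc

def zerocheck_accumulate(A, B, C, w, embed=lambda x: x, big_mul=gmul):
    """Uneven two-level accumulation: the inner loop of 8 works entirely
    in the small (Rijndael) field; only once per 8 points do we embed and
    pay a multiplication by the tensor coefficient w[j].
    `embed` is the identity here; in the real prover it would be the
    small-field-to-extension embedding."""
    n = len(A)
    assert n % 8 == 0 and len(w) == n // 8
    total = 0
    for j in range(n // 8):
        inner = 0
        for k in range(8):
            i = 8 * j + k
            inner ^= gmul(gmul(A[i], B[i]), C[i])   # small-field work only
        total ^= big_mul(w[j], embed(inner))        # one big-field mul per 8
    return total
```

Because multiplication distributes over XOR (field addition), the batched result agrees with the direct sum of w[i // 8] · A[i] · B[i] · C[i] over all points, while paying for far fewer large-field multiplications.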
We claim that the array returned by this algorithm contains exactly the values of for , which is what we want.
The above algorithm is very efficient. Every 8th iteration, we have to do 64 embeddings and 64 multiplications in . "Between" iterations, we work entirely in . The breakup of our sum into outer and inner loops above can be viewed as a variant of an idea of Dao and Thaler [DT24b] (i.e., separate from the use of deterministic challenges).
Correctness
The correctness of the above algorithm is not completely obvious. It relies on the following nested sum expression (recall also the analysis done here). For each ,
For each pair of outer and inner loop indices and , the arrays , and contain the isomorphic inverses under of , and , respectively, for , by definition of . As of the end of the inner loop, then, the accumulator will contain:
Here, we use the fact that is a ring homomorphism. Upon applying (in the forward direction) componentwise to this vector and then multiplying componentwise, we get:
which is exactly the th summand of the sum expression above.
The Univariate Specialization
As we already saw, after the prover sends , the verifier will sample a single scalar and send it to the prover. The parties' "cubular" sumcheck will be run on
To prepare this sumcheck, the prover needs to prepare the tables of values of the partially specialized -variate multilinears , and on . As we saw above, the tables of values of these latter oblong-multilinears are just the constraint arrays , and . How do the tables of values of , and on relate to , and ?
We have already answered this question in the general context of oblong-multilinearization. In our context, we have . Moreover, for each , the three bit-arrays
are stored as , and . To run the specialization algorithm on , and , the prover just needs to do, for each (in parallel), three size-64 subset sums of , where the respective bit-coefficients come from , and .
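Each such subset sum is just a mask-driven XOR. A sketch, with hypothetical names: here `values` stands for the 64 field elements being combined, and `bits` for the corresponding row of one of the constraint arrays, packed as a 64-bit mask:

```python
def subset_sum_64(values, bits):
    """XOR together values[k] for each set bit k of the 64-bit mask `bits`.
    In the specialization step, `values` would hold 64 field elements and
    `bits` one packed row of a constraint array."""
    acc = 0
    for k in range(64):
        if (bits >> k) & 1:
            acc ^= values[k]
    return acc
```

Since the coefficients are bits, no field multiplications are needed at all: the specialization costs only XORs, and the three subset sums per position are independent, so they parallelize trivially.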