The Rijndael Zerocheck
Here, we explain our optimized prover implementation of our univariate skip variant. Essentially, the problem is to combine Gruen [Gru24, § 5.1] and both Dao–Thaler works [DT24a, DT24b] in a way that lets the prover do as much work as possible in the small Rijndael field. We also use a lookup-based NTT in the style of Hu et al. [Hu+25].
As we've already seen, we use deterministic Rijndael elements as the verifier's first three challenges. These "saturate" the Rijndael field. We further use an "uneven" variant of [DT24b], in which the inner loop—of size just 8—operates entirely over the Rijndael field. Every 8 cube points, the prover embeds the inner accumulator into the large field and accumulates.
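For concreteness, multiplication in the Rijndael field is byte multiplication modulo the AES reduction polynomial x⁸ + x⁴ + x³ + x + 1 (0x11B). The following is a minimal sketch of that operation (a standard fact about the Rijndael field, not code taken from this implementation):

```python
def rijndael_mul(a: int, b: int) -> int:
    """Multiply two bytes in the Rijndael field GF(2^8):
    carry-less multiply, then reduce modulo x^8 + x^4 + x^3 + x + 1."""
    acc = 0
    for i in range(8):                 # carry-less (XOR) schoolbook multiply
        if (b >> i) & 1:
            acc ^= a << i
    for i in range(15, 7, -1):         # fold degree-14..8 terms back down
        if (acc >> i) & 1:
            acc ^= 0x11B << (i - 8)
    return acc
```

For instance, 0x53 and 0xCA are multiplicative inverses in this field, a pair familiar from the AES specification.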
The Problem
We recall that the prover's goal is to evaluate the polynomial
on at least points outside of .
In this page, we describe a highly optimized implementation, which strategically uses the Rijndael field "where possible". We call our algorithm the Rijndael zerocheck.
The Underlying Arrays
Our prover algorithm below makes use of the constraint arrays , and . As we have seen, there is a straightforward way that the prover can compute these, on input his witness and the constraint system.
Efficient Lookup-Based Extrapolation
Thus far, we have let be an arbitrary -dimensional subspace. Now, we enforce that . We write for a further linear subspace, now of dimension 7 over . We require that too, so we have a chain of inclusions
This condition is not hard to set up (we can construct both subspaces in , and embed them).
Finally, we prepare one more step. Below, we will need the mapping
that extrapolates a bitstring from to . On the other hand, we want to work in the Rijndael field , so we need to inverse-embed each element of the result.
On input a 64-bit word , interprets that word as the values that some univariate polynomial , of degree less than 64, takes on . (I.e., this happens to take -valued values on .) This data specifies uniquely, as a polynomial in . Finally, maps:
Since and is bit-valued on , 's values on themselves live in . I.e., the evaluations for live in the image of , and is well-defined. This means that each output is an element of .
By the way, it would have been equivalent if we had defined as the function that extrapolates a bit-vector from to in . Showing the equivalence of that definition with ours is a short exercise. The way we did it turns out to be easier to work with below.
Since is -linear, we can implement it in the following efficient way, using precomputation. We break the input into 8 length-8 chunks. For each chunk, we precompute a length-256 table containing all 256 evaluations of on input strings that are nonzero just on that chunk. Each table entry occupies 64 bytes, so the total data size is 8 · 256 · 64 bytes = 131,072 bytes, or 128 KiB. To evaluate on some input , we must do 8 lookups and 7 length-64 bytewise XORs.
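The chunked-lookup evaluation can be sketched as follows. For illustration we apply the trick to an arbitrary F₂-linear map on 64-bit words with 64-bit outputs; in the actual prover, each table entry would instead hold the 64 Rijndael bytes described above, combined by the same bytewise XORs. The function names here are ours, purely illustrative:

```python
def build_tables(linear_map):
    """Precompute, for each of the 8 byte positions, a 256-entry table of
    linear_map applied to words that are nonzero only in that byte.
    linear_map may be any F2-linear function on 64-bit words."""
    return [
        [linear_map(byte << (8 * chunk)) for byte in range(256)]
        for chunk in range(8)
    ]

def apply_via_tables(tables, x):
    """Evaluate the linear map on x using 8 lookups and 7 XORs."""
    acc = 0
    for chunk in range(8):
        acc ^= tables[chunk][(x >> (8 * chunk)) & 0xFF]
    return acc
```

By F₂-linearity, the map of a word is the XOR of the maps of its byte chunks, which is exactly what the lookup evaluation computes.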
The Algorithm
We have already seen that the prover must evaluate on points outside of . Thus it's enough for the prover to evaluate on .
We proceed with the prover's algorithm. The prover's inputs are the constraint arrays , and and the verifier's partially-deterministic challenge , as well as the fixed Rijndael field elements , , , and the precomputed mapping above.
- initialize an empty accumulator array of -elements, say .
- tensor-expand ; i.e., compute the array .
- for each do:
- initialize an all-zero accumulator of 64 -elements (i.e. bytes), say .
- for each do:
- let .
- define , and .
- for each , update .
- for each , update .
- return .
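The two-level accumulation pattern above can be sketched as follows. To keep the snippet self-contained, we collapse the 64-entry accumulator to a single evaluation point and take the embedding to be the identity (so the "large field" is again the Rijndael field); the real prover embeds into a larger extension field and keeps 64 such accumulators in parallel. The names gmul and zerocheck_accumulate are hypothetical:

```python
def gmul(a, b):
    """GF(2^8) multiply modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    acc = 0
    for i in range(8):
        if (b >> i) & 1:
            acc ^= a << i
    for i in range(15, 7, -1):
        if (acc >> i) & 1:
            acc ^= 0x11B << (i - 8)
    return acc

def zerocheck_accumulate(A, B, C, w, embed=lambda x: x, big_mul=gmul):
    """Uneven two-level accumulation: the inner loop of 8 works entirely
    in the small (Rijndael) field; only once per 8 points do we embed and
    pay a multiplication by the tensor coefficient w[j].
    `embed` is the identity here; in the real prover it would be the
    small-field-to-extension embedding."""
    n = len(A)
    assert n % 8 == 0 and len(w) == n // 8
    total = 0
    for j in range(n // 8):
        inner = 0
        for k in range(8):
            i = 8 * j + k
            inner ^= gmul(gmul(A[i], B[i]), C[i])   # small-field work only
        total ^= big_mul(w[j], embed(inner))        # one big-field mul per 8
    return total
```

Because multiplication distributes over XOR (field addition), the batched result agrees with the direct sum of w[i // 8] · A[i] · B[i] · C[i] over all points, while paying for far fewer large-field multiplications.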
We claim that the array returned by this algorithm contains exactly the values of for , which is what we want.
The above algorithm is very efficient. Every 8th iteration, we have to do 64 embeddings and 64 multiplications in . "Between" iterations, we work entirely in . The breakup of our sum into outer and inner loops above can be viewed as a variant of an idea of Dao and Thaler [DT24b] (i.e., separate from the use of deterministic challenges).
Correctness
The correctness of the above algorithm is not completely obvious. It relies on the following nested sum expression (recall also the analysis done here). For each ,
For each pair of outer and inner loop indices and , the arrays , and contain the isomorphic inverses under of , and , respectively, for , by definition of . As of the end of the inner loop, then, the accumulator will contain:
Here, we use the fact that is a ring homomorphism. Upon applying (in the forward direction) componentwise to this vector and then multiplying componentwise, we get:
which is exactly the th summand of the sum expression above.
The Univariate Specialization
As we already saw, after the prover sends , the verifier will sample a single scalar and send it to the prover. The parties' "cubular" sumcheck will be run on
To prepare this sumcheck, the prover needs to prepare the tables of values of the partially specialized -variate multilinears , and on . As we saw above, the tables of values of these latter oblong-multilinears are just the constraint arrays , and . How do the tables of values of , and on relate to , and ?
We have already answered this question in the general context of oblong-multilinearization. In our context, we have . Moreover, for each , the three bit-arrays
are stored as , and . To run the specialization algorithm on , and , the prover just needs to do, for each (in parallel), three size-64 subset sums of , where the respective bit-coefficients come from , and .
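Each such subset sum is just a mask-driven XOR. A sketch, with hypothetical names: here `values` stands for the 64 field elements being combined, and `bits` for the corresponding row of one of the constraint arrays, packed as a 64-bit mask:

```python
def subset_sum_64(values, bits):
    """XOR together values[k] for each set bit k of the 64-bit mask `bits`.
    In the specialization step, `values` would hold 64 field elements and
    `bits` one packed row of a constraint array."""
    acc = 0
    for k in range(64):
        if (bits >> k) & 1:
            acc ^= values[k]
    return acc
```

Since the coefficients are bits, no field multiplications are needed at all: the specialization costs only XORs, and the three subset sums per position are independent, so they parallelize trivially.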