Skip to content

The Rijndael Zerocheck

Our efficient prover implementation of our univariate skip variant

Here, we explain our optimized prover implementation of our univariate skip variant. Essentially, the problem is to combine Gruen [Gru24, § 5.1] and both Dao–Thaler works [DT24a, DT24b] in a way that lets the prover do as much work as possible in the small Rijndael field. We also use a lookup-based NTT like Hu et al. [Hu+25]'s.

As we've already seen, we use deterministic Rijndael elements as the verifier's first three challenges. These "saturate" the Rijndael field. We further use an "uneven" variant of [DT24b], in which the inner loop—of size just 8—operates entirely over the Rijndael field. Every 8 cube points, the prover embeds into F2128\mathbb{F}_{2^{128}} and accumulates.

The Problem

We recall that the prover's goal is to evaluate the polynomial

g(I^)=xBandf(I^,x)eq~(rx,x)=xBand[a^(I^,x)b^(I^,x)c^(I^,x)]eq~(rx,x)\begin{equation*}g(\widehat{I}) = \sum_{x \in \mathcal{B}_{\ell_\text{and}}} f(\widehat{I}, x) \cdot \widetilde{\texttt{eq}}(r_x, x) = \sum_{x \in \mathcal{B}_{\ell_\text{and}}} \left[ \widehat{a}(\widehat{I}, x) \cdot \widehat{b}(\widehat{I}, x) - \widehat{c}(\widehat{I}, x) \right] \cdot \widetilde{\texttt{eq}}(r_x, x)\end{equation*}

on at least 2k12^k - 1 points outside of DD.

In this page, we describe a highly optimized implementation, which strategically uses the Rijndael field "where possible". We call our algorithm the Rijndael zerocheck.

The Underlying Arrays

Our prover algorithm below makes use of the constraint arrays aa, bb and cc. As we have seen, there is a straightforward way that the prover can compute these, on input his witness ww and the constraint system.

Efficient Lookup-Based Extrapolation

Thus far, we have let DF2128D \subset \mathbb{F}_{2^{128}} be an arbitrary 66-dimensional subspace. Now, we enforce that DF28F2128D \subset \mathbb{F}_{2^8} \subset \mathbb{F}_{2^{128}}. We write DF2128D' \subset \mathbb{F}_{2^{128}} for a further linear subspace, now of dimension 7 over F2\mathbb{F}_2. We require that DF2128D' \subset \mathbb{F}_{2^{128}} too, so we have a chain of inclusions

DDF28F2128.\begin{equation*}D \subset D' \subset \mathbb{F}_{2^8} \subset \mathbb{F}_{2^{128}}.\end{equation*}

This condition is not hard to set up (we can construct both subspaces in KK, and embed them).

Finally, we prepare one more step. Below, we will need the mapping

Extrap:F264F2864\begin{equation*}\text{Extrap} : \mathbb{F}_2^{64} \to \mathbb{F}_{2^8}^{64}\end{equation*}

that extrapolates a bitstring from DD to DDD' \setminus D. On the other hand, we want to work in the Rijndael field KK, so we need to inverse-embed each element of the result.

On input a 64-bit word (p0,,p63)(p_0, \ldots , p_{63}), Extrap\text{Extrap} interprets that word as the values that some univariate polynomial P(I^)P(\widehat{I}), of degree less than 64, takes on DF2128D \subset \mathbb{F}_{2^{128}}. (I.e., this P(I^)P(\widehat{I}) happens to take F2\mathbb{F}_2-valued values on DD.) This data specifies P(I^)P(\widehat{I}) uniquely, as a polynomial in F2128[I^]64\mathbb{F}_{2^{128}}[\widehat{I}]^{\prec 64}. Finally, Extrap\text{Extrap} maps:

Extrap:(p0,,p63)(ι1(P(i^)))i^DD.\begin{equation*}\text{Extrap} : (p_0, \ldots , p_{63}) \mapsto \Big( \iota^{-1}(P(\widehat{i})) \Big)_{\widehat{i} \in D' \setminus D}.\end{equation*}

Since DF28F2128D' \subset \mathbb{F}_{2^{8}} \subset \mathbb{F}_{2^{128}} and P(I^)P(\widehat{I}) is bit-valued on DD, P(I^)P(\widehat{I})'s values on DD' themselves live in F28F2128\mathbb{F}_{2^{8}} \subset \mathbb{F}_{2^{128}}. I.e., the evaluations P(i^)P(\widehat{i}) for i^DD\widehat{i} \in D' \setminus D live in the image of ι\iota, and ι1(P(i^))\iota^{-1}(P(\widehat{i})) is well-defined. This means that each output Extrap(p0,,p63)\text{Extrap}(p_0, \ldots , p_{63}) is an element of K64F2864K^{64} \cong \mathbb{F}_{2^8}^{64}.

By the way, it would have been equivalent if we had defined Extrap\text{Extrap} as the function that extrapolates a bit-vector (p0,,p63)(p_0, \ldots , p_{63}) from ι1(D)\iota^{-1}(D) to ι1(DD)\iota^{-1}(D' \setminus D) in KK. Showing the equivalence of that definition with ours is a short exercise. The way we did it turns out to be easier to work with below.

Since Extrap\text{Extrap} is F2\mathbb{F}_2-linear, we can implement it in the following efficient way, using precomputation. We break the input (p0,,p63)(p_0, \ldots , p_{63}) into 8 length-8 chunks. For each chunk, we precompute a length-256 table containing all 256 evaluations of Extrap\text{Extrap} on input strings that are nonzero just on that chunk. The total data size is thus 82566488 \cdot 256 \cdot 64 \cdot 8 bits, or 128128 KiB. To evaluate Extrap\text{Extrap} on some input (p0,,p63)(p_0, \ldots , p_{63}), we must do 8 lookups and 7 length-64 bytewise XORs.

The Algorithm

We have already seen that the prover must evaluate g(I^)g(\widehat{I}) on 2k12^k - 1 points outside of DD. Thus it's enough for the prover to evaluate g(I^)g(\widehat{I}) on DDD' \setminus D.

We proceed with the prover's algorithm. The prover's inputs are the constraint arrays aa, bb and cc and the verifier's partially-deterministic challenge rx=(σ0,σ1,σ2,,rx,0,,rx,and4)\overline{r_x} = (\sigma_0, \sigma_1, \sigma_2, \ldots , \overline{r_{x, 0}}, \ldots , \overline{r_{x, \ell_\text{and} - 4}}), as well as the fixed Rijndael field elements ρ0\rho_0, ρ1\rho_1, ρ2\rho_2, and the precomputed mapping Extrap\text{Extrap} above.

  • initialize an empty accumulator array of 6464 F2128\mathbb{F}_{2^{128}}-elements, say g\mathsf{g}.
  • tensor-expand rx\overline{r_x}; i.e., compute the array (eq~(rx,v))vBand3\left( \widetilde{\texttt{eq}}(\overline{r_x}, v) \right)_{v \in \mathcal{B}_{\ell_\text{and}} - 3}.
  • for each vBand3v \in \mathcal{B}_{\ell_\text{and} - 3} do:
    • initialize an all-zero accumulator of 64 KK-elements (i.e. bytes), say temp\textsf{temp}.
    • for each uB3u \in \mathcal{B}_3 do:
      • let xuvx \coloneqq u \mathbin{\Vert} v.
      • define aExtrap(a[x])\mathsf{a} \coloneqq \text{Extrap}(a[x]), bExtrap(b[x])\mathsf{b} \coloneqq \text{Extrap}(b[x]) and cExtrap(c[x])\mathsf{c} \coloneqq \text{Extrap}(c[x]).
      • for each i{0,,63}i \in \{0, \ldots , 63\}, update temp[i]+=(a[i]b[i]c[i])eq~(ρ,u)\mathsf{temp}[i] \mathrel{+}= \left( \mathsf{a}[i] \cdot \mathsf{b}[i] - \mathsf{c}[i] \right) \cdot \widetilde{\texttt{eq}}(\rho, u).
    • for each i{0,,63}i \in \{0, \ldots , 63\}, update g+=ι(temp[i])eq~(rx,v)\mathsf{g} \mathrel{+}= \iota(\mathsf{temp}[i]) \cdot \widetilde{\texttt{eq}}(\overline{r_x}, v).
  • return g\mathsf{g}.

We claim that the array g\mathsf{g} returned by this algorithm contains exactly the values of g(i^)g(\widehat{i}) for i^DD\widehat{i} \in D' \setminus D, which is what we want.

The above protocol is very efficient. Every 8th iteration, we have to do 64 KF2128K \to \mathbb{F}_{2^{128}} embeddings and 64 multiplications in F2128\mathbb{F}_{2^{128}}. "Between" iterations, we work entirely in KK. The breakup of our sum into outer and inner loops above can be viewed as a variant of an idea of Dao and Thaler [DT24b] (i.e., separate from the use of deterministic challenges).

Correctness

The correctness of the above algorithm is not completely obvious. It relies on the following nested sum expression (recall also the analysis done here). For each i^DD\widehat{i} \in D' \setminus D,

xBandf(i^,x)eq~(rx,x)=vBand3[uB3f(i^,uv)eq~(σ,u)]eq~(rx,v).\begin{equation*}\sum_{x \in \mathcal{B}_{\ell_\text{and}}} f(\widehat{i}, x) \cdot \widetilde{\texttt{eq}}(r_x, x) = \sum_{v \in \mathcal{B}_{\ell_\text{and}} - 3} \left[ \sum_{u \in \mathcal{B}_3} f(\widehat{i}, u \mathbin{\Vert} v) \cdot \widetilde{\texttt{eq}}(\sigma, u) \right] \cdot \widetilde{\texttt{eq}}(\overline{r_x}, v).\end{equation*}

For each pair of outer and inner loop indices vBand3v \in \mathcal{B}_{\ell_\text{and} - 3} and uB3u \in \mathcal{B}_3, the arrays a\mathsf{a}, b\mathsf{b} and c\mathsf{c} contain the isomorphic inverses under ι\iota of a^(i^,x)\widehat{a}(\widehat{i}, x), b^(i^,x)\widehat{b}(\widehat{i}, x) and c^(i^,x)\widehat{c}(\widehat{i}, x), respectively, for i^DD\widehat{i} \in D' \setminus D, by definition of Extrap\text{Extrap}. As of the end of the inner loop, then, the accumulator temp\mathsf{temp} will contain:

(uB3ι1(a^(i^,uv)b^(i^,uv)c^(i^,uv))eq~(ρ,u))i^DD.\begin{equation*}\left( \sum_{u \in \mathcal{B}_3} \iota^{-1}\left( \widehat{a}(\widehat{i}, u \mathbin{\Vert} v) \cdot \widehat{b}(\widehat{i}, u \mathbin{\Vert} v) - \widehat{c}(\widehat{i}, u \mathbin{\Vert} v) \right) \cdot \widetilde{\texttt{eq}}(\rho, u) \right)_{\widehat{i} \in D' \setminus D}.\end{equation*}

Here, we use the fact that ι1:F28K\iota^{-1} : \mathbb{F}_{2^8} \to K is a ring homomorphism. Upon applying ι\iota (in the forward direction) componentwise to this vector and then multiplying eq(rx,v){\texttt{eq}}(\overline{r_x}, v) componentwise, we get:

((uB3(a^(i^,uv)b^(i^,uv)c^(i^,uv))eq~(σ,u))eq~(rx,v))i^DD,\begin{equation*}\left( \Big( \sum_{u \in \mathcal{B}_3} \left( \widehat{a}(\widehat{i}, u \mathbin{\Vert} v) \cdot \widehat{b}(\widehat{i}, u \mathbin{\Vert} v) - \widehat{c}(\widehat{i}, u \mathbin{\Vert} v) \right) \cdot \widetilde{\texttt{eq}}(\sigma, u) \Big) \cdot \widetilde{\texttt{eq}}(\overline{r_x}, v) \right)_{\widehat{i} \in D' \setminus D},\end{equation*}

which is exactly the vvth summand of the sum expression above.

The Univariate Specialization

As we already saw, after the prover sends g(I^)g(\widehat{I}), the verifier will sample a single scalar ri^F2128r_{\widehat{i}} \gets \mathbb{F}_{2^{128}} and send it to the prover. The parties' "cubular" sumcheck will be run on

s0=?xBandf(ri^,x)eq~(rx,x)=xBand[a^(ri^,x)b^(ri^,x)c^(ri^,x)]eq~(rx,x).\begin{equation*}s_0 \stackrel{?}= \sum_{x \in \mathcal{B}_{\ell_\text{and}}} f(r_{\widehat{i}}, x) \cdot \widetilde{\texttt{eq}}(r_x, x) = \sum_{x \in \mathcal{B}_{\ell_\text{and}}} \left[ \widehat{a}(r_{\widehat{i}}, x) \cdot \widehat{b}(r_{\widehat{i}}, x) - \widehat{c}(r_{\widehat{i}}, x) \right] \cdot \widetilde{\texttt{eq}}(r_x, x).\end{equation*}

To prepare this sumcheck, the prover needs to prepare the tables of values of the partially specialized and\ell_\text{and}-variate multilinears a^(ri^,X)\widehat{a}(r_{\widehat{i}}, X), b^(ri^,X)\widehat{b}(r_{\widehat{i}}, X) and c^(ri^,X)\widehat{c}(r_{\widehat{i}}, X) on Band\mathcal{B}_{\ell_\text{and}}. As we saw above, the tables of values of these latter oblong-multilinears are just the constraint arrays aa, bb and cc. How do the tables of values of a^(ri^,X)\widehat{a}(r'_{\widehat{i}}, X), b^(ri^,X)\widehat{b}(r'_{\widehat{i}}, X) and c^(ri^,X)\widehat{c}(r'_{\widehat{i}}, X) on Band\mathcal{B}_{\ell_\text{and}} relate to aa, bb and cc?

We have already answered this question in the general context of oblong-multilinearization. In our context, we have k=6k = 6. Moreover, for each xBandx \in \mathcal{B}_{\ell_\text{and}}, the three bit-arrays

(a~(i0,,i5,x0,,xand1))iB6,(b~(i0,,i5,x0,,xand1))iB6,(c~(i0,,i5,x0,,xand1))iB6\begin{gather*} \left( \widetilde{a}(i_0, \ldots , i_5, x_0, \ldots , x_{\ell_\text{and} - 1}) \right)_{i \in \mathcal{B}_6}, \\ \left( \widetilde{b}(i_0, \ldots , i_5, x_0, \ldots , x_{\ell_\text{and} - 1}) \right)_{i \in \mathcal{B}_6}, \\ \left( \widetilde{c}(i_0, \ldots , i_5, x_0, \ldots , x_{\ell_\text{and} - 1}) \right)_{i \in \mathcal{B}_6} \end{gather*}

are stored as a[x]a[x], b[x]b[x] and c[x]c[x]. To run the specialization algorithm on aa, bb and cc, the prover just needs to do, for each x{0,,nand1}x \in \{0, \ldots , n_\text{and} - 1\} (in parallel), three size-64 subset sums of (δD(ri^,i^))iB6\left( \delta_D(r'_{\widehat{i}}, \widehat{i}) \right)_{i \in \mathcal{B}_6}, where the respective bit-coefficients come from a[x]a[x], b[x]b[x] and c[x]c[x].