Performance
Here, we explore how to understand and optimize circuit performance in Binius64.
Compiler-Driven Optimization
Most performance optimization happens automatically through compiler passes, so you don't need to manually tune every detail. The optimization pipeline handles several key transformations for you:
The compiler performs gate fusion to combine XOR operations into single constraints, constant propagation to evaluate operations on constants at compile time, and dead code elimination to remove unused gates. These passes can substantially reduce your constraint counts automatically, and the optimization pipeline continues to improve with each release.
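To make two of these passes concrete, here is a minimal sketch of constant propagation and dead code elimination on a toy gate list. The `Gate` enum and the functions `fold_constants`, `live_gates`, and `and_count` are hypothetical names invented for this illustration; they are not the Binius64 compiler's API, and the real passes (including XOR fusion, which is omitted here) are more involved. The sketch assumes gates are stored in topological order, so every operand index points to an earlier gate.

```rust
/// Toy gate representation (hypothetical, not the Binius64 IR).
/// Operands are indices of earlier gates in the same list.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Gate {
    Const(u64),
    Input(usize),
    Xor(usize, usize),
    And(usize, usize),
}

/// Constant propagation: replace any gate whose operands are both
/// constants with the constant result. One forward sweep is enough
/// because operands always precede the gate that uses them.
pub fn fold_constants(gates: &mut [Gate]) {
    for i in 0..gates.len() {
        let folded = match gates[i] {
            Gate::Xor(a, b) => match (gates[a], gates[b]) {
                (Gate::Const(x), Gate::Const(y)) => Some(x ^ y),
                _ => None,
            },
            Gate::And(a, b) => match (gates[a], gates[b]) {
                (Gate::Const(x), Gate::Const(y)) => Some(x & y),
                _ => None,
            },
            _ => None,
        };
        if let Some(v) = folded {
            gates[i] = Gate::Const(v);
        }
    }
}

/// Dead code elimination, expressed as a liveness mask: a gate is live
/// if it is an output or feeds a live gate. One reverse sweep suffices
/// under the topological-order assumption.
pub fn live_gates(gates: &[Gate], outputs: &[usize]) -> Vec<bool> {
    let mut live = vec![false; gates.len()];
    for &o in outputs {
        live[o] = true;
    }
    for i in (0..gates.len()).rev() {
        if !live[i] {
            continue;
        }
        if let Gate::Xor(a, b) | Gate::And(a, b) = gates[i] {
            live[a] = true;
            live[b] = true;
        }
    }
    live
}

/// Count the AND gates that survive, as a rough proxy for constraints.
pub fn and_count(gates: &[Gate], live: &[bool]) -> usize {
    let mut n = 0;
    for (g, &is_live) in gates.iter().zip(live) {
        if is_live && matches!(g, Gate::And(..)) {
            n += 1;
        }
    }
    n
}
```

In this toy model, an AND of two constants folds away entirely, and an AND whose result never reaches an output is marked dead, so neither contributes to the final count.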
Cost Model
Understanding operation costs helps you make informed design decisions. Here's how different operations compare, using AND constraints as the baseline:
Some operations are essentially free: XOR, NOT, rotations, and constants have negligible cost because they're encoded as shifted value indices within existing constraints. Basic logical operations like AND, OR, and select each require one constraint. Arithmetic operations vary more: addition and comparison typically require one to two constraints, while multiplication is more expensive at roughly four to five constraints per operation.
Keep in mind that these ratios evolve as gate implementations improve and new optimization techniques emerge. While constraint counts provide useful complexity estimates, actual proving performance also depends on your circuit structure, parallelization opportunities, and hardware characteristics.
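As a back-of-the-envelope aid, the approximate costs above can be turned into a small estimator. The sketch below is illustrative only: the `Op` enum, `cost_range`, and `estimate` are hypothetical names, not part of Binius64, and the numbers simply restate the rough figures in this section, treating "negligible" as zero. Real counts should come from the compiled circuit.

```rust
/// Operation kinds from the cost discussion (hypothetical enum,
/// not a Binius64 type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Xor,
    Not,
    Rotate,
    Constant,
    And,
    Or,
    Select,
    Add,
    Compare,
    Mul,
}

/// Approximate (low, high) cost of one operation in AND constraints.
/// Assumption: "negligible" operations are counted as zero.
pub fn cost_range(op: Op) -> (u64, u64) {
    match op {
        Op::Xor | Op::Not | Op::Rotate | Op::Constant => (0, 0),
        Op::And | Op::Or | Op::Select => (1, 1),
        Op::Add | Op::Compare => (1, 2),
        Op::Mul => (4, 5),
    }
}

/// Sum the cost ranges over a list of (operation, occurrence count) pairs.
pub fn estimate(ops: &[(Op, u64)]) -> (u64, u64) {
    ops.iter().fold((0, 0), |(lo, hi), &(op, n)| {
        let (l, h) = cost_range(op);
        (lo + l * n, hi + h * n)
    })
}
```

For example, a design with 100 XORs, 10 ANDs, 4 additions, and 2 multiplications would be estimated at between 22 and 28 constraints under these assumptions. A range like this is only useful for comparing design alternatives; it does not predict proving time.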
Performance Analysis
To understand your circuit's actual performance, you can examine the circuit snapshots generated after compilation. These .snap files in crates/examples/snapshots/ provide detailed constraint counts, gate counts, and witness statistics for production circuits.
You can review the included snapshots for SHA256 and Keccak to get a sense of typical constraint counts for standard cryptographic operations. Remember that actual proving performance varies significantly across different hardware architectures and workload characteristics. For concrete proving times and throughput comparisons, see the benchmarks section.