Implement Small-Value Sum-Check Optimization (Algorithm 6) #98

wu-s-john · 2025-12-18T23:02:41Z

Summary

This PR implements Algorithm 6 ("Small-Value Sum-Check with Eq-Poly Optimization") from the paper "Speeding Up Sum-Check Proving" by Bagad, Dao, Domb, and Thaler. The optimization targets Spartan's first sum-check invocation where witness polynomial evaluations are small integers (fitting in i32/i64), enabling significant prover speedups by replacing expensive field multiplications with cheaper native integer operations.

Key Insight

In the sum-check protocol, round 1 computations involve only small values (the original witness evaluations). From round 2 onward, evaluations become "large" due to binding to random verifier challenges. Algorithm 6 delays this binding using Lagrange interpolation, computing accumulators over small values in the first ℓ₀ rounds before switching to the standard linear-time prover.

Multiplication Cost Hierarchy:

ss (small × small): Native i32/i64 multiplication (~1 cycle)
sl (small × large): Barrett-optimized multiplication (~9 base mults)
ll (large × large): Full Montgomery multiplication (~32 base mults)

For Spartan with degree-2 polynomials, Algorithm 6 reduces ll multiplications from O(N) to O(N/2^ℓ₀) at the cost of O((3/2)^ℓ₀ · N) ss multiplications.
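
As a rough illustration of the ss tier, the sketch below multiplies and sums small values with native integers and performs a single reduction at the end. It is a toy model, not the PR's code: the modulus `P` (a 61-bit Mersenne prime standing in for the real scalar field) and the helper names `ss_mul`, `dot_small`, and `to_field` are assumptions made for this example.

```rust
/// Toy modulus standing in for a real scalar field (assumption: 2^61 - 1).
const P: u64 = (1u64 << 61) - 1;

/// ss: multiply two small values with native integer arithmetic.
/// An i32 × i32 product always fits in an i64, so no reduction is needed.
fn ss_mul(a: i32, b: i32) -> i64 {
    i64::from(a) * i64::from(b)
}

/// Sum of ss products, kept in an i128 so that long inputs cannot overflow.
fn dot_small(az: &[i32], bz: &[i32]) -> i128 {
    az.iter()
        .zip(bz)
        .map(|(&a, &b)| i128::from(ss_mul(a, b)))
        .sum()
}

/// Map a signed integer into the toy field with one reduction.
fn to_field(x: i128) -> u64 {
    x.rem_euclid(i128::from(P)) as u64
}
```

For instance, `to_field(dot_small(&[1, -2, 3], &[4, 5, -6]))` touches the modulus exactly once, after all three products have been summed as plain integers.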

Benchmarks

Measured on M1 Max MacBook Pro (10 cores, 64GB RAM) with jemalloc.
Note: halo2curves/asm is not enabled (unavailable on Apple Silicon).

cargo run --release --example sumcheck_sweep --features jem

num_vars	n	original (µs)	small-value (µs)	speedup
10	1,024	1,275	1,166	1.09×
12	4,096	1,575	1,322	1.19×
14	16,384	2,315	2,016	1.15×
16	65,536	4,922	3,847	1.28×
18	262,144	15,087	10,480	1.44×
20	1,048,576	46,783	28,491	1.64×
22	4,194,304	163,593	105,487	1.55×
24	16,777,216	658,282	439,680	1.50×

Key observations:

Speedup generally grows with problem size (with a small dip at n = 2¹⁴), peaking at 1.64× for n = 2²⁰
Roughly 1.5× speedup for large instances (n ≥ 2²²)
Small instances show modest gains due to fixed overhead of accumulator precomputation

Delayed Modular Reduction (i32 vs i64)

Benchmarks comparing i32 and i64 small value types with delayed modular reduction:

MAX_VARS=26 cargo run --release --example sumcheck_sweep

num_vars	n	original (µs)	i32 small (µs)	i64 small (µs)	orig vs i32	orig vs i64
10	1,024	1,293	1,116	1,059	1.16×	1.22×
12	4,096	1,644	1,597	1,582	1.03×	1.04×
14	16,384	2,556	2,277	2,110	1.12×	1.21×
16	65,536	5,363	4,027	4,193	1.33×	1.28×
18	262,144	15,246	9,260	9,314	1.65×	1.64×
20	1,048,576	47,629	27,180	29,157	1.75×	1.63×
22	4,194,304	181,004	103,033	102,690	1.76×	1.76×
24	16,777,216	700,294	415,251	441,350	1.69×	1.59×
26	67,108,864	2,943,038	1,764,047	1,789,063	1.67×	1.65×

Key observations:

Both i32 and i64 variants achieve similar speedups (~1.59-1.76×) for large instances (n ≥ 2¹⁸)
i32 is slightly faster at most sizes with n ≥ 2¹⁶ (narrower loads/stores); the exception is n = 2²², where i64 is marginally ahead
i64 shows marginal advantage for small instances (n ≤ 2¹⁴)
Peak speedup of 1.76× at n = 2²² for both variants

SHA-256 Chain Benchmark

To demonstrate real-world applicability, we benchmark proving SHA-256 hash chains. This workload approximates a major component of Solana light client verification.

cargo run --release --no-default-features --example sha256_chain_benchmark

chain_length	num_vars	log₂(constraints)	num_constraints	witness_ms	orig_sumcheck_ms	small_sumcheck_ms	total_ms	speedup	witness_pct
2	16	16	65,536	14	5	3	20	1.67×	70.0%
8	18	18	262,144	55	16	11	75	1.45×	73.3%
32	20	20	1,048,576	229	48	32	301	1.50×	76.1%
128	22	22	4,194,304	1,260	163	109	1,547	1.50×	81.4%
512	24	24	16,777,216	5,686	609	395	6,743	1.54×	84.3%
2048	26	26	67,108,864	17,015	2,857	1,677	22,116	1.70×	76.9%

Key observations:

2048 SHA-256 hashes proven in ~22 seconds
Witness generation dominates at 70-84% of total proving time
Small-value sumcheck achieves consistent 1.45-1.70× speedup

Solana Light Client Comparison

A Solana light client verifying block finality requires:

Component	Hash Function	Count
Vote signature verification	SHA-512 (Ed25519 internal)	~21 to ~1,588
Merkle shred verification	SHA-256	~108 to ~1,206

Ed25519 uses SHA-512 internally for challenge hashing
Finality requires ≥2/3 supermajority stake (~21-530 validators)
SHA-512 is ~1.5-2× more expensive than SHA-256 per hash

SHA-256 equivalent cost:

Solana SHA-256: ~1,206 hashes
Solana SHA-512: ~1,588 × 1.5-2 = ~2,382-3,176 SHA-256 equivalent
Total: ~3,588-4,382 SHA-256 equivalent
Our 2048-chain benchmark covers ~47-57% of Solana's worst-case proving requirement

Implementation

Core Components

SmallValueField trait (src/small_field.rs)
- Defines SmallValue (i32) and IntermediateSmallValue (i64) types
- Barrett-optimized sl_mul and isl_mul for BN254/BLS12-381 (~3× faster than ll)
- Overflow analysis ensuring correctness for typical witness bounds
Lagrange Domain Extension (src/lagrange.rs)
- LagrangeEvaluatedMultilinearPolynomial<T, D> for extending boolean evaluations to U_d = {∞, 0, 1, ..., d-1}
- Zero-allocation extend_in_place with ping-pong buffers
- gather_prefix_evals for efficient prefix collection (Procedure 6)
Accumulator Data Structures (src/accumulators.rs, src/accumulator_index.rs)
- SmallValueAccumulators<S, D> storing A_i(v, u) with O(1) indexing via UdTuple
- idx4 mapping (Definition A.5) for distributing products to correct accumulators
- Type-safe UdEvaluations and UdHatEvaluations wrappers
Procedure 9 Implementation (src/accumulators.rs)
- build_accumulators_spartan: Optimized for Spartan's Az·Bz structure
- build_accumulators: Generic version for arbitrary polynomial products
- Parallel fold-reduce with thread-local scratch buffers
Thread-Local Buffer Reuse (src/thread_state_accumulators.rs)
- SpartanThreadState and GenericThreadState eliminate O(num_x_out) allocations
- Reduces allocator contention in parallel workloads
Sum-Check Integration (src/sumcheck.rs)
- SmallValueSumCheck::from_accumulators factory method
- Round-by-round Lagrange coefficient multiplication (R_{i+1} = R_i ⊗ L_{U_d}(r_i))
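
The round-by-round update R_{i+1} = R_i ⊗ L_{U_d}(r_i) can be sketched for the Spartan case D = 2 as follows. This is a toy model under stated assumptions: the modulus `P`, the helpers `mul`/`sub`, the function names, and the ordering of nodes as (∞, 0, 1) are all chosen for illustration and are not taken from the PR. The basis used is the standard one for a degree-2 polynomial described by its leading coefficient and its values at 0 and 1.

```rust
/// Toy modulus standing in for a real scalar field (assumption: 2^61 - 1).
const P: u64 = (1u64 << 61) - 1;

fn mul(a: u64, b: u64) -> u64 {
    ((u128::from(a) * u128::from(b)) % u128::from(P)) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    (a + P - b) % P
}

/// Basis values at r for the nodes (∞, 0, 1) of a degree-2 polynomial:
/// s(X) = s(∞)·X(X−1) + s(0)·(1−X) + s(1)·X, where s(∞) is the X² coefficient.
fn lagrange_coeffs_d2(r: u64) -> [u64; 3] {
    [mul(r, sub(r, 1)), sub(1, r), r]
}

/// R_{i+1} = R_i ⊗ L(r_i): each existing entry is scaled by every coefficient.
fn extend_r(r_prev: &[u64], r_i: u64) -> Vec<u64> {
    let l = lagrange_coeffs_d2(r_i);
    r_prev
        .iter()
        .flat_map(|&x| l.iter().map(move |&c| mul(x, c)))
        .collect()
}
```

Starting from R_1 = [1], one call per challenge grows the vector by a factor of three, so after i rounds it has 3^i entries that are paired with the accumulator entries A_i(v, u).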

Algorithm Flow

┌─────────────────────────────────────────────────────────────────────────┐
│  Precomputation: Build accumulators A_i(v, u) for i ∈ [ℓ₀]              │
│                                                                         │
│  For each x_out ∈ {0,1}^{ℓ/2-ℓ₀}:                                       │
│    For each x_in ∈ {0,1}^{ℓ/2}:                                         │
│      ein = eq(w_R, x_in) · eq(w_L, x_out)                              │
│      Extend Az/Bz prefixes to U_d^{ℓ₀} via Lagrange                    │
│      Accumulate products weighted by ein into A_i(v, u)                │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Rounds 1..ℓ₀: Compute s_i(X) = ⟨R_i, A_i(·, u)⟩ for u ∈ Û_d           │
│                R_{i+1} = R_i ⊗ (L_{U_d,k}(r_i))_{k∈U_d}                 │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│  Round ℓ₀+1: Streaming round (Algorithm 2) to bind to r_{1:ℓ₀}         │
│  Rounds ℓ₀+2..ℓ: Standard linear-time sum-check (Algorithm 1)          │
└─────────────────────────────────────────────────────────────────────────┘

Test Plan

cargo test test_build_accumulators - Verifies accumulator construction
cargo test test_small_value - SmallValueField arithmetic correctness
cargo test lagrange - Lagrange extension and interpolation
cargo test sumcheck - Full sum-check protocol equivalence
cargo clippy - No warnings
examples/sumcheck_sha256_equivalence.rs - Verifies new method produces identical proofs to baseline
examples/sha256_chain_benchmark.rs - SHA-256 chain proving with CSV output

References

Paper: Speeding Up Sum-Check Proving (ePrint 2024/1046)
Jolt integration: a16z/jolt#690

Introduce UdPoint, UdHatPoint, UdTuple, and ValueOneExcluded types in src/lagrange.rs for representing evaluation domains U_d and Û_d used in the small-value sumcheck optimization.

Implements LagrangeEvaluatedMultilinearPolynomial with from_multilinear() factory method that extends evaluations from {0,1}^n to U_d^n.
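
To make the extension from {0,1}^n to U_d^n concrete, here is a minimal sketch for D = 2 over plain integers. It assumes an index encoding of U_2 = {∞, 0, 1} as 0, 1, 2 and an MSB-first variable order; the function name `extend_to_u2` is hypothetical and the PR's `from_multilinear()` may be organised differently. The only fact used is that a function of degree 1 in X has leading coefficient f(1) − f(0), which is its value at ∞.

```rust
/// Extend evaluations of a multilinear polynomial from {0,1}^n to U_2^n.
/// Assumed encoding of U_2 = {∞, 0, 1}: index 0 => ∞, 1 => 0, 2 => 1.
/// Variables are MSB-first: variable 0 is the slowest-varying index.
fn extend_to_u2(evals: &[i64], n: usize) -> Vec<i64> {
    assert_eq!(evals.len(), 1 << n);
    let mut cur = evals.to_vec();
    for k in 0..n {
        let prefix = 3usize.pow(k as u32); // variables already extended
        let suffix = 1usize << (n - k - 1); // variables still boolean
        let mut next = vec![0i64; prefix * 3 * suffix];
        for p in 0..prefix {
            for s in 0..suffix {
                let at0 = cur[(p * 2) * suffix + s];
                let at1 = cur[(p * 2 + 1) * suffix + s];
                // a·X + b has leading coefficient a = f(1) − f(0).
                next[(p * 3) * suffix + s] = at1 - at0; // X = ∞
                next[(p * 3 + 1) * suffix + s] = at0; // X = 0
                next[(p * 3 + 2) * suffix + s] = at1; // X = 1
            }
        }
        cur = next;
    }
    cur
}
```

Because every step is an integer subtraction or a copy, small inputs stay small: each extended variable can at most double the magnitude of an entry.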

sumcheck optimization (Algorithm 6) Introduces RoundAccumulator and SmallValueAccumulators for the small-value sumcheck optimization. Uses flat Vec<[Scalar; D]> storage with const generic D for cache efficiency and vectorizable merge operations in parallel fold-reduce.

Parameterize UdPoint, UdHatPoint, UdTuple, and LagrangeEvaluatedMultilinearPolynomial with const generic D to enable: - Compile-time enforcement that domain types match accumulator degree - Debug assertions for bounds checking (v < D in constructors) - Elimination of runtime base parameter from to_flat_index() This prevents mixing domain sizes at compile time and catches out-of-bounds errors in debug builds.

Implement AccumulatorPrefixIndex and compute_idx4() which maps evaluation prefixes β ∈ U_d^ℓ₀ to accumulator contributions by decomposing β into prefix v, coordinate u ∈ Û_d, and binary suffix y.

Extracts strided polynomial evaluations for all binary prefixes b ∈ {0,1}^ℓ₀ given a fixed suffix, bridging full polynomials to Procedure 6 (Lagrange extension).
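
The strided access pattern can be sketched as below. The name `gather_prefix_evals` comes from this PR, but the signature and body here are assumptions made for illustration. With an MSB-first layout, the prefix bits sit on top of the index, so consecutive prefixes are one stride of 2^(n − ℓ₀) entries apart.

```rust
/// Collect p(b, suffix) for every binary prefix b ∈ {0,1}^l0, MSB-first.
/// `evals` holds the 2^n boolean evaluations; `suffix` fixes the low n − l0 bits.
fn gather_prefix_evals<T: Copy>(evals: &[T], l0: usize, suffix: usize) -> Vec<T> {
    let stride = evals.len() >> l0; // number of suffixes = 2^(n − l0)
    assert!(evals.len().is_power_of_two() && suffix < stride);
    (0..1usize << l0)
        .map(|b| evals[b * stride + suffix])
        .collect()
}
```

The returned vector has 2^ℓ₀ entries and is the natural input for a Lagrange extension over the prefix variables.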

Added a parallel build_accumulators that binds suffixes, extends prefixes to the Ud domain, applies the ∞/Cz rule, and routes contributions via cached idx4 with E_in/E_out weighting. Expanded accumulator tests with a naive cross-check, ∞ handling, and binary-β zero behavior to validate correctness. Cleaned up dead-code allowances now that the code paths are used.

Added explicit MSB-first checks for eq table generation, gather_prefix_evals stride/pattern, and bind_poly_var_top to ensure “top” binds the MSB. These tests catch silent index/order regressions across components.

Compute the linear factor ℓ_i(X) = eq(w[<i], r[<i]) · eq(w_i, X) for each sum-check round. Writing α_i = eq(w[<i], r[<i]), the values are ℓ_i(0) = α_i(1−w_i), ℓ_i(1) = α_i w_i, and ℓ_i(∞) = α_i(2w_i−1).
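
The three values follow from expanding eq(w, X) = (1 − w)(1 − X) + w·X = (1 − w) + (2w − 1)·X, whose leading coefficient is the value at ∞. A minimal sketch over a toy field, with `P`, `mul`, `sub`, and `eq_round_values` as hypothetical names chosen for this example:

```rust
/// Toy modulus standing in for a real scalar field (assumption: 2^61 - 1).
const P: u64 = (1u64 << 61) - 1;

fn mul(a: u64, b: u64) -> u64 {
    ((u128::from(a) * u128::from(b)) % u128::from(P)) as u64
}

fn sub(a: u64, b: u64) -> u64 {
    (a + P - b) % P
}

/// Values of α · eq(w, X) at X = 0, X = 1 and X = ∞, in that order.
fn eq_round_values(alpha: u64, w: u64) -> [u64; 3] {
    let at0 = mul(alpha, sub(1, w)); // α(1 − w)
    let at1 = mul(alpha, w); // α·w
    let at_inf = sub(at1, at0); // α(2w − 1), the coefficient of X
    [at0, at1, at_inf]
}
```

Computing the ∞ value as a difference of the other two avoids a third multiplication.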

Replace range-indexed loops and a redundant closure with iterator forms

Add eq-round linear factor utilities and accumulator evaluation to derive t_i and build s_i polynomials.

Track R_i and ℓ_i state to compare accumulator evals with EqSumCheckInstance rounds.

indexing Switch Spartan t_i to D=2 aliases/tests, precompute idx4 prefix/suffix data, and flatten accumulator caches to cut allocations.

Csr (Compressed Sparse Row) stores variable-length lists with 2 allocations instead of N+1, improving cache locality. Replaces ad-hoc offsets/entries arrays in build_accumulators
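
A compressed-sparse-row container of this kind can be sketched in a few lines. The API below (`with_capacity`, `push_row`, `row`, `num_rows`) is hypothetical and only meant to show why two flat vectors suffice; the PR's `Csr` may expose different methods.

```rust
/// Variable-length rows stored in two flat vectors (hypothetical API).
struct Csr<T> {
    /// Row i occupies data[offsets[i]..offsets[i + 1]].
    offsets: Vec<usize>,
    data: Vec<T>,
}

impl<T> Csr<T> {
    fn with_capacity(rows: usize, entries: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0);
        Csr { offsets, data: Vec::with_capacity(entries) }
    }

    /// Append one row; no per-row allocation takes place.
    fn push_row(&mut self, row: impl IntoIterator<Item = T>) {
        self.data.extend(row);
        self.offsets.push(self.data.len());
    }

    fn row(&self, i: usize) -> &[T] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }

    fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }
}
```

Compared with a `Vec<Vec<T>>`, all entries sit contiguously in `data`, so iterating over neighbouring rows walks memory linearly.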

- Add prove_cubic_with_three_inputs_small_value combining small-value optimization for first ℓ₀ rounds with eq-poly optimization for remaining - Introduce SPARTAN_T_DEGREE constant to centralize polynomial degree parameter - Add sumcheck_sweep.rs examples for performance comparison

build_accumulators The new from_boolean_evals_with_buffer_reusing method takes caller-provided scratch buffers and alternates between them during extension. This reduces allocations from O(num_x_in × num_x_out) per call to O(num_threads) buffers allocated once per thread.

variants Spartan version (D=2) skips binary betas since satisfying witnesses have Az·Bz = Cz on {0,1}^n. Generic version supports arbitrary polynomial products.
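
One way to picture the Spartan-specific skip is to enumerate β ∈ U_2^ℓ₀ and keep only tuples that contain at least one ∞ coordinate, since the purely binary tuples contribute nothing for a satisfying witness. The sketch below assumes a base-3 digit encoding (0 => ∞, 1 => 0, 2 => 1) and a hypothetical function name; it is not the PR's implementation.

```rust
/// Enumerate β ∈ U_2^l0 as base-3 digit vectors (assumed encoding:
/// 0 => ∞, 1 => 0, 2 => 1) and keep tuples with at least one ∞ coordinate.
fn betas_with_infinity(l0: usize) -> Vec<Vec<u8>> {
    let total = 3usize.pow(l0 as u32);
    (0..total)
        .map(|mut idx| {
            // Decode idx into l0 base-3 digits, most significant first.
            let mut digits = vec![0u8; l0];
            for d in digits.iter_mut().rev() {
                *d = (idx % 3) as u8;
                idx /= 3;
            }
            digits
        })
        .filter(|beta| beta.contains(&0))
        .collect()
}
```

Under this encoding, 3^ℓ₀ − 2^ℓ₀ tuples survive the filter, for example 19 of 27 when ℓ₀ = 3.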

Adds a new example that tests prove_cubic_with_three_inputs and prove_cubic_with_three_inputs_small_value produce identical proofs when used with a real SHA256 circuit (Algorithm 6 validation). Changes: - Add PartialEq, Eq derive to SumcheckProof for proof comparison - Add extract_outer_sumcheck_inputs helper to SpartanSNARK - Add examples/sumcheck_sha256_equivalence.rs

Implement the small × large multiplication optimization from "Speeding Up Sum-Check Proving" using Barrett reduction for ~3× speedup over naive field multiplication. Key changes: - Add SmallValueField trait for type-safe i32/i64 small-value operations - Implement Barrett reduction for Pallas Fp and Fq (sl_mul, isl_mul) - Add SpartanAccumulatorInput trait to unify field and i32 witness handling - Make LagrangeEvaluatedMultilinearPolynomial generic over element type - Update sumcheck prover to accept separate i32 witness polynomials - Clean up MultilinearPolynomial<i32>: remove unused from_u32/from_u64/from_field

evaluations Replace raw arrays and ad-hoc structs with proper abstractions for U_d = {∞, 0, 1, ..., D-1} and Û_d = U_d \ {1} evaluation domains. Remove EqRoundValues in favor of UdEvaluations<F, 2>.

- Delete unused constructor/predicate methods from UdPoint and UdHatPoint - Move test-only methods (alpha, prefix_len, suffix_len, extend_from_boolean) to cfg(test) impl blocks - Add CachedPrefixIndex struct with From impl to accumulator_index.rs - Remove unused QuadraticTAccumulatorPrefixIndex type alias - Delete unused eq_factor_alpha method from sumcheck

Hoist scratch buffers to thread-local state in build_accumulators_spartan and build_accumulators. Previously, 5 vectors were allocated on every x_out iteration; now allocations happen once per Rayon thread subdivision. - Add extend_in_place to LagrangeEvaluatedMultilinearPolynomial (avoids .to_vec()) - Add SpartanThreadState and GenericThreadState structs for buffer reuse - Extract thread state structs to thread_state_accumulators module Reduces allocations from O(num_x_out × num_x_in) to O(num_threads).

Move the witness polynomial abstraction trait from accumulators.rs to its own module for better code organization. Rename from SpartanAccumulatorInput to SpartanAccumulatorInputPolynomial to clarify that it abstracts over multilinear polynomial representations (field elements vs small values).

- compute_idx4: derive l0 from beta.len() instead of taking as parameter - csr: remove unused new() and push_empty(), move test helpers to #[cfg(test)] - accumulators: add #[inline] to num_prefixes() - examples: switch to tracing and #[instrument] for cleaner logging

- accumulator_index: add phase comments explaining prefix/suffix computation - accumulators: use filter() instead of continue for beta_has_infinity check - lagrange: document stride calculations in extend_in_place - small_field: extract try_field_to_small_impl to deduplicate Fp/Fq impls - small_field: document Barrett reduction loop bound (at most 2 iterations)

Copilot

Pull request overview

This PR implements Algorithm 6 ("Small-Value Sum-Check with Eq-Poly Optimization") from the paper "Speeding Up Sum-Check Proving" by Bagad, Dao, Domb, and Thaler. The optimization targets Spartan's first sum-check invocation where witness polynomial evaluations are small integers, achieving significant prover speedups (1.5-1.64×) by replacing expensive field multiplications with cheaper native integer operations.

Key changes:

Introduces Barrett-optimized field arithmetic for multiplying small integers with field elements
Implements Lagrange domain extension for efficient round polynomial computation
Adds accumulator data structures for precomputing sum-check values
Integrates the optimization into the existing sum-check protocol

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

File	Description
src/small_field.rs	Barrett-optimized arithmetic trait for small-value × field-element operations
src/lagrange.rs	Lagrange domain types and multilinear polynomial extension logic
src/accumulators.rs	Accumulator data structures and Procedure 9 implementation
src/accumulator_index.rs	Index mapping for distributing evaluation prefixes to accumulators
src/sumcheck.rs	Integration of Algorithm 6 into the sum-check protocol
src/thread_state_accumulators.rs	Thread-local buffers to reduce allocations in parallel execution
src/spartan_accumulator_input_polynomial.rs	Trait abstraction for witness polynomials
src/polys/multilinear.rs	Generic multilinear polynomial type and prefix gathering
src/eq_linear.rs	Utilities for eq-polynomial round factors
src/csr.rs	Compressed sparse row storage for variable-length lists
examples/sumcheck_sweep.rs	Benchmark sweep across polynomial sizes
examples/sumcheck_sha256_equivalence.rs	Equivalence test with SHA-256 circuit

Copilot · 2025-12-24T01:12:35Z

src/lib.rs

 //! We currently implement a non-preprocessing version of Spartan
 //! that is generic over the polynomial commitment and evaluation argument (i.e., a PCS).
 #![deny(
  warnings,


The unused lint has been removed from the deny list. This change allows unused code warnings to be suppressed, which could hide legitimate issues. Consider keeping unused in the deny list and using targeted #[allow(dead_code)] attributes where specific exceptions are needed.

Suggested change

-  warnings,
+  warnings,
+  unused,

src/lagrange_accumulator/thread_state.rs

Copilot · 2025-12-24T01:12:36Z

src/small_field.rs

+
+  #[inline]
+  fn small_from_u32(val: u32) -> i32 {
+    val as i32


The `as` cast from u32 to i32 wraps when the value exceeds i32::MAX, silently producing a negative value. Consider using a checked conversion or documenting the assumption that val <= i32::MAX.

Suggested change

-    val as i32
+    i32::try_from(val).expect("small_from_u32: value does not fit in i32")
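
The difference between the two conversions can be seen in isolation. The function names below are hypothetical; only `i32::try_from` and the `as` cast are standard Rust.

```rust
use std::convert::TryFrom;

/// Checked variant: returns None instead of wrapping for values above i32::MAX.
fn small_from_u32_checked(val: u32) -> Option<i32> {
    i32::try_from(val).ok()
}

/// The original behaviour, for comparison: `as` reinterprets the bit pattern.
fn small_from_u32_wrapping(val: u32) -> i32 {
    val as i32
}
```

Returning an `Option` lets the caller decide whether an out-of-range value is a bug (panic) or a signal to fall back to the large-value path.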

Copilot · 2025-12-24T01:12:36Z

src/small_field.rs

+      Self::from(val as u64)
+    } else {
+      -Self::from((-val) as u64)


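The negation in (-val) as u64 overflows for i32::MIN (-2147483648): it panics in debug builds and wraps back to i32::MIN in release builds, which then sign-extends to the wrong u64. Use val.unsigned_abs() to handle this edge case correctly. Note that val.wrapping_neg() as u64 has the same sign-extension problem unless the result is first cast to u32.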

Suggested change

-      Self::from(val as u64)
-    } else {
-      -Self::from((-val) as u64)
+      Self::from(val.unsigned_abs() as u64)
+    } else {
+      -Self::from(val.unsigned_abs() as u64)
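
A standalone sketch of the sign-safe conversion, using a toy modulus in place of the real field type. `P` and `small_to_field` are assumptions for this example; `unsigned_abs` is the standard library method the suggestion relies on.

```rust
/// Toy modulus standing in for a real scalar field (assumption: 2^61 - 1).
const P: u64 = (1u64 << 61) - 1;

/// Map a signed small value into the toy field without overflowing on i32::MIN.
fn small_to_field(val: i32) -> u64 {
    // unsigned_abs returns |val| as a u32, which is well defined for i32::MIN.
    let mag = u64::from(val.unsigned_abs()) % P;
    if val >= 0 || mag == 0 {
        mag
    } else {
        P - mag // additive inverse of |val|
    }
}
```

For i32::MIN the magnitude is 2³¹, which `unsigned_abs` represents exactly, whereas `i32::MIN.wrapping_neg() as u64` yields 0xFFFF_FFFF_8000_0000.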

Direct fixes:
- Remove redundant closures and unnecessary casts
- Replace manual_contains with .contains()
- Replace manual_is_multiple_of with .is_multiple_of()
- Replace useless vec! with array literal

Suppressions (intentional patterns):
- needless_range_loop: loop index serves dual purpose (indexing + computation)
- identity_op/erasing_op: operations like `0 * base * base` document index formulas

Typos:
- Rename `ein` to `e_in_eval` for clarity (eq evaluation at input point)

Replace per-iteration modular reductions with accumulated wide-integer arithmetic, reducing once per beta instead of once per x_in iteration.

Key changes:
- Add WideLimbs<N> for wide unsigned integer arithmetic (6/8 limbs)
- Refactor SmallValueField to be generic over small value type (i32/i64)
- Add UnreducedMontInt types for delayed reduction in Montgomery form
- Replace SpartanAccumulatorInputPolynomial with MatVecMLE trait
- Optimize eq polynomial table computation (1 mul instead of 2 per element)
- Update benchmark to compare i32/i64 vs i64/i128 variants
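
The idea of delayed reduction can be sketched with `u128` accumulators in place of the PR's multi-limb types. Everything here is a simplified assumption: the 61-bit modulus `P`, the function name `delayed_dot`, and the use of two buckets (one for positive and one for negative small values) whose magnitudes are subtracted before the single final reduction.

```rust
/// Toy modulus standing in for a real scalar field (assumption: 2^61 - 1).
const P: u64 = (1u64 << 61) - 1;

/// Σ e_k · v_k mod P, for field elements e_k < P and small signed v_k.
/// Products are accumulated unreduced; only the final result is reduced.
fn delayed_dot(e: &[u64], v: &[i32]) -> u64 {
    let (mut pos, mut neg) = (0u128, 0u128);
    for (&ek, &vk) in e.iter().zip(v) {
        // ek < 2^61 and |vk| <= 2^31, so each product is at most 2^92.
        let prod = u128::from(ek) * u128::from(vk.unsigned_abs());
        if vk >= 0 {
            pos += prod;
        } else {
            neg += prod;
        }
    }
    let p = u128::from(P);
    if pos >= neg {
        ((pos - neg) % p) as u64
    } else {
        let m = ((neg - pos) % p) as u64;
        if m == 0 { 0 } else { P - m }
    }
}
```

With products bounded by 2⁹², a `u128` bucket has room for roughly 2³⁵ terms before it could overflow, so the loop body needs no modular arithmetic at all.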

- Add mac() helper for fused multiply-accumulate, eliminating temporary arrays in unreduced_mont_int_mul_add (4 implementations)
- Subtract in limb space before reduction via sub_mag(), saving one Barrett reduction per signed accumulator
- Replace large e_out tables with JIT-computed eyx scratch buffers, reducing eq table memory 7× and improving cache locality
- Add unreduced_is_zero() fast path to skip expensive modular reduction
- Precompute betas_with_infty indices to avoid filter in inner loop
- Use barrett_reduce_6_* directly for i128 products instead of padding to 8 limbs (saves 8 wasted multiplications per isl_mul call)

propagation Replace mac(acc, 0, 0, carry) calls with simple overflowing_add to avoid unnecessary u128 multiply-add pipeline for pure carry propagation. Also add #[inline(always)] to hot path functions to ensure full inlining.
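
Pure carry propagation needs only additions, which is what makes `overflowing_add` a good fit. The helper below is a hypothetical sketch over little-endian `u64` limbs, not the PR's code.

```rust
/// Add `carry` into `limbs[start..]` (little-endian u64 limbs), stopping as
/// soon as the carry dies out. Returns true if a carry leaves the top limb.
#[inline(always)]
fn propagate_carry(limbs: &mut [u64], start: usize, mut carry: u64) -> bool {
    for limb in limbs.iter_mut().skip(start) {
        if carry == 0 {
            return false;
        }
        let (sum, overflow) = limb.overflowing_add(carry);
        *limb = sum;
        carry = u64::from(overflow);
    }
    carry != 0
}
```

In the common case the carry is absorbed by the first limb it touches, so the loop exits after a single addition.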

- Apply rustfmt formatting fixes in accumulators.rs
- Fix clippy manual_is_multiple_of warning in test code

Introduce circuit gadgets optimized for small-value sumcheck optimization:
- SmallMultiEq: Batches equality constraints with bounded coefficients, flushing at MAX_COEFF_BITS (31) instead of bellpepper's ~237. This keeps constraint coefficients within i32 bounds for the small-value optimization.
- SmallUInt32: 32-bit unsigned integer gadget using SmallMultiEq for carry constraints in addmany operations.
- small_sha256: SHA-256 implementation using the above gadgets, producing circuits where Az and Bz values fit in i32.
- Update sumcheck_sha256_equivalence example to use bellpepper's Circuit trait for constraint counting, comparing SmallSha256 vs bellpepper SHA-256.

The tradeoff: SmallSha256 generates ~17% more R1CS constraints due to more frequent MultiEq flushing, but enables the small-value sumcheck optimization.

Add 16-bit limbed addition for i32 small-value optimization

SmallUInt32::addmany produces coefficients up to 2^34, exceeding i32 bounds. Splitting into 16-bit limbs reduces max coefficient to 2^18, enabling i32/i64 small-value sumcheck for SHA-256.
- Add SmallValueConfig trait with Small32 (i32/i64) and Small64 (i64/i128)
- Implement addmany_limbed using two constraints per addition
- Update SmallMultiEq to be generic over config
- Fix example to use config-specific bounds check
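
Outside of a constraint system, the limbed addition can be mimicked on plain integers to show why the intermediate values stay small. This is an analogy only: `addmany_limbed_u32` is a hypothetical name, and the real gadget expresses the two limb sums as R1CS constraints rather than native arithmetic.

```rust
/// Wrapping sum of 32-bit words computed through two 16-bit limb sums,
/// mirroring how a limbed adder keeps every intermediate value small.
fn addmany_limbed_u32(words: &[u32]) -> u32 {
    // Low limbs: each term is below 2^16.
    let lo_sum: u64 = words.iter().map(|&w| u64::from(w & 0xFFFF)).sum();
    let carry = lo_sum >> 16; // carry out of the low limb
    // High limbs plus the carry from the low limb.
    let hi_sum: u64 = words.iter().map(|&w| u64::from(w >> 16)).sum::<u64>() + carry;
    // Reassemble, discarding the overflow above bit 31.
    (((hi_sum & 0xFFFF) << 16) | (lo_sum & 0xFFFF)) as u32
}
```

For a handful of operands each limb sum stays only a few bits above 2¹⁶, whereas a single 32-bit sum needs a few bits above 2³².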

- Add examples/sha256_chain_benchmark.rs comparing original vs small-value sumcheck performance on SHA-256 hash chains
- CSV output includes witness synthesis time, sumcheck times, speedup, and witness percentage of total proving time
- CLI support: single <num_vars> for profiling, range-sweep for benchmarks
- Add small_sha256_with_prefix() for chaining multiple SHA-256 hashes with unique constraint namespaces
- Fix SmallValueField<i64> generic in lagrange.rs
- Fix unused variable warning in msm.rs

microsoft-github-policy-service · 2026-01-09T04:23:27Z

@wu-s-john please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Split SmallValueField into two traits for better separation of concerns:
- SmallValueField: core small-value operations (ss_mul, sl_mul, isl_mul)
- DelayedReduction: unreduced accumulator operations for hot paths

Rename types for clarity:
- UnreducedMontInt → UnreducedFieldInt (field × integer products)
- UnreducedMontMont → UnreducedFieldField (field × field products)

Add FieldReductionConstants trait to deduplicate Barrett/Montgomery reduction:
- Consolidates Fp/Fq constants (MODULUS, R256-R512, MONT_INV)
- Generic reduction functions monomorphized at compile time for zero overhead
- Comprehensive documentation explaining R constants (2^k mod p)

Performance and cleanup:
- Add ext_buf_idx scratch buffer to avoid Vec allocation in accumulator hot loop
- Remove unused OrderedVariable from shape_cs modules (~140 lines)
- Remove unused build_univariate_round_evals from sumcheck (~40 lines)
- Add log2_constraints column to benchmark CSV output

Split the 2,367-line small_field.rs into a proper module structure:
- small_field/small_value_field.rs: SmallValueField trait
- small_field/delayed_reduction.rs: DelayedReduction trait
- small_field/barrett.rs: Barrett/Montgomery reduction functions
- small_field/impls.rs: Fp/Fq implementations and tests
- small_field/mod.rs: re-exports and helper functions

Moved batching configuration types (NoBatching, Batching<K>, BatchingMode, SmallMultiEqConfig, I32NoBatch, I64Batch21) from small_field to gadgets/small_multi_eq.rs where they logically belong, since they're specifically for constraint batching in SmallMultiEq.

Added detailed documentation for I64Batch21 explaining why K=21 is the safe maximum: with SHA-256-like circuits having ~200 terms and 2^34 positional coefficients, batching 21 constraints keeps the worst-case magnitude (2^62) under the i64 signed limit (2^63).

contributions Refactors shared logic between Spartan and generic accumulator builders.

Improves type safety and self-documentation by replacing (bool, [u64; N]) with an explicit enum indicating whether the result is positive (a >= b) or negative (a < b).
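
A sketch of what such an enum could look like for magnitude subtraction on limb arrays. The names `SubMagResult`, `sub_limbs`, and this particular `sub_mag` signature are hypothetical; the PR's actual types may differ.

```rust
/// Result of |a − b| on little-endian limb arrays, tagged with its sign.
#[derive(Debug, PartialEq, Eq)]
enum SubMagResult<const N: usize> {
    /// a >= b; the payload is a − b.
    Positive([u64; N]),
    /// a < b; the payload is b − a.
    Negative([u64; N]),
}

/// Limb-wise a − b with borrow propagation; requires a >= b.
fn sub_limbs<const N: usize>(a: &[u64; N], b: &[u64; N]) -> [u64; N] {
    let mut out = [0u64; N];
    let mut borrow = false;
    for i in 0..N {
        let (d1, b1) = a[i].overflowing_sub(b[i]);
        let (d2, b2) = d1.overflowing_sub(u64::from(borrow));
        out[i] = d2;
        borrow = b1 || b2;
    }
    out
}

fn sub_mag<const N: usize>(a: &[u64; N], b: &[u64; N]) -> SubMagResult<N> {
    // Compare from the most significant limb down.
    let a_ge_b = a.iter().rev().cmp(b.iter().rev()) != std::cmp::Ordering::Less;
    if a_ge_b {
        SubMagResult::Positive(sub_limbs(a, b))
    } else {
        SubMagResult::Negative(sub_limbs(b, a))
    }
}
```

Matching on the variant forces every caller to handle the sign explicitly, which a bare `bool` in a tuple does not.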

Move wide_limbs.rs content and limb arithmetic from barrett.rs into a unified small_field/limbs.rs module for delayed modular reduction.

Split monolithic lagrange.rs (1667 lines) into focused submodules:
- domain.rs: LagrangePoint, LagrangeHatPoint, LagrangeIndex
- evals.rs: LagrangeEvals, LagrangeHatEvals
- basis.rs: LagrangeBasisFactory, LagrangeCoeff
- extension.rs: LagrangeEvaluatedMultilinearPolynomial
- accumulator.rs: RoundAccumulator, LagrangeAccumulators
- accumulator_builder.rs: build_accumulators_spartan, build_accumulators

Consolidate related files into the module:
- accumulator_index.rs → index.rs
- thread_state_accumulators.rs → thread_state.rs
- eq_linear.rs → eq_round.rs

Simplify extend_in_place API: use std::mem::swap to ensure result is always in first buffer, eliminating conditional buffer selection at call sites. Rename buf_a/b to buf_curr/scratch for clarity.

- Refactor SmallMultiEq from struct to trait with NoBatchEq and BatchingEq<K> implementations
- Add addmany module with limbed (i32) and full (i64) addition algorithms
- Deduplicate SHA-256 circuits into examples/circuits/sha256/ module
- Update small_uint32 and small_sha256 to use SmallMultiEq trait

- Extend MatVecMLE trait with UnreducedFieldField type for F×F accumulation
- Add unreduced bucket accumulators to SpartanThreadState
- Replace eyx precomputation with direct e_y access and z_beta = ex * tA_red
- Keep unreduced across all x_out iterations and merge without reduction
- Pre-compute beta values to eliminate closure overhead in scatter loop
- Final Montgomery reduction only once per bucket after thread merge

This reduces Montgomery reductions from ~7000+ per x_out to ~26 total for typical parameters (l0=3, 128 x_outs).
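The delayed-reduction pattern described above can be illustrated on a toy field. The sketch below is an assumption-laden stand-in: it uses the Mersenne prime 2^61 − 1 and a plain u128 accumulator instead of the PR's Montgomery arithmetic and limb-based accumulators, and the function names are hypothetical.

```rust
/// Toy modulus (the Mersenne prime 2^61 - 1), standing in for the real field.
const P61: u64 = (1u64 << 61) - 1;

/// Eager variant: one modular reduction per product.
pub fn dot_eager(a: &[u64], b: &[u64]) -> u64 {
    let mut acc = 0u64;
    for (&x, &y) in a.iter().zip(b.iter()) {
        let prod = ((x as u128 * y as u128) % P61 as u128) as u64;
        acc = (acc + prod) % P61;
    }
    acc
}

/// Delayed variant: accumulate unreduced 128-bit products and reduce once.
/// Each product is below 2^122, so up to 64 terms fit in a u128.
pub fn dot_delayed(a: &[u64], b: &[u64]) -> u64 {
    assert!(a.len() <= 64, "too many terms for a u128 accumulator");
    let mut wide = 0u128;
    for (&x, &y) in a.iter().zip(b.iter()) {
        debug_assert!(x < P61 && y < P61);
        wide += x as u128 * y as u128;
    }
    (wide % P61 as u128) as u64
}
```

Both functions return the same residue; the delayed variant simply trades many small reductions for headroom in a wider accumulator, which is the trade the commit makes per bucket.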

Replace the asymmetric l/2 split with a balanced ceil/floor split. This reduces precomputation cost (e.g., 36→24 for l=10, l0=3), enables an odd number of rounds, and improves cache utilization by making e_xout smaller.
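The 36→24 figure can be reproduced with a small calculation. The sketch below is a reconstruction that matches the quoted numbers, not the PR's code: it assumes the cost is the combined size of two eq tables (2^a + 2^b entries), that the old split gave one table l/2 variables and the other the remaining l − l0 − l/2, and that the new split divides the l − l0 remaining variables as evenly as possible. Function names are hypothetical.

```rust
/// Old split (assumed): one table gets l/2 variables, the other gets the rest.
pub fn asymmetric_split(l: u32, l0: u32) -> (u32, u32) {
    let first = l / 2;
    (first, l - l0 - first)
}

/// New split (assumed): divide the l - l0 remaining variables ceil/floor.
pub fn balanced_split(l: u32, l0: u32) -> (u32, u32) {
    let rem = l - l0;
    ((rem + 1) / 2, rem / 2)
}

/// Combined number of table entries for a given split.
pub fn precompute_cost(split: (u32, u32)) -> u64 {
    (1u64 << split.0) + (1u64 << split.1)
}
```

For l = 10 and l0 = 3, the asymmetric split is (5, 2) with cost 32 + 4 = 36, while the balanced split is (4, 3) with cost 16 + 8 = 24. The balanced version also works when l − l0 is odd, as in this example.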

wu-s-john added 9 commits December 18, 2025 14:52

Add domain types for Algorithm 6 sumcheck optimization

e426c34

Introduce UdPoint, UdHatPoint, UdTuple, and ValueOneExcluded types in src/lagrange.rs for representing evaluation domains U_d and Û_d used in the small-value sumcheck optimization.

Add Lagrange domain extension for multilinear polynomials

d0f8eed

Implements LagrangeEvaluatedMultilinearPolynomial with from_multilinear() factory method that extends evaluations from {0,1}^n to U_d^n.

Add index mapping for Algorithm 6 sumcheck optimization (Definition A.5)

d23b654

Implement AccumulatorPrefixIndex and compute_idx4() which maps evaluation prefixes β ∈ U_d^ℓ₀ to accumulator contributions by decomposing β into prefix v, coordinate u ∈ Û_d, and binary suffix y.

Add suffix eq-polynomial pyramid for Algorithm 6 sumcheck optimization

e4444fd

Add gather_prefix_evals and UdTuple::from_binary for Algorithm 6

b0908d7

Extracts strided polynomial evaluations for all binary prefixes b ∈ {0,1}^ℓ₀ given a fixed suffix, bridging full polynomials to Procedure 6 (Lagrange extension).
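A strided gather of this kind might look like the sketch below. It assumes an MSB-first layout in which the index of an evaluation is prefix · 2^(n−ℓ₀) + suffix; the function name comes from the commit title, but the signature is a guess made for illustration.

```rust
/// Collects poly[b || suffix] for every binary prefix b in {0,1}^l0,
/// assuming MSB-first indexing: index = b * 2^(n - l0) + suffix.
/// The signature is hypothetical.
pub fn gather_prefix_evals<T: Copy>(poly: &[T], l0: usize, suffix: usize) -> Vec<T> {
    assert!(poly.len().is_power_of_two());
    let stride = poly.len() >> l0; // number of distinct suffixes, 2^(n - l0)
    assert!(stride >= 1 && suffix < stride);
    (0..(1usize << l0))
        .map(|b| poly[b * stride + suffix])
        .collect()
}
```

With a 16-entry table holding the values 0 through 15, ℓ₀ = 2 and suffix 3, the gathered prefix evaluations are the entries at indices 3, 7, 11 and 15.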

Add bit-ordering sanity tests for eq, gather, and binding

7b32a0c

Added explicit MSB-first checks for eq table generation, gather_prefix_evals stride/pattern, and bind_poly_var_top to ensure "top" binds the MSB. These tests catch silent index/order regressions across components.
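What "top binds the MSB" means can be shown on a toy field. The sketch below assumes evaluations are stored with the first variable as the most significant index bit, so the two halves of the table correspond to that variable being 0 and 1. The modulus 101 and the signature are illustrative assumptions.

```rust
/// Toy prime modulus used only for this illustration.
const P101: u64 = 101;

/// Binds the top (most significant) variable of a multilinear polynomial
/// to `r`, halving the evaluation table. Hypothetical signature.
pub fn bind_poly_var_top(evals: &mut Vec<u64>, r: u64) {
    let half = evals.len() / 2;
    for i in 0..half {
        let lo = evals[i]; // top variable = 0
        let hi = evals[i + half]; // top variable = 1
        let diff = (hi + P101 - lo) % P101;
        evals[i] = (lo + r * diff) % P101; // lo + r * (hi - lo)
    }
    evals.truncate(half);
}
```

A quick sanity check in the spirit of the tests described above: binding with r = 0 must return the first half of the table and r = 1 the second half. If the function bound the least significant bit instead, those checks would return interleaved entries.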

wu-s-john changed the title ~~Implement Algorithm 6 Foundation — Procedure 9 Accumulator Builder~~ Implement Faster Sumcheck Algorithm — Procedure 9 Accumulator Builder Dec 18, 2025

wu-s-john added 13 commits December 19, 2025 15:05

Add Lagrange tensor evaluation helper and tests

006fb04

Add eq round factor helper and tests

6465006

Compute the values of ℓ_i(X) = eq̃(w[<i], r[<i]) · eq̃(w_i, X) needed for sum-check rounds: with α_i = eq̃(w[<i], r[<i]), these are ℓ_i(0) = α_i(1 − w_i), ℓ_i(1) = α_i w_i, and ℓ_i(∞) = α_i(2w_i − 1).
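The three values follow from expanding the linear factor: eq(w_i, X) = w_i·X + (1 − w_i)(1 − X) = (2w_i − 1)·X + (1 − w_i), so the value at ∞ is the leading coefficient. A minimal sketch over a toy prime field is shown below; the modulus 2^31 − 1, the helper names, and the (∞, 0, 1) output order are assumptions for illustration.

```rust
/// Toy prime modulus (2^31 - 1); products of two residues fit in a u64.
const P31: u64 = 2_147_483_647;

fn mul_mod(a: u64, b: u64) -> u64 {
    (a * b) % P31
}

fn sub_mod(a: u64, b: u64) -> u64 {
    (a + P31 - b) % P31
}

/// Returns [l_i(∞), l_i(0), l_i(1)] for l_i(X) = alpha * eq(w, X),
/// where alpha is the product of the eq factors of earlier rounds.
pub fn eq_round_values(alpha: u64, w: u64) -> [u64; 3] {
    let at_zero = mul_mod(alpha, sub_mod(1, w)); // alpha * (1 - w)
    let at_one = mul_mod(alpha, w); // alpha * w
    let at_inf = mul_mod(alpha, sub_mod((2 * w) % P31, 1)); // alpha * (2w - 1)
    [at_inf, at_zero, at_one]
}
```

Since ℓ_i is linear, the value at ∞ must equal ℓ_i(1) − ℓ_i(0), which gives a cheap consistency check on any implementation.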

Fix clippy warnings in accumulator and Lagrange loops

d9c75df

Replace range-indexed loops and a redundant closure with iterator forms

Implement Lagrange sum-check round helpers

cc373c1

Add eq-round linear factor utilities and accumulator evaluation to derive t_i and build s_i polynomials.

Add small-value sumcheck round harness and parity test

f8d5308

Track R_i and ℓ_i state to compare accumulator evals with EqSumCheckInstance rounds.

Improve small-value accumulators: use quadratic t_i and speed up indexing

312bea3

Switch Spartan t_i to D=2 aliases/tests, precompute idx4 prefix/suffix data, and flatten accumulator caches to cut allocations.

Add generic Csr<T> data structure and refactor accumulator cache

93ee149

Csr (Compressed Sparse Row) stores variable-length lists with 2 allocations instead of N+1, improving cache locality. Replaces the ad-hoc offsets/entries arrays in build_accumulators.
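A compressed-sparse-row container of the kind described can be sketched in a few lines. The field and method names below are assumptions, not the PR's actual API; the point is only that one offsets vector plus one flat data vector replaces a Vec of Vecs.

```rust
/// Variable-length rows stored in two allocations: `offsets` and `data`.
/// Row i occupies data[offsets[i]..offsets[i + 1]]. Names are hypothetical.
pub struct Csr<T> {
    offsets: Vec<usize>, // length = number of rows + 1
    data: Vec<T>,
}

impl<T> Csr<T> {
    /// Builds the structure by appending each row to the flat buffer.
    pub fn from_rows<I, R>(rows: I) -> Self
    where
        I: IntoIterator<Item = R>,
        R: IntoIterator<Item = T>,
    {
        let mut offsets = vec![0];
        let mut data = Vec::new();
        for row in rows {
            data.extend(row);
            offsets.push(data.len());
        }
        Csr { offsets, data }
    }

    pub fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns row `i` as a contiguous slice.
    pub fn row(&self, i: usize) -> &[T] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

Iterating over rows in order then walks `data` front to back, which is where the cache-locality benefit mentioned in the commit comes from.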

Add SmallValueSumCheck::from_accumulators factory method

1a81943

Split build_accumulators: add Spartan-optimized and generic Procedure 9 variants

03b5293

The Spartan version (D=2) skips binary betas, since satisfying witnesses have Az·Bz = Cz on {0,1}^n. The generic version supports arbitrary polynomial products.

wu-s-john force-pushed the feat/procedure-9-accumulator branch from 2828f04 to 67674c4 Compare December 23, 2025 19:33

wu-s-john added 4 commits December 23, 2025 13:29

Add type-safe UdEvaluations and UdHatEvaluations wrappers for domain evaluations

f6af2b4

Replace raw arrays and ad-hoc structs with proper abstractions for the U_d = {∞, 0, 1, ..., D-1} and Û_d = U_d \ {1} evaluation domains. Remove EqRoundValues in favor of UdEvaluations<F, 2>.
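One way such a wrapper could be shaped is sketched below. The type names UdEvaluations and UdPoint appear in the commits, but the fields, variants, and methods here are guesses for illustration. The value at ∞ is kept in its own field so the D finite points can live in a `[F; D]` array on stable Rust.

```rust
/// A point of U_d = {∞, 0, 1, ..., D-1}. The variant layout is an assumption.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UdPoint {
    Infinity,
    Finite(usize),
}

/// Evaluations of a polynomial at every point of U_d (hypothetical layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UdEvaluations<F, const D: usize> {
    pub infinity: F,
    pub finite: [F; D],
}

impl<F: Copy, const D: usize> UdEvaluations<F, D> {
    /// Looks up the evaluation at a domain point; panics if a finite
    /// point is outside 0..D.
    pub fn get(&self, point: UdPoint) -> F {
        match point {
            UdPoint::Infinity => self.infinity,
            UdPoint::Finite(k) => self.finite[k],
        }
    }

    /// Number of points in U_d, namely D + 1.
    pub fn len(&self) -> usize {
        D + 1
    }
}
```

Compared with a raw three-element array, the named `infinity` field removes any ambiguity about whether ∞ is stored first or last.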

wu-s-john changed the title ~~Implement Faster Sumcheck Algorithm — Procedure 9 Accumulator Builder~~ Implement Small-Value Sum-Check Optimization (Algorithm 6) Dec 23, 2025

wu-s-john marked this pull request as ready for review December 23, 2025 23:56

wu-s-john added 2 commits December 23, 2025 16:24

srinathsetty requested a review from Copilot December 24, 2025 01:10

Copilot started reviewing on behalf of srinathsetty December 24, 2025 01:11

Copilot AI reviewed Dec 24, 2025

wu-s-john added 10 commits January 6, 2026 15:06

Fix formatting issues

44b442c

Refactor sumcheck_sweep to use clap CLI instead of env vars

bd0d913

Fix formatting and clippy warnings

e434257

- Apply rustfmt formatting fixes in accumulators.rs - Fix clippy manual_is_multiple_of warning in test code

Added clap cli command

88ba34b

wu-s-john added 3 commits January 8, 2026 21:12

Improve accumulator helpers: dedupe eq-table prep and scatter beta contributions

406b59e

Refactors shared logic between Spartan and generic accumulator builders.

wu-s-john force-pushed the feat/procedure-9-accumulator branch from 878e7b0 to 406b59e Compare January 9, 2026 06:00

wu-s-john added 7 commits January 8, 2026 22:33

Refactor sub_mag to return SubMagResult enum instead of tuple

997556f

Improves type safety and self-documentation by replacing (bool, [u64; N]) with an explicit enum indicating whether the result is positive (a >= b) or negative (a < b).
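The shape of such a refactor can be sketched as follows. The function name sub_mag and the type name SubMagResult come from the commit title; the variant names, the little-endian limb order, and the helper are assumptions made for this illustration.

```rust
/// Result of a magnitude subtraction |a - b| over N little-endian u64 limbs.
/// Variant names are hypothetical.
#[derive(Debug, PartialEq, Eq)]
pub enum SubMagResult<const N: usize> {
    /// a >= b; holds a - b.
    Positive([u64; N]),
    /// a < b; holds b - a.
    Negative([u64; N]),
}

/// Limb-wise a - b with borrow propagation; returns the final borrow flag.
fn sub_limbs<const N: usize>(a: &[u64; N], b: &[u64; N]) -> ([u64; N], bool) {
    let mut out = [0u64; N];
    let mut borrow = false;
    for (o, (&x, &y)) in out.iter_mut().zip(a.iter().zip(b.iter())) {
        let (d1, b1) = x.overflowing_sub(y);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        *o = d2;
        borrow = b1 || b2;
    }
    (out, borrow)
}

pub fn sub_mag<const N: usize>(a: &[u64; N], b: &[u64; N]) -> SubMagResult<N> {
    let (diff, borrow) = sub_limbs(a, b);
    if borrow {
        SubMagResult::Negative(sub_limbs(b, a).0)
    } else {
        SubMagResult::Positive(diff)
    }
}
```

A caller now matches on `Positive` or `Negative` instead of remembering what a bare `true` or `false` in a tuple stood for.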

Consolidate limb operations into small_field/limbs module

c90062d

Move wide_limbs.rs content and limb arithmetic from barrett.rs into a unified small_field/limbs.rs module for delayed modular reduction.

Fix clippy lints in lagrange_accumulator: add Default and is_empty impls

4b85d84

microsoft-github-policy-service bot commented Jan 9, 2026

Contribution License Agreement

Suggested change from the review:
- val as i32
+ i32::try_from(val).expect("small_from_u32: value does not fit in i32")
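The difference between the two conversions above is that an `as` cast from u32 to i32 silently reinterprets the bits, while `i32::try_from` reports values above i32::MAX as an error. A minimal sketch follows; the function name is inferred from the panic message and its signature is an assumption.

```rust
use std::convert::TryFrom;

/// Checked conversion in the style of the suggested change: panics on
/// out-of-range input instead of wrapping to a negative number.
/// The name and signature are inferred, not taken from the PR.
pub fn small_from_u32(val: u32) -> i32 {
    i32::try_from(val).expect("small_from_u32: value does not fit in i32")
}

/// The unchecked variant, shown only for comparison: values above
/// i32::MAX wrap around to negative numbers.
pub fn small_from_u32_wrapping(val: u32) -> i32 {
    val as i32
}
```

For example, `small_from_u32_wrapping(u32::MAX)` yields -1, which would feed a wrong "small" value into later arithmetic, whereas the checked version fails loudly at the conversion site.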
