perf: binary splitting for NumericMatrix.walk()#156
Merged
Conversation
Replace the sequential O(N^2) left-accumulate loop with a balanced divide-and-conquer product tree, reducing complexity to O(N log N). The speedup grows with N and is largest for rational-entry matrices where numerator/denominator sizes grow fast: 2x2 rational: ~15x at N=1000, ~40x at N=2000, ~160x at N=5000 3x3 rational: ~16x at N=1000, ~46x at N=2000, ~213x at N=5000 5x5 rational: ~31x at N=1000, ~80x at N=2000 2x2 PCF: ~2-3x at N=1000-10000 5x5 polynomial: ~2-6x at N=1000-5000 API and return values are unchanged. All 334 existing tests pass.
RotemKalisch
requested changes
Apr 4, 2026
- Removed separate numeric_matrix_benchmark.py - Added standalone comparison script to existing matrix_benchmark.py - All benchmarks now test through the public Matrix.walk() API - Verified similar speedups through the Matrix wrapper: * 2x2 PCF: 2-4x * 2x2 rational: 13-148x (N=1000-5000) * 3x3 rational: 14-170x (N=1000-5000) * 5x5 rational: 18x at N=1000 * 5x5 polynomial: 3-5x
RotemKalisch
requested changes
Apr 4, 2026
…mark - Renamed lo/hi to first/last for clarity - Extracted magic number 8 into SEQUENTIAL_THRESHOLD constant - Added docstring explaining why balanced splitting matters for FLINT rationals - No suitable pythonic utility exists for balanced product trees: math.prod / functools.reduce are sequential left-folds that lose the binary-splitting benefit; python-flint and mpmath have no generic matrix product tree - Removed _compare(), _walk_sequential(), and __main__ block from matrix_benchmark.py — benchmarks are now strictly pytest-only
44f8d87 to
98225d9
Compare
RotemKalisch
requested changes
Apr 4, 2026
kaminer
added a commit
that referenced
this pull request
Apr 4, 2026
- Add style rule: no decorative section separators or redundant comments - Add testing rule: use @pytest.mark.parametrize for variations - Both learned from RotemKalisch's review feedback on PR #156
- Removed all decorative # --- section separator comments - Replaced 5 separate test functions with a single parametrized test: @pytest.mark.parametrize over MATRICES x DEPTHS (100, 500, 1000, 2000) - Now 20 walk benchmark combinations + 1 as_companion = 21 tests
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replace the sequential O(N^2) left-accumulate loop with a balanced divide-and-conquer product tree, reducing complexity to O(N log N).
The speedup grows with N and is largest for rational-entry matrices where numerator/denominator sizes grow fast:
2x2 rational: ~15x at N=1000, ~40x at N=2000, ~160x at N=5000
3x3 rational: ~16x at N=1000, ~46x at N=2000, ~213x at N=5000
5x5 rational: ~31x at N=1000, ~80x at N=2000
2x2 PCF: ~2-3x at N=1000-10000
5x5 polynomial: ~2-6x at N=1000-5000
API and return values are unchanged. All 334 existing tests pass.