[integer] Optimize BigUInt addition and subtraction with SIMD and early stop tricks (#101)
This pull request introduces significant updates to the `bench_biguint`
benchmarking suite, optimizes arithmetic operations in the `BigUInt` and
`BigDecimal` modules, and refactors method names for clarity and
consistency. Additionally, new benchmarking cases and constants are
added to improve performance testing and support for larger numbers.
1. Use SIMD to accelerate BigUInt addition and in-place addition, giving a
2x to 4x speedup for large numbers.
2. Refine BigUInt subtraction and in-place addition with carry tricks so
that `floor_divide` and `modulo` operations are replaced by addition and
subtraction.
3. First add the operands word by word in parallel, then normalize all
carries in a single loop.
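The three items above can be illustrated with a minimal Python sketch. It assumes BigUInt stores little-endian base-10^9 words (each below 10^9, as a UInt32 would hold), and `VECTOR_WIDTH = 4` is a hypothetical lane count. Plain Python loops stand in for the Mojo SIMD operations, so this shows the technique, not the PR's actual code; `add_words` and `sub_words` are hypothetical names.

```python
# Sketch only: little-endian base-10**9 words and VECTOR_WIDTH = 4 are assumptions.
BASE = 10**9
VECTOR_WIDTH = 4  # hypothetical SIMD lane count


def add_words(a: list[int], b: list[int]) -> list[int]:
    """Return a + b, both given as little-endian base-10**9 word lists."""
    if len(a) < len(b):
        a, b = b, a  # let `a` be the longer operand
    n = len(b)
    sums = a[:]  # words of `a` above index n - 1 are copied unchanged
    # Pass 1: carry-free word-wise sums. No iteration depends on another, so
    # each VECTOR_WIDTH-sized chunk could be a single SIMD add in Mojo; the
    # remaining words form a scalar tail. Each sum is at most 2 * (10**9 - 1),
    # which still fits in a UInt32.
    main = n - n % VECTOR_WIDTH
    for start in range(0, main, VECTOR_WIDTH):
        for lane in range(start, start + VECTOR_WIDTH):
            sums[lane] += b[lane]
    for i in range(main, n):
        sums[i] += b[i]
    # Pass 2: normalize carries in one loop. A sum plus an incoming carry is
    # below 2 * BASE, so one compare-and-subtract replaces // and %.
    carry = 0
    for i in range(len(sums)):
        v = sums[i] + carry
        if v >= BASE:
            sums[i] = v - BASE
            carry = 1
        else:
            sums[i] = v
            carry = 0
            if i >= n - 1:
                break  # early stop: the remaining words of `a` are already final
    if carry:
        sums.append(1)
    return sums


def sub_words(a: list[int], b: list[int]) -> list[int]:
    """Return a - b for word lists with a >= b."""
    result = []
    borrow = 0
    for i in range(len(a)):
        if i >= len(b) and borrow == 0:
            result.extend(a[i:])  # early stop: nothing left to subtract
            break
        y = (b[i] if i < len(b) else 0) + borrow
        if a[i] >= y:
            result.append(a[i] - y)
            borrow = 0
        else:
            # Borrow from the next word by adding BASE instead of using // or %.
            result.append(a[i] + BASE - y)
            borrow = 1
    while len(result) > 1 and result[-1] == 0:
        result.pop()  # drop leading zero words
    return result
```

Splitting the work this way keeps the carry chain, the only sequential dependency, in one cheap loop, while the bulk of the arithmetic stays data-parallel.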
### Arithmetic Optimizations:
* [`src/decimojo/bigdecimal/arithmetics.mojo`](diffhunk://#diff-f79534f4e7fdd891932ce9d015c50bd3c8a72c4a1689f0cb55524490ffc0458dL73-R74):
  Replaced `scale_up_by_power_of_10` and `scale_down_by_power_of_10` with
  `multiply_by_power_of_ten` and `floor_divide_by_power_of_ten`,
  respectively, for clearer and more consistent naming.
  [[1]](diffhunk://#diff-f79534f4e7fdd891932ce9d015c50bd3c8a72c4a1689f0cb55524490ffc0458dL73-R74)
  [[2]](diffhunk://#diff-f79534f4e7fdd891932ce9d015c50bd3c8a72c4a1689f0cb55524490ffc0458dL304-R304)
  [[3]](diffhunk://#diff-f79534f4e7fdd891932ce9d015c50bd3c8a72c4a1689f0cb55524490ffc0458dL440-R445)
* [`src/decimojo/biguint/biguint.mojo`](diffhunk://#diff-f9432b9b2671643af91201f9e3f011551a3d3b0e6d7b256d0d4569f5ae59848aL1003-L1010):
  Removed redundant `add_inplace_by_1` method and replaced it with a more
  general `add_inplace_by_uint32` for optimized addition operations.
  [[1]](diffhunk://#diff-f9432b9b2671643af91201f9e3f011551a3d3b0e6d7b256d0d4569f5ae59848aL1003-L1010)
  [[2]](diffhunk://#diff-f9432b9b2671643af91201f9e3f011551a3d3b0e6d7b256d0d4569f5ae59848aL1437-R1439)
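To make the renamed helpers and `add_inplace_by_uint32` concrete, here is a minimal Python sketch of what they compute, again assuming little-endian base-10^9 words (9 decimal digits per word). The function names mirror the PR, but the bodies are illustrative and not the DeciMojo implementations. The early return in the in-place addition shows one way an early stop can work: once no carry remains, higher words are left untouched.

```python
# Sketch only: little-endian base-10**9 words are an assumption.
BASE = 10**9
DIGITS_PER_WORD = 9  # each base-10**9 word holds 9 decimal digits


def multiply_by_power_of_ten(words: list[int], n: int) -> list[int]:
    """Return words * 10**n."""
    if n == 0 or words == [0]:
        return list(words)
    whole_words, rem_digits = divmod(n, DIGITS_PER_WORD)
    factor = 10**rem_digits
    result = [0] * whole_words  # whole-word shift: prepend zero words
    carry = 0
    for w in words:  # scale the remaining 0..8 digits word by word
        prod = w * factor + carry
        result.append(prod % BASE)
        carry = prod // BASE
    if carry:
        result.append(carry)
    return result


def floor_divide_by_power_of_ten(words: list[int], n: int) -> list[int]:
    """Return words // 10**n."""
    whole_words, rem_digits = divmod(n, DIGITS_PER_WORD)
    if whole_words >= len(words):
        return [0]
    shifted = words[whole_words:]  # whole-word shift: drop low words
    divisor = 10**rem_digits
    result = [0] * len(shifted)
    remainder = 0
    for i in range(len(shifted) - 1, -1, -1):  # most significant word first
        cur = remainder * BASE + shifted[i]
        result[i] = cur // divisor
        remainder = cur % divisor
    while len(result) > 1 and result[-1] == 0:
        result.pop()  # drop leading zero words
    return result


def add_inplace_by_uint32(words: list[int], y: int) -> None:
    """Add 0 <= y < 2**32 to words in place."""
    carry = y
    i = 0
    while carry:
        if i == len(words):
            words.append(0)
        v = words[i] + carry
        if v < BASE:
            words[i] = v
            return  # early stop: higher words are unaffected
        words[i] = v % BASE  # y can exceed one word, so the carry may be > 1
        carry = v // BASE
        i += 1
```

For example, `multiply_by_power_of_ten([123], 10)` shifts by one whole word and scales by 10, giving `[0, 1230]` (that is, 1230 × 10^9).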
### Refactoring and Enhancements:
* [`src/decimojo/biguint/biguint.mojo`](diffhunk://#diff-f9432b9b2671643af91201f9e3f011551a3d3b0e6d7b256d0d4569f5ae59848aR74-R75):
  Renamed methods (e.g., `scale_up_by_power_of_10` →
  `multiply_by_power_of_ten`) for consistency across the codebase and
  introduced a `VECTOR_WIDTH` constant for SIMD-based arithmetic
  optimizations.
  [[1]](diffhunk://#diff-f9432b9b2671643af91201f9e3f011551a3d3b0e6d7b256d0d4569f5ae59848aR74-R75)
  [[2]](diffhunk://#diff-f9432b9b2671643af91201f9e3f011551a3d3b0e6d7b256d0d4569f5ae59848aL1070-R1095)
* [`src/decimojo/bigdecimal/comparison.mojo`](diffhunk://#diff-04237ffa697ff22a4879812f65a72c23bc5d3e183b58f11e437c94836bd43da3L66-R70):
  Updated comparison logic to use the newly renamed
  `multiply_by_power_of_ten` method for scaling coefficients.
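Why comparison needs coefficient scaling can be shown with a short Python sketch. It assumes a BigDecimal value equals coefficient × 10^(−scale) and, for brevity, handles only non-negative values, with plain Python ints standing in for BigUInt coefficients. `compare_decimals` is a hypothetical name, not part of the DeciMojo API.

```python
def compare_decimals(coef_a: int, scale_a: int, coef_b: int, scale_b: int) -> int:
    """Compare non-negative a = coef_a * 10**-scale_a and b = coef_b * 10**-scale_b.

    Returns -1 if a < b, 0 if a == b, and 1 if a > b.
    """
    # Align scales by scaling up the coefficient with the smaller scale; this
    # is the job a multiply_by_power_of_ten-style helper performs.
    if scale_a < scale_b:
        coef_a *= 10 ** (scale_b - scale_a)
    elif scale_b < scale_a:
        coef_b *= 10 ** (scale_a - scale_b)
    return (coef_a > coef_b) - (coef_a < coef_b)
```

For example, 1.5 (coefficient 15, scale 1) and 1.50 (coefficient 150, scale 2) compare equal once 15 is scaled up to 150. Scaling up rather than dividing down keeps the comparison exact, because no digits are discarded.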
### Benchmarking Updates:
* [`benches/biguint/bench_biguint_add.mojo`](diffhunk://#diff-967ad165864a3f276ee27b8eca0721f132d904f71ffb3da60003a75aec8837efR460-R509):
  Added five new addition benchmark cases for larger word sizes (e.g.,
  4096 words + 2048 words) to test scalability.
* [`benches/biguint/bench_biguint_multiply.mojo`](diffhunk://#diff-3fba3fe441d30e17e77d7e18b33b2508452b08f07e7af177d413c08b5b5c88c2L463-R584):
  Expanded the multiplication benchmarks with 12 new cases of varying
  word sizes, using fewer iterations for very large numbers to keep the
  total runtime manageable.
* [`benches/biguint/bench_biguint_multiply_complexity.mojo`](diffhunk://#diff-d0d1723b5108046f6dc332ce4cf856979f00576061017e71660d38bcd536b31fL132-R132):
  Changed the test sizes to start from 8 words instead of 32 and updated
  the iteration logic of the complexity benchmark.
  [[1]](diffhunk://#diff-d0d1723b5108046f6dc332ce4cf856979f00576061017e71660d38bcd536b31fL132-R132)
  [[2]](diffhunk://#diff-d0d1723b5108046f6dc332ce4cf856979f00576061017e71660d38bcd536b31fL141-R144)
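The idea of starting from small operands and reducing iterations as operands grow can be sketched as a simple timing harness. Everything here is hypothetical: the doubling sizes from 8 to 4096 words, the thresholds in `iterations_for`, and all function names are illustrative assumptions (the PR does not state its iteration counts), and Python ints stand in for BigUInt.

```python
import time

BASE = 10**9  # assumed word base, as in a base-10**9 BigUInt


def benchmark_sizes(start: int = 8, stop: int = 4096) -> list[int]:
    """Operand sizes in words, doubling from `start` up to `stop` inclusive."""
    sizes = []
    n = start
    while n <= stop:
        sizes.append(n)
        n *= 2
    return sizes


def iterations_for(words: int) -> int:
    """Illustrative policy: fewer repetitions for larger operands (thresholds are assumptions)."""
    if words <= 256:
        return 1000
    if words <= 2048:
        return 100
    return 10


def time_multiply(words: int) -> float:
    """Average seconds per product of two `words`-word operands."""
    a = BASE**words - 1  # largest value that fits in `words` base-10**9 words
    b = a // 3
    iters = iterations_for(words)
    start = time.perf_counter()
    for _ in range(iters):
        _ = a * b
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    for words in benchmark_sizes():
        print(f"{words:>5} words: {time_multiply(words):.3e} s")
```

Averaging over the iteration count keeps results comparable across sizes even though large cases run fewer repetitions.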
These changes collectively enhance the code's readability, scalability,
and performance, especially for operations involving large numbers and
benchmarking scenarios.