Description
The running time of fma_poly_scale_slice_hexl does not increase linearly with mod_size (i.e., the number of moduli in the poly). Instead, it grows faster than linearly as mod_size increases.
Poly stores its coefficients in row-major form, and fma_poly_scale_slice_hexl simply calls hexl_rs::elwise_fma_mod on each row of the poly. There is one row per modulus, so the row count equals mod_size and fma_poly_scale_slice_hexl calls elwise_fma_mod exactly mod_size times. Hence the time taken with mod_size = 15 should be about 15x the time taken with mod_size = 1. But this is not the case.
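To make the expected cost model concrete, here is a minimal pure-Rust sketch of the row-wise loop described above. It is not the hexl_rs code: `elwise_fma_mod_row` and `fma_poly_scale_slice` are hypothetical names, and the semantics (one scalar per modulus, accumulating `acc[j] = (acc[j] + a[j] * scalar) mod q`) are an assumption for illustration only. The point is that the work is `mod_size` independent calls over rows of length `n`, so the total should scale linearly with `mod_size`.

```rust
/// Hypothetical scalar stand-in for one elwise_fma_mod call on a single row.
/// Assumed semantics: acc[j] = (acc[j] + a[j] * scalar) mod q.
/// (The real hexl_rs function is SIMD-accelerated; its signature may differ.)
fn elwise_fma_mod_row(acc: &mut [u64], a: &[u64], scalar: u64, q: u64) {
    let q = q as u128;
    for (r, &x) in acc.iter_mut().zip(a) {
        // Widen to u128 so the 64x64-bit product plus accumulator cannot overflow.
        *r = ((*r as u128 + x as u128 * scalar as u128) % q) as u64;
    }
}

/// Hypothetical row-major poly: row i (length n) holds coefficients mod moduli[i].
/// One row per modulus, so this makes exactly mod_size = moduli.len() row calls.
fn fma_poly_scale_slice(acc: &mut [u64], poly: &[u64], scalars: &[u64], moduli: &[u64], n: usize) {
    assert!(n > 0);
    assert_eq!(acc.len(), moduli.len() * n);
    assert_eq!(poly.len(), moduli.len() * n);
    assert_eq!(scalars.len(), moduli.len());
    let rows = acc
        .chunks_exact_mut(n)
        .zip(poly.chunks_exact(n))
        .zip(scalars)
        .zip(moduli);
    for (((acc_row, poly_row), &s), &q) in rows {
        elwise_fma_mod_row(acc_row, poly_row, s, q);
    }
}
```

Since each iteration touches the same amount of data (`n` coefficients per operand), a per-call cost that stays constant would give a total time proportional to `mod_size`.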
For example, on an r6i.8xlarge instance, the benchmarks for fma_poly_scale_slice_hexl are:
range_fn/fma_poly_scale_slice_hexl/n=32768/mod_size=1
time: [8.5271 µs 8.5275 µs 8.5278 µs]
range_fn/fma_poly_scale_slice_hexl/n=32768/mod_size=3
time: [36.679 µs 36.695 µs 36.709 µs]
range_fn/fma_poly_scale_slice_hexl/n=32768/mod_size=7
time: [107.51 µs 107.51 µs 107.52 µs]
range_fn/fma_poly_scale_slice_hexl/n=32768/mod_size=15
time: [254.32 µs 254.35 µs 254.37 µs]
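The time clearly does not increase linearly with mod_size: mod_size = 15 takes about 254 µs, roughly 30x the 8.5 µs for mod_size = 1, rather than the expected 15x (about 128 µs).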
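One way to see the non-linearity is to divide each total by mod_size to get the time per row (per elwise_fma_mod call). The sketch below does that arithmetic on the middle (point) estimates from the criterion output above; `MEDIANS_US` and `scaling_report` are illustrative names, not part of the crate.

```rust
/// Middle (point) estimates in µs from the criterion output above,
/// as (mod_size, total time).
const MEDIANS_US: [(u32, f64); 4] = [(1, 8.5275), (3, 36.695), (7, 107.51), (15, 254.35)];

/// For each run returns (mod_size, linear prediction in µs, per-row time in µs,
/// per-row time relative to mod_size = 1). Under linear scaling the last
/// value would stay at 1.0 for every row.
fn scaling_report() -> Vec<(u32, f64, f64, f64)> {
    let base = MEDIANS_US[0].1; // time for a single row (mod_size = 1)
    MEDIANS_US
        .iter()
        .map(|&(m, t)| {
            let predicted = base * m as f64; // mod_size independent row calls
            let per_row = t / m as f64; // measured cost per row call
            (m, predicted, per_row, per_row / base)
        })
        .collect()
}
```

For mod_size = 15 this gives a linear prediction of about 127.9 µs against the measured 254.35 µs, and the per-row time rises from about 8.53 µs (mod_size = 1) to about 12.23, 15.36 and 16.96 µs for mod_size = 3, 7 and 15, i.e. each row call becomes roughly 2x slower.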