Fix double precision for exp, sin, cos, erf, erfinv on CPU #3058
These functions were using float32 polynomial approximations for all inputs, causing precision loss for float64. Added proper double precision paths using element-wise std:: calls for exp/sin/cos/erf, and Julia's SpecialFunctions.jl rational approximation for erfinv.
if constexpr (N == 1) {
  return Simd<T, 1>{std::exp(in.value)};
} else {
Is the check for N=1 here necessary?
Right now Simd<T, 1> (in base_simd.h) doesn't have operator[]; only the general Simd<T, N> in accelerate_simd.h does. So we need the N=1 check to use .value instead of [i]. I think adding operator[] to Simd<T, 1> could remove this check, so that's probably cleaner 🙏
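To make the suggestion above concrete, here is a minimal sketch of a single-lane type that keeps `.value` but also exposes `operator[]`, so one element-wise loop can serve every `N`. The `Simd` structs and `exp_elementwise` below are hypothetical stand-ins written for illustration; they are not the definitions in base_simd.h or accelerate_simd.h, which may differ.

```cpp
#include <cmath>

// Hypothetical stand-in for a multi-lane SIMD wrapper (not the MLX type).
template <typename T, int N>
struct Simd {
  T lanes[N];
  T& operator[](int i) { return lanes[i]; }
  const T& operator[](int i) const { return lanes[i]; }
};

// Hypothetical single-lane specialization: keeps `.value`, and adds
// operator[] so generic code does not need an `if constexpr (N == 1)` branch.
template <typename T>
struct Simd<T, 1> {
  T value;
  T& operator[](int) { return value; }
  const T& operator[](int) const { return value; }
};

// Element-wise exp that works for N == 1 and N > 1 through the same loop.
template <typename T, int N>
Simd<T, N> exp_elementwise(Simd<T, N> in) {
  Simd<T, N> out{};
  for (int i = 0; i < N; ++i) {
    out[i] = std::exp(in[i]);
  }
  return out;
}
```

With this shape, `exp_elementwise(Simd<double, 1>{2.0}).value` and `exp_elementwise(Simd<double, 4>{{0.0, 1.0, 2.0, 3.0}})[3]` go through the same code path.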
The purpose of the issue (#3047), perhaps implicitly, was to add actual vectorized implementations rather than unrolled element-wise implementations. Are you interested in making that update? So basically we'd need to find good approximations for exp, sin/cos, erf and use simd functions to implement them. And similarly for erfinv it should be translated into a simd version.
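As a rough illustration of what a polynomial-based double-precision exp could look like, here is a sketch using range reduction plus a degree-13 Taylor polynomial. This is an assumption-laden example, not the Cephes polynomials mentioned in this thread and not code from the PR: `exp_poly` and `exp_poly_lanes` are hypothetical names, the per-lane loop stands in for real simd operations, and inputs are assumed finite and well inside the overflow/underflow range.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: exp(x) = 2^k * exp(r), with k = round(x / ln 2) and
// |r| <= ln(2)/2. Assumes finite x with no overflow or underflow handling.
inline double exp_poly(double x) {
  constexpr double kLog2e = 1.4426950408889634;         // 1 / ln 2
  constexpr double kLn2Hi = 6.93145751953125e-1;        // high part of ln 2
  constexpr double kLn2Lo = 1.42860682030941723212e-6;  // low part of ln 2

  // Range reduction; splitting ln 2 into two parts keeps r accurate.
  const double k = std::nearbyint(x * kLog2e);
  const double r = (x - k * kLn2Hi) - k * kLn2Lo;

  // Horner evaluation of sum_{n=0}^{13} r^n / n!.
  // The truncation error for |r| <= 0.35 is far below double epsilon.
  double p = 1.0;
  for (int n = 13; n >= 1; --n) {
    p = 1.0 + p * r / n;
  }

  // Scale by 2^k.
  return std::ldexp(p, static_cast<int>(k));
}

// Applies exp_poly to each lane; a real implementation would replace this
// scalar loop with simd multiply/add/round operations.
inline void exp_poly_lanes(const double* in, double* out, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = exp_poly(in[i]);
  }
}
```

A production version would likely use precomputed minimax coefficients instead of dividing by `n` in the loop, and would need explicit handling for NaN, infinities, and out-of-range inputs.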
Hey @awni, so my original approach was using Cephes polynomials, but I ran the benchmark and it was slower for everything except exp (where it was only ~4% faster). For sin and cos it was about 50% slower. But I'm happy to change the implementation if a vectorized implementation is preferred 👍 Perhaps there are reasons I'm not thinking of as well?
That's interesting and unexpected. What machine were you running benchmarks on?

Proposed changes
Fixes #3047 - These functions were using float32 polynomial approximations for all inputs, causing precision loss for float64. Added proper double precision paths using element-wise std:: calls for exp/sin/cos/erf, and Julia's SpecialFunctions.jl rational approximation for erfinv.
Questions for @awni:
- For erfinv, I used coefficients from Julia's SpecialFunctions.jl (Blair et al. 1976, 3-region rational approximation, ~1e-13 relative error). Is this acceptable?
- I added a --cpu flag to the benchmark. Is it useful to include this or rather remove these changes?
Checklist
- pre-commit run --all-files to format my code / installed pre-commit prior to committing changes