Sleef #587
Integrate SLEEF (SIMD Library for Evaluating Elementary Functions) to provide vectorized transcendental math functions. SLEEF enables 4-wide AVX2 vectorization for exp(), log(), log10(), and log2() operations.

Changes:
- Add eidos/sleef/ directory with vendored SLEEF inline headers
- Patch SLEEF headers to use decimal floats (C++11 compatibility)
- Update eidos_simd.h to use SLEEF when AVX2+FMA is available
- Keep existing hand-written SIMD for sqrt, abs, floor, ceil, etc.

Architecture support:
- AVX2+FMA: 4-wide vectorized transcendentals via SLEEF
- ARM NEON: Placeholder for future (scalar fallback for now)
- SSE4.2-only: Scalar std::exp/log fallback

SLEEF is distributed under the Boost Software License.
Generate sleefinline_advsimd.h from SLEEF 4.0.0 on ARM64 macOS and enable ARM NEON support in sleef_config.h. This provides 2-wide vectorized transcendental functions (exp, log, log10, log2) on Apple Silicon and other ARM64 platforms.
- Update eidos_functions_math.cpp to call SIMD functions when OpenMP disabled
- Update SpatialMap::exp() to use SIMD for consistent results
- Add command-line override support (-DEIDOS_SLEEF_AVAILABLE=0) for testing
- Add SLEEF benchmark script and ARM header generation script

Performance improvement on x86_64 AVX2 (1M elements):
- exp(): 8.30ms -> 4.05ms (2.1x speedup)
- log(): 6.17ms -> 3.37ms (1.8x speedup)
- log10(): 10.79ms -> 3.66ms (2.9x speedup)
- log2(): 5.81ms -> 3.99ms (1.5x speedup)
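A command-line override such as `-DEIDOS_SLEEF_AVAILABLE=0` typically relies on a guarded default in a configuration header. The following is only a minimal sketch of that pattern, not the actual contents of sleef_config.h: the detection condition is an assumption, and `demo_backend_name` and `demo_exp_scalar` are hypothetical helpers invented for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Respect a value passed on the command line (e.g. -DEIDOS_SLEEF_AVAILABLE=0).
// Only when nothing was passed do we guess from compiler-defined macros.
// The detection condition below is an assumption made for this sketch.
#ifndef EIDOS_SLEEF_AVAILABLE
#  if (defined(__AVX2__) && defined(__FMA__)) || defined(__aarch64__)
#    define EIDOS_SLEEF_AVAILABLE 1
#  else
#    define EIDOS_SLEEF_AVAILABLE 0
#  endif
#endif

// Hypothetical helper: names the path this translation unit was built for.
inline const char* demo_backend_name() {
#if EIDOS_SLEEF_AVAILABLE
    return "sleef";
#else
    return "std";
#endif
}

// Hypothetical scalar fallback, the path taken when the macro is 0.
inline void demo_exp_scalar(const double* in, double* out, std::size_t n) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::exp(in[i]);
    }
}
```

Building the same file once with `-DEIDOS_SLEEF_AVAILABLE=0` and once without it lets both paths be compared on a single machine, which is presumably the testing use the commit has in mind.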
- Add AVX2/FMA detection for Windows/MinGW builds in CMakeLists.txt
- Create cmake/toolchain-mingw64.cmake for cross-compilation testing
- Verified SLEEF compiles and runs correctly on Windows via Wine

This enables the same SLEEF-powered exp/log/log10/log2 speedups on Windows that we have on Linux and macOS.
Consolidates all SIMD-related documentation and scripts in one location. Updates SLEEF_HEADER_GENERATION.md with corrected path references.
Adds AVX2 header generation script matching the ARM script style. Updates SLEEF_HEADER_GENERATION.md to reference the script file.
SLEEF and std::exp produce slightly different results at ULP level. When spatial maps reorder data internally, different elements end up in the scalar remainder loop vs the SIMD loop, causing identical() to fail even though both results are numerically correct.
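To make the effect concrete, here is a small sketch of the two ingredients involved: which loop an element lands in, and how far apart two results are in ULPs. Both helpers are hypothetical and written for illustration only. They assume the common pattern of a main loop that consumes `width` elements at a time followed by a scalar tail, which may differ in detail from the code in eidos_simd.h.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// True if element i of an n-element array is handled by the width-wide
// main loop, false if it falls into the scalar remainder loop.
inline bool demo_in_simd_loop(std::size_t i, std::size_t n, std::size_t width) {
    assert(width > 0 && i < n);
    return i < n - (n % width);
}

// Distance between two doubles, counted in representable values (ULPs).
// Assumes both inputs are finite and have the same sign.
inline std::uint64_t demo_ulp_distance(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    assert(std::signbit(a) == std::signbit(b));
    std::uint64_t ua = 0, ub = 0;
    std::memcpy(&ua, &a, sizeof ua);  // reinterpret bits without aliasing issues
    std::memcpy(&ub, &b, sizeof ub);
    return ua > ub ? ua - ub : ub - ua;
}
```

With n = 10 and width = 4, indices 0 through 7 go through the vector kernel and indices 8 and 9 go through the scalar tail. If a reordering moves a value from index 7 to index 8, it is evaluated by a different implementation, so a bitwise comparison can fail even when `demo_ulp_distance` between the two results is only 1.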
SLEEF headers generated on Linux/GCC unconditionally define SLEEF_FLOAT128_IS_IEEEQP, but __float128 is not supported by Clang/AppleClang. This caused build failures on macos-15-intel. The fix conditionally defines SLEEF_FLOAT128_IS_IEEEQP only when the compiler actually supports __float128 (GCC with __SIZEOF_FLOAT128__). On other compilers, SLEEF falls back to a struct-based Sleef_quad type. Also updates the generation script and documentation.
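The conditional definition described above might look roughly like the sketch below. The exact guard used in the patched headers is not shown in this conversation, so the preprocessor condition here is an assumption (in particular the explicit exclusion of Clang), and `demo_quad` and `demo_has_native_quad` are hypothetical names standing in for the real Sleef_quad handling.

```cpp
#include <cassert>
#include <cstdint>

// Assumed guard: define the macro only for GCC proper when it advertises
// __float128 support through __SIZEOF_FLOAT128__.
#if defined(__GNUC__) && !defined(__clang__) && defined(__SIZEOF_FLOAT128__)
#  define SLEEF_FLOAT128_IS_IEEEQP
#endif

#if defined(SLEEF_FLOAT128_IS_IEEEQP)
// Native quad-precision type is available.
typedef __float128 demo_quad;
#else
// Struct-based fallback with the same 128-bit footprint.
struct demo_quad {
    std::uint64_t x;
    std::uint64_t y;
};
#endif

// Both variants occupy 16 bytes, so layouts that embed the type stay stable.
static_assert(sizeof(demo_quad) == 16, "quad type must be 128 bits wide");

// Hypothetical helper: reports which variant was selected at compile time.
inline bool demo_has_native_quad() {
#if defined(SLEEF_FLOAT128_IS_IEEEQP)
    return true;
#else
    return false;
#endif
}
```

Keeping the decision in the preprocessor means a header generated on Linux/GCC can be compiled unchanged by Clang or AppleClang, which is the failure mode the commit addresses.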
One question for you @bhaller, when you get a chance to review this: are you happy with where the documentation currently lives in Another option might be to put I didn't do this originally to keep it clean, but I'd like this to be as clear as possible to anyone working on this stuff in the future.
I would move them as you suggest, yes. Keep the sleef stuff in
OK, I did a quick review. AFAICT this is good to merge as soon as that move, and the other minutiae we discussed on Slack, have been done. Ping me when it's ready. Thanks, this is amazing!
Integrates SLEEF (SIMD Library for Evaluating Elementary Functions) to accelerate exp(), log(), log10(), and log2() operations on float vectors.
Changes:
Documentation: