@andrewkern andrewkern commented Dec 14, 2025

Integrates SLEEF (SIMD Library for Evaluating Elementary Functions) to accelerate exp(), log(), log10(), and log2() operations on float vectors.

Changes:

  • Add vendored SLEEF inline headers for AVX2 (x86_64) and NEON (ARM64)
  • Patch hex float literals for C++11 compatibility
  • Add Windows/MinGW SIMD detection to CMakeLists.txt
  • Add cross-compilation toolchain for MinGW-w64

Documentation:

  • simd_benchmarks/SIMD_BUILD_FLAGS.md - build flag interaction
  • eidos/sleef/SLEEF_HEADER_GENERATION.md - header regeneration instructions
  • eidos/sleef/generate_avx2_sleef.sh and generate_arm_sleef.sh - generation scripts

Integrate SLEEF (SIMD Library for Evaluating Elementary Functions) to
provide vectorized transcendental math functions. SLEEF enables 4-wide
AVX2 vectorization for exp(), log(), log10(), and log2() operations.

Changes:
- Add eidos/sleef/ directory with vendored SLEEF inline headers
- Patch SLEEF headers to use decimal floats (C++11 compatibility)
- Update eidos_simd.h to use SLEEF when AVX2+FMA is available
- Keep existing hand-written SIMD for sqrt, abs, floor, ceil, etc.
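For context on the header patch: hexadecimal floating-point literals only became standard C++ in C++17, so a C++11 build needs decimal spellings of the same constants. The sketch below is illustrative only. The constant (ln 2) and the names `parse_hex_double`, `kLn2Decimal`, and `kLn2HexText` are assumptions chosen for the example, not taken from the SLEEF headers. It checks a decimal literal against the hex form by parsing the hex text at run time, which `std::strtod` supports in C++11.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parse hex-float *text* at run time.
// std::strtod accepts hex floats in C++11 even though hex literals need C++17.
inline double parse_hex_double(const char *text)
{
    assert(text != nullptr);
    return std::strtod(text, nullptr);
}

// ln(2) spelled as a decimal literal (valid C++11).
static const double kLn2Decimal = 0.6931471805599453;

// The same constant as hex-float text (illustrative value, not copied from SLEEF).
static const char *const kLn2HexText = "0x1.62e42fefa39efp-1";
```

Comparing `parse_hex_double(kLn2HexText)` with `kLn2Decimal` using `==` is one way to confirm that a patched constant still denotes the same double.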

Architecture support:
- AVX2+FMA: 4-wide vectorized transcendentals via SLEEF
- ARM NEON: Placeholder for future (scalar fallback for now)
- SSE4.2-only: Scalar std::exp/log fallback
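The general shape of a vectorized array function with a scalar tail can be sketched as follows. This is a minimal illustration, not the code in eidos_simd.h: `kLanes`, `exp_block`, and `exp_array` are hypothetical names, and `exp_block` is a plain-C++ stand-in for the place where a SLEEF vector call would go.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Illustrative lane count: four doubles fit in one 256-bit AVX2 register.
constexpr std::size_t kLanes = 4;

// Stand-in for a vector kernel (hypothetical). A real build would load
// kLanes values into a register and call a SLEEF function here.
inline void exp_block(const double *in, double *out)
{
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        out[lane] = std::exp(in[lane]);
}

// Process full blocks with the kernel, then finish the tail one element at a time.
inline void exp_array(const double *in, double *out, std::size_t count)
{
    assert(count == 0 || (in != nullptr && out != nullptr));
    std::size_t i = 0;
    for (; i + kLanes <= count; i += kLanes)   // vector loop
        exp_block(in + i, out + i);
    for (; i < count; ++i)                     // scalar remainder loop
        out[i] = std::exp(in[i]);
}
```

With a real vector kernel, the last `count % kLanes` elements go through `std::exp` while the rest go through the vector routine, so which path an element takes depends on its position in the array.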

SLEEF is distributed under the Boost Software License.

Generate sleefinline_advsimd.h from SLEEF 4.0.0 on ARM64 macOS and
enable ARM NEON support in sleef_config.h.

This provides 2-wide vectorized transcendental functions (exp, log,
log10, log2) on Apple Silicon and other ARM64 platforms.

- Update eidos_functions_math.cpp to call SIMD functions when OpenMP disabled
- Update SpatialMap::exp() to use SIMD for consistent results
- Add command-line override support (-DEIDOS_SLEEF_AVAILABLE=0) for testing
- Add SLEEF benchmark script and ARM header generation script
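A command-line override of this kind is usually implemented with an `#ifndef` guard, so that a value passed via `-D` wins over auto-detection. The sketch below assumes that pattern. The detection conditions and the helper `sleef_enabled` are illustrative guesses, not the contents of sleef_config.h; only the macro name `EIDOS_SLEEF_AVAILABLE` comes from the text above.

```cpp
// Honour a value supplied on the command line (e.g. -DEIDOS_SLEEF_AVAILABLE=0);
// otherwise derive one from compiler-defined feature macros.
#ifndef EIDOS_SLEEF_AVAILABLE
  #if defined(__AVX2__) && defined(__FMA__)
    #define EIDOS_SLEEF_AVAILABLE 1   // x86_64 with AVX2 and FMA
  #elif defined(__aarch64__)
    #define EIDOS_SLEEF_AVAILABLE 1   // ARM64 (assumed detection condition)
  #else
    #define EIDOS_SLEEF_AVAILABLE 0   // scalar fallback
  #endif
#endif

static_assert(EIDOS_SLEEF_AVAILABLE == 0 || EIDOS_SLEEF_AVAILABLE == 1,
              "EIDOS_SLEEF_AVAILABLE must be 0 or 1");

// Hypothetical helper so callers can branch on the setting.
inline bool sleef_enabled()
{
    return EIDOS_SLEEF_AVAILABLE != 0;
}
```

Building with `-DEIDOS_SLEEF_AVAILABLE=0` forces the scalar path even on hardware that supports the vector one, which makes it possible to compare both paths on the same machine.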

Performance improvement on x86_64 AVX2 (1M elements):
- exp():    8.30ms -> 4.05ms (2.1x speedup)
- log():    6.17ms -> 3.37ms (1.8x speedup)
- log10(): 10.79ms -> 3.66ms (2.9x speedup)
- log2():   5.81ms -> 3.99ms (1.5x speedup)
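For readers who want to reproduce this kind of measurement, a minimal timing sketch follows. It is not the benchmark script added in this PR: `time_scalar_exp_ms` and `speedup` are hypothetical names, the input range is arbitrary, and a serious benchmark would also warm up and repeat the run.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Time one pass of std::exp over `count` doubles; returns elapsed milliseconds.
inline double time_scalar_exp_ms(std::size_t count)
{
    if (count == 0)
        return 0.0;

    std::vector<double> in(count), out(count);
    for (std::size_t i = 0; i < count; ++i)
        in[i] = static_cast<double>(i) / static_cast<double>(count);  // values in [0, 1)

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i)
        out[i] = std::exp(in[i]);
    const auto stop = std::chrono::steady_clock::now();

    // Read one result through a volatile so the loop is not optimized away entirely.
    volatile double sink = out[count / 2];
    (void)sink;

    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Speedup is the ratio of the old time to the new time.
inline double speedup(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}
```

For example, `speedup(10.79, 3.66)` is roughly 2.9, matching the log10() line above.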

- Add AVX2/FMA detection for Windows/MinGW builds in CMakeLists.txt
- Create cmake/toolchain-mingw64.cmake for cross-compilation testing
- Verified SLEEF compiles and runs correctly on Windows via Wine

This enables the same SLEEF-powered exp/log/log10/log2 speedups on
Windows that we have on Linux and macOS.

Consolidates all SIMD-related documentation and scripts in one location.
Updates SLEEF_HEADER_GENERATION.md with corrected path references.

Adds AVX2 header generation script matching the ARM script style.
Updates SLEEF_HEADER_GENERATION.md to reference the script file.

SLEEF and std::exp produce slightly different results at ULP level.
When spatial maps reorder data internally, different elements end up
in the scalar remainder loop vs the SIMD loop, causing identical()
to fail even though both results are numerically correct.
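One generic way to confirm that two such results agree up to rounding, rather than bit for bit, is to measure their distance in ULPs. The sketch below shows that technique; it is not necessarily what the tests in this PR do, and `ordered_bits`, `ulp_distance`, and `within_ulps` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Map a finite double to an integer whose ordering matches the numeric ordering.
inline std::int64_t ordered_bits(double value)
{
    std::int64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    // Negative doubles are stored sign-magnitude; remap them to negated magnitudes
    // so that -0.0 and +0.0 both map to 0 and order is preserved across zero.
    return bits < 0 ? std::numeric_limits<std::int64_t>::min() - bits : bits;
}

// Number of representable doubles between a and b (0 means same value).
inline std::uint64_t ulp_distance(double a, double b)
{
    assert(std::isfinite(a) && std::isfinite(b));
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    // Unsigned subtraction avoids signed overflow for widely separated values.
    return ia > ib
        ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
        : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}

// Tolerance-based comparison: true when a and b are at most max_ulps apart.
inline bool within_ulps(double a, double b, std::uint64_t max_ulps)
{
    return ulp_distance(a, b) <= max_ulps;
}
```

A check such as `within_ulps(x, y, 1)` accepts results that differ only in the last bit, whereas an exact comparison rejects them.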

SLEEF headers generated on Linux/GCC unconditionally define
SLEEF_FLOAT128_IS_IEEEQP, but __float128 is not supported by
Clang/AppleClang. This caused build failures on macos-15-intel.

The fix conditionally defines SLEEF_FLOAT128_IS_IEEEQP only when
the compiler actually supports __float128 (GCC with __SIZEOF_FLOAT128__).
On other compilers, SLEEF falls back to a struct-based Sleef_quad type.
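A guard along these lines could look like the sketch below. The exact condition used in the patched headers is not shown in this PR text, so the test on `__GNUC__`, `__clang__`, and `__SIZEOF_FLOAT128__` is one plausible reading of "GCC with `__SIZEOF_FLOAT128__`", and `float128_is_native` is a hypothetical helper added only to make the effect observable.

```cpp
// Define SLEEF_FLOAT128_IS_IEEEQP only where __float128 is assumed usable:
// GCC proper (not Clang masquerading as GCC) that advertises the type's size.
#if defined(__GNUC__) && !defined(__clang__) && defined(__SIZEOF_FLOAT128__)
  #define SLEEF_FLOAT128_IS_IEEEQP
#endif

// Hypothetical helper: reports whether the native 128-bit type was selected.
inline bool float128_is_native()
{
#ifdef SLEEF_FLOAT128_IS_IEEEQP
    return sizeof(__float128) == 16;   // IEEE quad precision occupies 16 bytes
#else
    return false;                      // struct-based fallback type is used instead
#endif
}
```

On AppleClang the macro stays undefined under this condition, so nothing in the translation unit mentions `__float128`.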

Also updates the generation script and documentation.
@andrewkern

one question for you @bhaller, when you get a chance to review this-- are you happy with where the documentation currently lives in simd_benchmarks/?

Another option might be to put simd_benchmarks/SLEEF_HEADER_GENERATION.md and the header patching scripts simd_benchmarks/generate_*_sleef.sh in the same directory as the header files themselves eidos/sleef/

I didn't do this originally to keep it clean, but I'd like this to be as clear as possible to anyone working on this stuff in the future.

bhaller commented Dec 15, 2025

> one question for you @bhaller, when you get a chance to review this-- are you happy with where the documentation currently lives in simd_benchmarks/?
>
> Another option might be to put simd_benchmarks/SLEEF_HEADER_GENERATION.md and the header patching scripts simd_benchmarks/generate_*_sleef.sh in the same directory as the header files themselves eidos/sleef/
>
> I didn't do this originally to keep it clean, but I'd like this to be as clear as possible to anyone working on this stuff in the future.

I would move them as you suggest, yes. Keep the sleef stuff in eidos/sleef, and keep the simd_benchmarks folder as SIMD benchmarks (whether related to sleef or not). That seems like a clean conceptual division.

bhaller commented Dec 15, 2025

OK, I did a quick review. AFAICT this is good to merge as soon as that move, and the other minutiae we discussed on Slack, have been done. Ping me when it's ready. Thanks, this is amazing!

@bhaller bhaller merged commit 0094aac into MesserLab:master Dec 15, 2025
17 checks passed
@andrewkern andrewkern deleted the sleef branch December 15, 2025 17:23