Conversation

@andrewkern
Collaborator

Adds SIMD-optimized (AVX2/NEON) kernel functions for spatial interactions using SLEEF for transcendental functions. All kernel types except Fixed now use a two-pass approach (build distances, then batch transform) enabling vectorized strength calculations.

Summary Benchmark

50k individuals, ~2262 neighbors

| Kernel | Original | Final | Improvement |
|---|---|---|---|
| Fixed | 31.97s | 31.36s | -2% (special-cased) |
| Linear | 37.26s | 32.95s | -12% |
| Exponential | 59.58s | 34.88s | -41% |
| Normal | 56.37s | 35.15s | -38% |
| Cauchy | 40.04s | 33.00s | -18% |
| Student's T | 130.10s | 49.76s | -62% |
| **TOTAL** | **356.04s** | **217.80s** | **-39%** |

…ential)

- Add float SLEEF macros to sleef_config.h (AVX2: 8 floats, NEON: 4 floats)
- Add exp_kernel_float32() and normal_kernel_float32() to eidos_simd.h
- Use SIMD kernels in FillSparseVectorForReceiverStrengths()
- Add benchmark script for spatial interaction kernels

Modify FillSparseVectorForReceiverStrengths() to use the two-pass
distance-then-transform path for most kernel types in 2D, enabling
SIMD optimizations for Exponential and Normal kernels. The Fixed
kernel retains the original single-pass special-case path since it
doesn't benefit from SIMD (just assigns a constant).

Benchmarks show 22% overall speedup at high neighbor counts (~2200),
with Exponential and Normal kernels seeing 38-42% improvement.
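The two-pass pattern described above can be sketched in scalar C++ as follows. This is a hypothetical illustration, not the actual SLiM code: the helper names and the exponential parameterization (`fmax * exp(-d / lambda)`) are assumptions, and in the real SIMD build the batch-transform loop is what `exp_kernel_float32()` replaces with AVX2/NEON intrinsics plus SLEEF calls over lanes of 8 (AVX2) or 4 (NEON) floats.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pass 2: transform a contiguous buffer of distances into strengths in one
// batch.  This scalar loop is the part a SIMD build vectorizes.
static void exp_kernel_scalar(const float *d, float *s, size_t n,
                              float fmax, float lambda)
{
    for (size_t i = 0; i < n; ++i)
        s[i] = fmax * std::exp(-d[i] / lambda);  // assumed parameterization
}

// Two-pass driver: the caller has already gathered neighbor distances into
// a contiguous vector (pass 1); here we batch-transform them (pass 2).
std::vector<float> strengths_two_pass(const std::vector<float> &distances,
                                      float fmax, float lambda)
{
    std::vector<float> s(distances.size());
    exp_kernel_scalar(distances.data(), s.data(), distances.size(),
                      fmax, lambda);
    return s;
}
```

The point of splitting the work this way is that pass 2 operates on a dense float array with no per-neighbor branching, which is exactly the shape SIMD (and SLEEF's vector `exp`) wants.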
Add SLEEF pow() function support (Sleef_powf8_u10avx2 for AVX2,
Sleef_powf4_u10advsimd for NEON) and implement tdist_kernel_float32()
to vectorize the Student's T distribution kernel calculation.

The kernel computes: strength = fmax / pow(1 + (d/tau)^2 / nu, (nu+1)/2)

Benchmarks show 62% speedup for Student's T kernel (130s -> 49s),
contributing to 38% overall speedup for spatial interaction benchmarks.
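A scalar reference for the Student's T formula quoted above, useful for checking the vectorized version. The function name is hypothetical; the shipped `tdist_kernel_float32()` evaluates the same expression per SIMD lane, with `pow()` replaced by `Sleef_powf8_u10avx2` / `Sleef_powf4_u10advsimd`.

```cpp
#include <cassert>
#include <cmath>

// strength = fmax / pow(1 + (d/tau)^2 / nu, (nu+1)/2)
float tdist_kernel_scalar(float d, float fmax, float tau, float nu)
{
    float r = d / tau;
    return fmax / std::pow(1.0f + (r * r) / nu, (nu + 1.0f) / 2.0f);
}
```

Note that at d = 0 the denominator is pow(1, …) = 1, so the kernel returns fmax, as expected for an interaction strength at zero distance.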
Implement cauchy_kernel_float32() using AVX2/NEON intrinsics for the
Cauchy kernel calculation: strength = fmax / (1 + (d/lambda)^2)

Unlike exp/normal/tdist kernels, Cauchy uses only basic arithmetic
operations (multiply, divide, add) so no SLEEF functions are needed.

Benchmarks show 18% speedup vs original (40s -> 33s).
Implement linear_kernel_float32() using AVX2/NEON intrinsics for the
Linear kernel calculation: strength = fmax * (1 - d / max_distance)

Rewritten as: strength = fmax - d * (fmax / max_distance)

Simple arithmetic (multiply + subtract) so gains are modest (~2%),
but provides consistency with other SIMD-optimized kernels.
Remove sleef_benchmark_spatial_interaction.slim (only tested Gaussian/Exponential)
and add benchmark_all_kernels.slim which tests all 6 kernel types:
Fixed, Linear, Exponential, Normal, Cauchy, and Student's T.

Add documentation for benchmark_all_kernels.slim script including:
- Entry in Contents section describing the 6 kernel types tested
- Performance results table showing SIMD speedups on AVX2
- Usage instructions for running with SIMD vs scalar builds
- Notes on adjusting neighbor density via W parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@bhaller
Contributor

bhaller commented Dec 17, 2025

This looks good to me. Awesome performance improvements. You are knocking it out of the park!

@bhaller bhaller merged commit 144b04a into MesserLab:master Dec 17, 2025
17 checks passed