Add support for x86 SIMD (SSE, AVX2, AVX512) #3019
#if !defined(MLX_USE_ACCELERATE)
#if defined(__AVX512F__)
#include "mlx/backend/cpu/simd/avx512_simd.h"
#elif defined(__AVX2__)
#include "mlx/backend/cpu/simd/avx_simd.h"
#elif defined(__SSE4_2__)
#include "mlx/backend/cpu/simd/sse_simd.h"
#endif
#endif
I'm wondering if this will break our Linux x86 distribution in some cases. If we build with AVX512 and someone then tries to run it on a machine which doesn't support AVX512, it will crash, right?
Actually it looks like just the lowest level is enabled by default. So we should be ok.
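For readers following the concern above, here is a minimal sketch of how a binary could compare the SIMD level it was compiled for against what the host CPU reports at runtime. It is not part of this PR: `SimdLevel`, `compiled_level`, `runtime_level`, and `binary_is_safe_to_run` are hypothetical names, and the runtime check assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available. On other compilers or architectures the sketch reports no SIMD level.

```cpp
// Hypothetical helper, not MLX code: compare the compile-time SIMD level
// with the level the running CPU reports.
enum class SimdLevel { None = 0, SSE42 = 1, AVX2 = 2, AVX512 = 3 };

// Mirrors the preprocessor checks in the diff above: the highest ISA macro
// defined by the compiler flags decides which level was compiled in.
constexpr SimdLevel compiled_level() {
#if defined(__AVX512F__)
  return SimdLevel::AVX512;
#elif defined(__AVX2__)
  return SimdLevel::AVX2;
#elif defined(__SSE4_2__)
  return SimdLevel::SSE42;
#else
  return SimdLevel::None;
#endif
}

// Queries the host CPU. Assumes GCC/Clang builtins on x86; anything else
// falls through to SimdLevel::None.
inline SimdLevel runtime_level() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return SimdLevel::AVX512;
  if (__builtin_cpu_supports("avx2")) return SimdLevel::AVX2;
  if (__builtin_cpu_supports("sse4.2")) return SimdLevel::SSE42;
#endif
  return SimdLevel::None;
}

// True when every instruction set the binary was built for is also
// reported by the CPU it is running on.
inline bool binary_is_safe_to_run() {
  return static_cast<int>(compiled_level()) <=
         static_cast<int>(runtime_level());
}
```

A distribution build could call something like `binary_is_safe_to_run()` at startup and fail with a clear message instead of hitting an illegal-instruction crash. Whether that is preferable to shipping only the lowest level is a design choice this thread leaves open.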
@dhiltgen what are you thinking for next steps here? I might suggest we split this out into multiple PRs to make it easier to review and incorporate. The first PR could be the basic SSE backend for x86, which we should definitely integrate. Following that we could add the extra back-ends (there is a question of how to test those as well). We will probably also want a NEON-only back-end for Linux ARM (i.e. not through Accelerate).

Splitting this up into smaller chunks sounds like a reasonable approach. I'll probably keep this in draft for a bit, while we focus on full GPU load for best performance.

Sounds good!
Proposed changes
This implements CPU support for SSE, AVX2, and AVX512.
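As a rough illustration of what a SIMD code path of this kind does (this is not the PR's implementation), the sketch below adds two float arrays four lanes at a time with SSE intrinsics and falls back to a scalar loop for the tail or when SSE is unavailable. The function name `add_f32` is hypothetical.

```cpp
#include <cstddef>
#include <vector>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps
#endif

// Hypothetical example kernel: out[i] = a[i] + b[i] for i in [0, n).
inline void add_f32(const float* a, const float* b, float* out,
                    std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE__)
  // Process four floats per iteration using unaligned loads and stores.
  for (; i + 4 <= n; i += 4) {
    __m128 va = _mm_loadu_ps(a + i);
    __m128 vb = _mm_loadu_ps(b + i);
    _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
  }
#endif
  // Scalar loop handles the remaining elements (or everything without SSE).
  for (; i < n; ++i) {
    out[i] = a[i] + b[i];
  }
}
```

AVX2 and AVX512 variants follow the same shape with 8 and 16 float lanes per register respectively, which is why the wider back-ends only help on CPUs that actually support them.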
Performance benchmarks on Linux on Xeon Silver 4410T
Baseline
build/cpu/benchmarks/cpp/autograd
build/cpu/benchmarks/cpp/irregular_strides
build/cpu/benchmarks/cpp/single_ops
SSE
build/cpu-sse/benchmarks/cpp/autograd
build/cpu-sse/benchmarks/cpp/irregular_strides
build/cpu-sse/benchmarks/cpp/single_ops
AVX2
build/cpu-avx2/benchmarks/cpp/autograd
build/cpu-avx2/benchmarks/cpp/irregular_strides
build/cpu-avx2/benchmarks/cpp/single_ops
AVX512
build/cpu-avx512/benchmarks/cpp/autograd
build/cpu-avx512/benchmarks/cpp/irregular_strides
build/cpu-avx512/benchmarks/cpp/single_ops
Checklist
Put an x in the boxes that apply.
pre-commit run --all-files to format my code / installed pre-commit prior to committing changes