fix: disable bf16 WMMA kernels on pre-Ampere GPUs #3367
+13
−1
Summary
Fixes #3366
- Add `#ifndef NO_BF16_KERNEL` guards to `moe_wmma.cu`: `moe_wmma_gguf.cu` already had these guards but `moe_wmma.cu` was missing them, causing compilation failures on pre-Ampere GPUs
- `-DNO_BF16_KERNEL` in `build.rs`: detect compute capability via `cudaforge::detect_compute_cap()` and pass the define when compute cap < 80

Problem
bf16 WMMA fragment types (`nv_bfloat16` with `nvcuda::wmma`) require compute capability >= 8.0 (Ampere). On sm_75 (Turing/T4) and older GPUs, compiling these fragments produces "incomplete type" errors.

The `NO_BF16_KERNEL` preprocessor guard was already present in `moe_wmma_gguf.cu`, but `moe_wmma.cu` was missing the guard entirely.

Test plan
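As a rough illustration of the compute-capability gate described in the summary, the sketch below isolates the decision "define `NO_BF16_KERNEL` when compute cap < 80" in a small pure function so it can be checked without a GPU. The helper name `bf16_defines` and the hard-coded capability values are assumptions made for this example and are not taken from the PR; the actual `build.rs` obtains the value from `cudaforge::detect_compute_cap()`, whose exact signature is not shown here.

```rust
// Hypothetical helper (not from the PR): maps a compute capability,
// written as major * 10 + minor (75 for sm_75, 80 for sm_80), to the
// extra preprocessor defines that should be passed to the CUDA compiler.
fn bf16_defines(compute_cap: usize) -> Vec<&'static str> {
    // bf16 WMMA fragments require compute capability >= 8.0, so every
    // target below 80 gets the guard macro that compiles them out.
    if compute_cap < 80 {
        vec!["-DNO_BF16_KERNEL"]
    } else {
        Vec::new()
    }
}

fn main() {
    // In a real build script the capability would come from the detection
    // step; it is hard-coded here so the sketch runs anywhere.
    for cap in [75, 80, 89] {
        println!("sm_{}: {:?}", cap, bf16_defines(cap));
    }
}
```

Keeping the threshold in one function makes the pre-Ampere and Ampere cases easy to assert on separately from the rest of the build logic.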