fix: disable bf16 WMMA kernels on pre-Ampere GPUs #3367

asglover · 2026-02-12T01:55:53Z

Summary

- Add #ifndef NO_BF16_KERNEL guards to moe_wmma.cu. moe_wmma_gguf.cu already had these guards, but moe_wmma.cu was missing them, causing compilation failures on pre-Ampere GPUs.
- Wire up -DNO_BF16_KERNEL in build.rs: detect compute capability via cudaforge::detect_compute_cap() and pass the define when compute cap < 80.
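
The build-script side of this change can be sketched roughly as follows. This is a minimal illustration, not the actual build.rs code: the helper name `bf16_nvcc_args` and the constant `BF16_WMMA_MIN_COMPUTE_CAP` are hypothetical, and the compute capability is taken as a plain integer (major * 10 + minor, so sm_75 is 75) rather than assuming anything about the return type of cudaforge::detect_compute_cap().

```rust
/// Lowest compute capability (major * 10 + minor) that supports
/// bf16 WMMA fragments: 80 corresponds to sm_80 (Ampere).
/// Hypothetical constant name, used for illustration only.
const BF16_WMMA_MIN_COMPUTE_CAP: usize = 80;

/// Hypothetical helper: returns the extra nvcc arguments needed for the
/// given compute capability. On pre-Ampere targets it returns the define
/// that compiles out the bf16 WMMA kernels; otherwise it returns nothing.
fn bf16_nvcc_args(compute_cap: usize) -> Vec<&'static str> {
    if compute_cap < BF16_WMMA_MIN_COMPUTE_CAP {
        // Matches the `#ifndef NO_BF16_KERNEL` guards in the .cu files.
        vec!["-DNO_BF16_KERNEL"]
    } else {
        Vec::new()
    }
}
```

In a real build script the returned arguments would be appended to whatever nvcc invocation the builder constructs. Keeping the decision in a small pure function makes the threshold easy to check without a GPU present.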

Problem

bf16 WMMA fragment types (nv_bfloat16 with nvcuda::wmma) require compute capability >= 8.0 (Ampere). On sm_75 (Turing/T4) and older GPUs, compiling these fragments produces "incomplete type" errors:

error: incomplete type "nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, 16, 16, 16, nv_bfloat16, nvcuda::wmma::row_major>" is not allowed

The NO_BF16_KERNEL preprocessor guard was already present in moe_wmma_gguf.cu but:

- It was never actually passed by the build script
- moe_wmma.cu was missing the guard entirely

Test plan

- Build candle-kernels on a T4 (sm_75): compiles successfully with bf16 WMMA kernels excluded
- Build on an A100/A10 (sm_80+): bf16 WMMA kernels should still be compiled as before

bf16 WMMA fragment types (nv_bfloat16 with nvcuda::wmma) are only supported on sm_80+ (Ampere and later). On older architectures like sm_75 (Turing/T4), compiling these fragments produces "incomplete type" errors.

moe_wmma_gguf.cu already had #ifndef NO_BF16_KERNEL guards but moe_wmma.cu was missing them, and the build script never passed the -DNO_BF16_KERNEL define.

This commit:
- Adds matching #ifndef NO_BF16_KERNEL guards to moe_wmma.cu
- Updates build.rs to detect compute capability via cudaforge and pass -DNO_BF16_KERNEL when building for GPUs with compute cap < 80

asglover · 2026-02-12T01:58:10Z

This is a Claude-generated PR, but it does fix the lack of guards on the new MoE kernels. I'm happy to pull it or modify it to make it mergeable and up to your standards. I have tested that it allows candle-kernels to be built on GitHub runners.
