
Conversation

@princepride

Related to vllm-project/vllm#31473

Proposal:

The SM90 kernel now automatically selects the appropriate block size based on h_q:

- h_q = 64, 128, 192, … → use B_H = 64
- h_q = 32, 96, 160, … → use B_H = 32
- h_q = 16, 48, 80, … → use B_H = 16

In other words, the kernel picks the largest supported block size that evenly divides h_q (sketched below). This allows more flexible support for different head configurations while maintaining performance, since larger block sizes are preferred.
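For illustration, a minimal sketch of that selection rule (the function name and structure are hypothetical, not the actual FlashMLA dispatch code):

```python
def select_block_size(h_q: int) -> int:
    """Pick the largest supported block size B_H that evenly divides h_q.

    Hypothetical sketch of the rule described above, not the actual
    SM90 kernel dispatch code.
    """
    for b_h in (64, 32, 16):
        if h_q % b_h == 0:
            return b_h
    raise ValueError(f"unsupported h_q: {h_q}")
```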

Test Plan:

@LucasWilkinson This code was generated by Gemini 3.0 Pro. I don't know much about kernels. Is this code modification reasonable? How should I test it? 🥹

…ion kernel.

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
@LucasWilkinson (Collaborator)

You can test it by setting `export FLASH_MLA_SRC_DIR=<path-to-modified-flashmla>` and then rebuilding vLLM.

NOTE: I'd recommend installing ccache and using:

`VLLM_DISABLE_SCCACHE=1 CCACHE_NOHASHDIR="true" uv pip install --no-build-isolation -e . -v`

for faster rebuilds (the first build will still be very slow). You can then even run:

`VLLM_DISABLE_SCCACHE=1 python setup.py build_ext --inplace`

to rebuild just the kernels.
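For a quick smoke test after rebuilding, something like the following should exercise the MLA decode path (the model choice is just an example; any DeepSeek-style MLA checkpoint that fits on your GPU works):

```python
# Illustrative smoke test, not an official procedure: after rebuilding
# vLLM against the modified FlashMLA source, check that generation
# still produces coherent output.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-V2-Lite-Chat", trust_remote_code=True)
outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=16),
)
print(outputs[0].outputs[0].text)  # should be coherent text, not garbage
```

Beyond a smoke test, you'd also want to compare accuracy and throughput against the unmodified kernel, since block-size changes can regress performance even when the output is correct.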

Signed-off-by: princepride <wangzhipeng628@gmail.com>
@princepride (Author)

@LucasWilkinson It seems much more complicated than I imagined 😂. After trying this kernel and several more revised versions, the model's performance actually decreased.

