[Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer weights padding#35277
kebe7jun wants to merge 3 commits into vllm-project:main
Conversation
Signed-off-by: Kebe <mail@kebe7jun.com>
Code Review
This pull request correctly addresses a bug in the sparse_attn_indexer by ensuring the weights tensor is padded consistently with the q_fp8 tensor during the decode phase. The changes in vllm/model_executor/layers/sparse_attn_indexer.py properly handle both cases where padding is required and not required, preventing potential shape mismatches and incorrect behavior. The fix appears correct and complete.
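To make the review comment concrete, here is a minimal sketch of the padding pattern it describes. This is not the actual vLLM code: the function name `pad_to_batch`, the shapes, and the use of NumPy instead of PyTorch are all assumptions for illustration. The point is that when `q_fp8` is padded up to a larger batch size during decode, the per-token `weights` tensor must be padded the same way, or downstream shapes mismatch.

```python
import numpy as np

def pad_to_batch(x: np.ndarray, padded_batch: int) -> np.ndarray:
    """Zero-pad the leading (batch) dimension of x up to padded_batch.

    If x is already at least padded_batch long, it is returned unchanged,
    mirroring the "padding not required" branch described in the review.
    """
    batch = x.shape[0]
    if batch >= padded_batch:
        return x
    pad_width = [(0, padded_batch - batch)] + [(0, 0)] * (x.ndim - 1)
    return np.pad(x, pad_width)

# Hypothetical shapes: 3 live decode requests, padded to a batch of 8.
batch_size, padded_batch = 3, 8
q_fp8 = np.ones((batch_size, 16), dtype=np.float32)   # stand-in query tensor
weights = np.ones((batch_size, 1), dtype=np.float32)  # per-token indexer weights

q_padded = pad_to_batch(q_fp8, padded_batch)
w_padded = pad_to_batch(weights, padded_batch)  # the fix: pad weights consistently

# Both tensors now share the same padded batch dimension.
assert q_padded.shape[0] == w_padded.shape[0]
```

The bug this PR fixes is the case where only `q_fp8` was padded, so `weights` kept its original batch dimension.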
@LucasWilkinson PTAL.
@kebe7jun thanks for the contribution! https://github.com/vllm-project/vllm/pull/34552/changes#r2855124116 is actually close to landing, which should eliminate the need for padding. Will hold off to see if we can land that in the next couple of days.
This pull request has merge conflicts that must be resolved before it can be merged.
Purpose
PR #29287 does not include #32175, so it can cause a shape mismatch when batch_size_next_n == batch_size * next_n.

cc @ganyi1996ppo
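For context, a hedged sketch of the batch-size relation mentioned above, assuming next_n denotes the number of speculative (MTP) tokens per request; the variable names are taken from the PR text, the values are hypothetical:

```python
batch_size = 4  # hypothetical number of decode requests in the batch
next_n = 2      # hypothetical speculative tokens produced per request

# With next_n tokens per request, the flattened token batch grows to
# batch_size * next_n, which is the quantity the padding logic must track.
batch_size_next_n = batch_size * next_n
print(batch_size_next_n)  # → 8
```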
Test Plan
and run bench:
Test Result
vllm bench completed successfully.