[ROCm][Deepseekv3.2][Perf] dsv3.2 further optimization on vllm #32649
ganyi1996ppo wants to merge 9 commits into vllm-project:main
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request introduces performance optimizations for Deepseek v3.2 on ROCm by adding new Triton kernels and a specialized backend. It also refactors the sparse_attn_indexer logic into a dedicated file, which is a good architectural improvement. However, I've identified a critical bug in the refactored CUDA path that could lead to an AttributeError, and a significant limitation in the new ROCm kernels due to a hardcoded value that restricts flexibility. Addressing these issues will improve the robustness and applicability of these optimizations.
```python
    ):
        return torch.ops.vllm.sparse_attn_indexer(
            hidden_states,
            self.k_cache.layer_prefix,
```
The k_cache object is of type DeepseekV32IndexerCache, which has a prefix attribute but not a layer_prefix attribute. Using self.k_cache.layer_prefix will result in an AttributeError. The HIP path correctly uses self.k_cache.prefix. This should be consistent.
```diff
-            self.k_cache.layer_prefix,
+            self.k_cache.prefix,
```
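As a minimal sketch of why this is a bug (the class below is a toy stand-in for `DeepseekV32IndexerCache`, not the real implementation): Python attribute lookup fails at call time, so the broken CUDA path would only surface the error once that code path actually runs.

```python
class IndexerCache:
    """Toy stand-in for DeepseekV32IndexerCache: only `prefix` is defined."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix


cache = IndexerCache("model.layers.0.self_attn.indexer")

# The HIP path reads the attribute that actually exists.
assert cache.prefix == "model.layers.0.self_attn.indexer"

# The CUDA path as written does the equivalent of this lookup,
# which raises AttributeError because `layer_prefix` was never set.
try:
    _ = cache.layer_prefix
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

Because nothing fails at import time, a test that exercises the CUDA indexer path is the only way to catch this class of typo early.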
```python
            chunk.cu_seqlen_ke,
        )
        num_rows = logits.shape[0]
        assert topk_tokens == 2048, "top_k_per_row assumes size 2048"
```
The code asserts that topk_tokens must be 2048. This hardcoded value limits the flexibility of the sparse attention indexer. If this is a temporary limitation of the underlying custom C++ op, it should be noted with a TODO. For broader applicability, this should be made more flexible or at least provide a more informative error message if the value is unsupported.
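One way to make the limitation self-documenting, as the review suggests: replace the bare assert with a guard that raises a descriptive error and carries a TODO. This is a sketch, not the PR's code; the 2048 constraint and the fallback suggestion are taken from the comment above, and the helper name is illustrative.

```python
SUPPORTED_TOPK = 2048  # current limit of the custom top_k_per_row op


def check_topk_tokens(topk_tokens: int) -> None:
    """Validate topk_tokens with an informative error instead of a bare assert.

    TODO: lift this restriction once the underlying C++ op supports
    other top-k sizes.
    """
    if topk_tokens != SUPPORTED_TOPK:
        raise NotImplementedError(
            f"top_k_per_row currently only supports topk_tokens="
            f"{SUPPORTED_TOPK}, got {topk_tokens}; fall back to the "
            "generic sparse-attn indexer path for other sizes."
        )


check_topk_tokens(2048)  # passes silently
```

Unlike `assert`, this check also survives `python -O`, which strips assertions.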
Force-pushed from 40265e8 to 9707e58 (compare)
Signed-off-by: ganyi <ygan@amd.com>
Force-pushed from 9707e58 to e569fa2 (compare)
Purpose
This PR moves some of the original features from #29287 here, including several kernels that depend on Triton 3.5.0, and adds further optimizations to ROCMAiterMLASparseBackend. This PR depends on #29287 being merged first.

Test Plan
gsm8k with 20-shot
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.