[OOT Plugin][Performance] Optimize metadata prepare by ganyi1996ppo · Pull Request #263 · ROCm/ATOM

ganyi1996ppo · 2026-03-04T07:29:06Z

Motivation

Optimize metadata preparation for MHA in the OOT plugin. For the decode-only or prefill-only case, host-side metadata preparation should be reduced to 50~100 us.

Technical Details

Test Plan

gsm8k

Test Result

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	5	exact_match	↑	0.8840	±	0.0088
		strict-match	5	exact_match	↑	0.8749	±	0.0091

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: ganyi <ygan@amd.com>

wuhuikx · 2026-03-04T07:32:03Z

Can you help attach the accuracy check result for Qwen with OOT mode?

Copilot

Pull request overview

Optimizes host-side attention metadata preparation for the OOT plugin (targeting faster decode-only and prefill-only paths) by avoiding unnecessary CPU computations and simplifying metadata.

Changes:

Removed min_query_len from plugin-mode flash-attn metadata dataclasses.
Added decode-only / prefill-only fast paths to skip seq_lens.cpu() and query-length derivation for non-mixed batches.
Adjusted how prefill_metadata and num_actual_kv_tokens are produced during build().
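
The fast path described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in atom/plugin/attention.py: `Batch`, `to_cpu`, `stats`, and `prepare` are hypothetical names, plain Python lists stand in for tensors, and the batch is assumed to already carry precomputed `max_query_len` and `max_seq_len` values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Batch:
    """Hypothetical stand-in for the metadata the builder receives."""
    query_start_loc_cpu: List[int]  # cumulative query offsets, len = num_reqs + 1
    seq_lens_device: List[int]      # pretend this lives on the device
    max_query_len: int              # assumed to be precomputed upstream
    max_seq_len: int                # assumed to be precomputed upstream
    num_decodes: int
    num_prefills: int


# Counts simulated device-to-host copies so the saving is observable.
stats = {"host_copies": 0}


def to_cpu(values: List[int]) -> List[int]:
    """Simulate a device-to-host transfer (the step the fast path avoids)."""
    stats["host_copies"] += 1
    return list(values)


def prepare(batch: Batch) -> Tuple[int, int, Optional[List[int]]]:
    """Return (max_query_len, max_seq_len, per-request query lengths or None)."""
    mixed_request = batch.num_decodes > 0 and batch.num_prefills > 0
    if not mixed_request:
        # Decode-only or prefill-only: reuse the precomputed maxima and
        # skip both the host copy and the per-request length derivation.
        return batch.max_query_len, batch.max_seq_len, None

    # Mixed batch: per-request lengths are needed to split the batch.
    seq_lens = to_cpu(batch.seq_lens_device)
    starts = batch.query_start_loc_cpu
    query_lens = [end - begin for begin, end in zip(starts, starts[1:])]
    return max(query_lens), max(seq_lens), query_lens
```

Calling `prepare` on a batch with `num_prefills == 0` leaves `stats["host_copies"]` unchanged, while a mixed batch increments it once. In the real builder the saving would come from avoiding the device synchronization, which this sketch only simulates.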


Copilot · 2026-03-04T07:33:34Z

atom/plugin/attention.py

+                    else query_lens_cpu[:num_decodes].max().item()
+                ),
+                max_seq_len=(
+                    common_attn_metadata.max_seq_len
+                    if prefill_only
+                    else query_lens_cpu[:num_decodes].max().item()


prefill_metadata.max_seq_len is computed from query_lens_cpu[:num_decodes] in the mixed-request path. This is both the wrong slice (decode, not prefill) and the wrong source (query lengths, not seq lengths), and will under-report max_seqlen_k passed to flash_attn_varlen_func.

Compute it from the prefill portion’s sequence lengths (or equivalent prefill query lengths if those are guaranteed equal here).

Suggested change

-                    else query_lens_cpu[:num_decodes].max().item()
-                ),
-                max_seq_len=(
-                    common_attn_metadata.max_seq_len
-                    if prefill_only
-                    else query_lens_cpu[:num_decodes].max().item()
+                    else query_lens_cpu[num_decodes + num_extends :].max().item()
+                ),
+                max_seq_len=(
+                    common_attn_metadata.max_seq_len
+                    if prefill_only
+                    else seq_lens_cpu[num_decodes + num_extends :].max().item()
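
To make the review comment concrete, here is a small sketch of why the slice matters. It assumes, as the suggested slice implies, that requests in a mixed batch are ordered decodes first, then extends, then prefills. `prefill_maxima` is a hypothetical helper and plain lists stand in for the CPU tensors.

```python
from typing import List, Tuple


def prefill_maxima(
    query_lens: List[int],
    seq_lens: List[int],
    num_decodes: int,
    num_extends: int,
) -> Tuple[int, int]:
    """Return (max query length, max sequence length) over the prefill requests.

    Assumes the batch layout [decodes | extends | prefills].
    """
    start = num_decodes + num_extends
    prefill_query_lens = query_lens[start:]
    prefill_seq_lens = seq_lens[start:]
    if not prefill_query_lens:
        raise ValueError("batch contains no prefill requests")
    return max(prefill_query_lens), max(prefill_seq_lens)


# Two decodes, one extend, two prefills (illustrative values only).
query_lens = [1, 1, 4, 128, 64]
seq_lens = [50, 60, 100, 128, 64]

correct = prefill_maxima(query_lens, seq_lens, num_decodes=2, num_extends=1)

# The slice flagged in the review looks at decode query lengths instead,
# which under-reports the maximum for the prefill portion.
flagged = max(query_lens[:2])
```

With these values the prefill slice yields a maximum of 128 for both lengths, whereas the flagged expression yields 1, so a kernel sized from it would be given too small a bound.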

valarLip · 2026-03-04T09:47:09Z

atom/plugin/attention.py

-        seq_lens = common_attn_metadata.seq_lens.cpu()
-        query_lens_cpu = query_start_loc_cpu[1:] - query_start_loc_cpu[:-1]
+        if mixed_request:
+            seq_lens = common_attn_metadata.seq_lens.cpu()


Doesn't vLLM have something like seq_lens_cpu?

vLLM deprecated seq_lens_cpu in the common metadata a while ago.

ganyi1996ppo · 2026-03-05T03:00:07Z

accuracy check result for Qwen with OOT mode?

@wuhuikx Accuracy data attached in the PR description.

Signed-off-by: ganyi <ygan@amd.com>

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.


ganyi1996ppo added 2 commits March 4, 2026 07:25

optimize metadata prepare

253b402

Signed-off-by: ganyi <ygan@amd.com>

black

e74398c

Signed-off-by: ganyi <ygan@amd.com>

Copilot AI review requested due to automatic review settings March 4, 2026 07:29

Copilot started reviewing on behalf of ganyi1996ppo March 4, 2026 07:30

Copilot AI reviewed Mar 4, 2026

valarLip reviewed Mar 4, 2026

ganyi1996ppo and others added 2 commits March 5, 2026 03:07

fix prefill prepare bug

8547e3e

Signed-off-by: ganyi <ygan@amd.com>

Merge branch 'main' into ganyi/optimize_metadata_prepare

8fd304d

Copilot AI review requested due to automatic review settings March 6, 2026 06:04

Copilot started reviewing on behalf of XiaobingSuper March 6, 2026 06:05

Copilot AI reviewed Mar 6, 2026

ganyi1996ppo wants to merge 4 commits into main from ganyi/optimize_metadata_prepare

5 participants
