Update fp8 paged attention#592

Draft
amd-xiaoyu12 wants to merge 890 commits into ROCm:main from amd-xiaoyu12:fp8-paged-attention

Conversation


@amd-xiaoyu12 amd-xiaoyu12 commented Jul 9, 2025

Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git)

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) requires a clear, previously communicated exception.

Summary:
Support full fp8 MFMA with warp-level dynamic query quantization to improve fp8 performance on MI308; this can also benefit other MI300X accelerators and newer hardware.

  • Performance: benchmark results (image)
  • Unit test - attention output (image)
  • lm-eval-harness perplexity (ppl) test (image)
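To make the summary concrete, here is a minimal Python sketch of per-row dynamic fp8 quantization of the query, the idea behind the warp-level scheme (on the GPU, each warp would compute the scale for its own row). It assumes the OCP e4m3 format (largest finite value 448); the helper name and structure are illustrative, not the kernel's actual API:

```python
FP8_E4M3_MAX = 448.0  # largest finite value in OCP fp8 e4m3


def quantize_row_fp8(row):
    """Dynamically quantize one query row.

    The scale is derived from the row's own max magnitude at runtime,
    so no offline calibration pass is needed.
    """
    amax = max((abs(v) for v in row), default=0.0)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    # Divide by the scale, round to the nearest representable integer step,
    # and clamp into the fp8 range.
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, round(v / scale))) for v in row]
    return q, scale  # dequantize later as q[i] * scale


q, s = quantize_row_fp8([0.5, -1.0, 2.0, 0.448])
```

Because the scale adapts to each row, large activations do not force the whole query tensor onto one coarse grid, which is the main accuracy benefit of dynamic over static quantization.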

gshtras and others added 30 commits February 17, 2025 15:42
* Enabling ROCm CI on MI250 machines:
- correct build target
- correct queue

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

---------

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* Optimization for quantized gemm skinny sizes

* lint fix

* Add support for bf16/fp16

* code cleanup

* code cleanup

* lint fix2

* cleanup

* Moved the logic into tuned gemm to preserve API compatibility
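The "moved into tuned gemm" step above can be pictured as a shape-based dispatch that keeps the public entry point unchanged. The threshold and names below are made up for illustration and are not the actual ROCm/vLLM implementation:

```python
def pick_gemm_path(m, n, k, skinny_m_threshold=16):
    """Route skinny GEMMs (very small M, e.g. low-batch decode) to a
    specialized kernel; everything else takes the default path."""
    if m <= skinny_m_threshold:
        return "skinny_gemm"   # kernel tuned for tall-and-thin shapes
    return "default_gemm"      # library/BLAS fallback


def mm(a_shape, b_shape):
    """Public API stays the same; only the internal routing changes."""
    m, k = a_shape
    k2, n = b_shape
    assert k == k2, "inner dimensions must match"
    return pick_gemm_path(m, n, k)
```

Hiding the routing inside the tuned-gemm layer is what preserves API compatibility: callers never see which kernel was chosen.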

---------

Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
* Removing gfx940 and gfx941 targets. These have been deprecated in favor of gfx942 for MI300X

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

* Remove from custom kernels as well

---------

Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
* Advance torch commit to be past pytorch/pytorch#144942 to fix tunable ops

* Make sure to use the submodule commit compatible with the main aiter commit
Signed-off-by: Sage Moore <sage@neuralmagic.com>
* Using aiter branch that can be built into a whl with PREBUILD_KERNELS=1

* Using fail fast on aiter build to see compilation errors in the log since it fails silently

* Check for build success without installing whl
* Using proposed fix from ROCm/aiter#115

* Build fix
* tuning adjustment for quantized skinny gemm.

* lint fix
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Sage Moore <sage@neuralmagic.com>
@amd-xiaoyu12 amd-xiaoyu12 changed the title Update fp8 paged attention Update fp8 paged attention for MI308 Jul 9, 2025
@amd-xiaoyu12 amd-xiaoyu12 changed the title Update fp8 paged attention for MI308 Update fp8 paged attention Aug 4, 2025
@gshtras gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de Compare September 9, 2025 16:43