
[NOT FOR LANDING] 355_wip_0909_rc2 -> 0909_rc2 #654

Draft
maleksan85 wants to merge 974 commits into 0909_rc2 from 355_wip_0909_rc2

Conversation


@maleksan85 maleksan85 commented Sep 4, 2025

Please direct your PRs to the upstream vllm (https://github.com/vllm-project/vllm.git)

Accepting PRs into the ROCm fork (https://github.com/ROCm/vllm) will require a clear, previously communicated exception.

vllmellm and others added 30 commits April 3, 2025 04:45
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
* Adding 2-stage MoE support separately until it is added upstream

* Missing layout param
* Enable fused fp8 out in V1 CPA and FA

* Correct the operation and create the tensor of the correct type

* Update to use for the non-custom path as well

* This was a debug assert
* Added the extra use_irope parameter in

Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Fix ROCm V1 Engine Fused MoE Bug

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

* Add warning message that V0 does not support irope

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

---------

Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com>
add RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1
Signed-off-by: maleksan85 <maleksan@amd.com>
Co-authored-by: maleksan85 <maleksan@amd.com>
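For context, a minimal sketch of how the environment variable from the commit above might be set before launching a vLLM server on ROCm; the serve command shown in the comment is illustrative, not taken from this PR:

```shell
# Prevent Ray from unsetting ROCR_VISIBLE_DEVICES, so GPU assignment on
# ROCm stays visible to Ray worker processes spawned by vLLM.
export RAY_EXPERIMENTAL_NOSET_ROCR_VISIBLE_DEVICES=1

# Then launch the server as usual, e.g.:
# vllm serve <model> --tensor-parallel-size 8
```

Without this flag, Ray's device management can clear `ROCR_VISIBLE_DEVICES` in workers, which is why the commit pins it to 1.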
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: jpvillam <Juan.Villamizar@amd.com>
Signed-off-by: jpvillam <Juan.Villamizar@amd.com>
* Updated README.md with April 10 results

* Updated README.md with "2-stage MoE and MLA from AITER"
Added correct path for Dockerfile.rocm under Docker manifest
Signed-off-by: charlifu <charlifu@amd.com>
The upstream has moved docker files into a separate directory.

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* Update test-template.j2 to fix the new location of run-amd-test.sh

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* Update test-template.j2: fix the wrong path used initially

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* Update test-template.j2

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

---------

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* Adapted hipblaslt build to work with rocm 6.4

* rccl version compatible with 6.4

* Torch and triton combination that works

* hipblaslt version and not rebuilding rccl

* Fixing another package that we install now
Added Known Issues section to document the Meta 405B FP8 model memory fault and its workaround.
Signed-off-by: seungrokjung <seungrok.jung@amd.com>
Mcirino1 and others added 16 commits September 1, 2025 05:15
* Updated README.md for June 10 release

* Added Docker Manifest git hash
* Updated README.md for June 24 Docker release

* Added additional throughput results

* Fixed some throughput results
* Minor changes to command line examples

* README changes and added throughput results

Still waiting on latency

* Added latency results

* Update README.md

* Update README.md
* Update test-pipeline.yaml

Disabling the "Tensorizer Test".

The test has been observed to generate exceptions while still reporting success. That needs to be investigated before re-enabling the test in the production environment.

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* Fixing pre-commit complaints.

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* .

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

---------

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
@maleksan85 maleksan85 changed the title 355 wip 0909 rc2 355_wip_0909_rc2 -> 0909_rc2 Sep 4, 2025
@maleksan85 maleksan85 changed the title 355_wip_0909_rc2 -> 0909_rc2 [NOT FOR LANDING] 355_wip_0909_rc2 -> 0909_rc2 Sep 4, 2025
attn_type: AttentionType = AttentionType.DECODER,
kv_sharing_target_layer_name: Optional[int] = None,
sinks: Optional[torch.Tensor] = None,
sinks: Optional[torch.Tensor] = None,
Author

fix it!
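The comment flags the duplicated `sinks` parameter in the signature above. A duplicated parameter name in a Python `def` is rejected at compile time, which a quick `compile()` check demonstrates (a sketch using a simplified signature, with the `torch.Tensor` annotations dropped so it runs without PyTorch):

```python
# Duplicate parameter names in a function definition are a SyntaxError,
# caught when the source is compiled, before it ever runs.
bad_sig = """
def __init__(self, attn_type=None, kv_sharing_target_layer_name=None,
             sinks=None, sinks=None):
    pass
"""

try:
    compile(bad_sig, "<snippet>", "exec")
    duplicated_ok = True
except SyntaxError as e:
    duplicated_ok = False
    # CPython reports: duplicate argument 'sinks' in function definition
    print(f"SyntaxError: {e.msg}")
```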
