Test Queues by dhonnappa-amd · Pull Request #456 · ROCm/vllm

dhonnappa-amd · 2025-02-28T16:05:27Z

DO NOT MERGE.
TESTING K8s agents with different gpu sizes
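
For context, a minimal sketch of how a Buildkite step can be routed to agents on a particular queue, which is the mechanism this test branch exercises. This is an illustration under stated assumptions, not the contents of `.buildkite/test-template.j2`: only the queue name `mi250_8xGPU` appears in the commits of this PR; the second queue name, the step labels, and the command are hypothetical placeholders.

```python
import json


def make_step(label, command, queue):
    """Build one Buildkite command step targeted at agents tagged with `queue`."""
    return {
        "label": label,
        "command": command,
        # Buildkite matches this against the agent's `queue` tag.
        "agents": {"queue": queue},
    }


def make_pipeline(queues, command):
    """Emit the same command once per queue so results can be compared."""
    return {"steps": [make_step(f"test on {q}", command, q) for q in queues]}


if __name__ == "__main__":
    # "mi250_8xGPU" is taken from a commit title in this PR;
    # "hypothetical_1xGPU" and the echo command are placeholders.
    pipeline = make_pipeline(
        ["mi250_8xGPU", "hypothetical_1xGPU"],
        "echo 'run tests here'",
    )
    print(json.dumps(pipeline, indent=2))
```

The printed JSON could be piped to `buildkite-agent pipeline upload`, assuming agents with matching queue tags are registered.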

#193)
* [Grok-1] 1. upload moe configuration file for moe kernel optimization 2. support "--num-scheduler-steps" in benchmark_latency.py
* [Grok-1] 1. upload moe configuration file for moe kernel optimization 2. add copy of benchmark_latency.py to support "--num-scheduler-steps"
* [Grok-1] add option num-scheduler-steps in benchmark_latency.py

* update custom PA kernel with support for fp8 kv cache dtype; change custom PA partition size to 512 to prefer throughput scenarios at cost of latency
* Fix lint
* Fix BF16 with FP8 KV cache (scaled conversion incorrectly done in fp16)
* Fix custom PA tests
* Merge branch 'main' of git@github.com:ROCm/vllm.git into mawong/fix_custom_pa_tests
* Fix partition sizes for PAv2, PAcustom
* Fix linting
* Fix a few names and variable scopes
* Rename custom to rocm as per suggestion
---------
Co-authored-by: Shomy Sanyal <shomy.sanyal@amd.com>

* Fixing incompatibility with cython.
* Change type as per reviewer suggestions
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
---------
Co-authored-by: Matt Wong <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>

* Enable RPD for single/multi gpu
Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>
* Add rpd build instructions to Dockerfile.rocm
* Handle env path
* Fix code errors
* Move RPD based profiling over to profiling folder
* use envs vs os.getenv
---------
Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>

* integrate new cpa kernel, update tests and benchmark
* added comments to mfma4 kernel
* further comments for mfma16 kernel
* clang-format
* Lint
* add flag for logits rtz conversion and disable by default
* lint
* [Bugfix]: Fix paged attention unit tests of #372 (#389)
* [Bugfix]: fix paged attention tests based on the updated kernels in `csrc/attention/paged_attention_v1.cu`,`csrc/attention/paged_attention_v2.cu` and `csrc/rocm/attention.cu`.
* improve code documentation.
* lint
---------
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
---------
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Joe Shajrawi <17753158+shajrawi@users.noreply.github.com>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

* Enabling P3L.py & P3L_mling.py tests to run with multiple batched queries. This alternation adds minimal measurement noise. The underlining testing material is the same, the resulting measurements are comparable to the old (BS=1) testing runs.
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* Making linters happy.
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* Changed the device specification for the 'forced_sample' tensor. The resulting implementation produces identical measurement, and, actually, became faster (3.21s/it vs 3.42s/it with previous commit).
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
* Fixing reporting to reflect processed intervals.
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
---------
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Alexei-V-Ivanov-AMD and others added 30 commits September 17, 2024 17:07

Enabling Splitting HW by Buildkite Agents (#191)

7d3690c

* Restoring deleted .buildkite/test-template.j2 * Enabling agents for split HW --------- Co-authored-by: Alexei Ivanov <alivanov@gpu9448.jax.cs.cpe.ice.amd.com>

Revert "remove redundant slice; match decode PA partition size with c…

54e0441

…src (#188)" (#194) This reverts commit c68c242.

Removing the original text in reminder_comment.yml (#195)

d21cf99

* Removing the original text in reminder_comment.yml And testing manually disabled github action that launches this script. * Delete .github/workflows/reminder_comment.yml

Adding P3L measurement to the benchmarks collection tools. (#197)

7094103

* Adding P3L measurement to the benchmarks collection tools. * . * . * . * .

Swapping the order of sampling operations in the conditional selector. (

9d8035b

#199) Adding P3L measurement to the benchmarks collection tools. A more beautiful version of the code with "Swapping the order of sampling operations in the conditional selector. (#199)"

remove redundant slice when chunked prefill feature is disabled (#201)

0e80e85

Merge remote-tracking branch 'upstream/main' into upstream_merge_24_9_23

87acddd

isort

7e2ac48

Bias and more metadata in gradlib and tuned gemm (#202)

1f0d319

* Adding bias to hipb_mm in gradlib. Expanding gradlib to tune based on bias and dtype captured from the actual capture run

Bias and more metadata in gradlib and tuned gemm (#202)

6e370fc

* Adding bias to hipb_mm in gradlib. Expanding gradlib to tune based on bias and dtype captured from the actual capture run

Merge remote-tracking branch 'origin/main' into upstream_merge_24_9_23

cebe70c

Merge pull request #203 from ROCm/upstream_merge_24_9_23

57ea101

Upstream merge 24 9 23

With chunked prefil, for large prompts, the sampler can encounter a z…

48c0cb4

…ero-sized tensor, on which skinny gemm fails (#204)

Revert "[Kernel] changing fused moe kernel chunk size default to 32k (v…

cc2039c

…llm-project#7995)" (#207) This reverts commit 34a0e96.

re-enable avoid torch slice fix when chunked prefill is disabled (#209)

a5d87a1

add block_manager_v2.py into setup_cython: block_manager_v2 is used w…

5c50fca

…hen num-scheduler-steps>1 (#210)

extend moe padding to DUMMY weights (#211)

9858710

* extend moe padding to DUMMY weights

Merge remote-tracking branch 'upstream/main' into main

c5b1012

Add setuptools-scm requirement to requirements-rocm since we don't us…

1adaa9a

…e requirements-build Changing back PromptType to PromptInputs following refactoring revert

[Int4-AWQ] Fix AWQ Marlin check for ROCm (#206)

b79f9f4

Merge branch 'main' into upstream_merge_24_09_27_0.6.2

aac2e0b

Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…

8850323

…27_0.6.2

Cythonize vllm build (#214)

0a5881d

* adding cython into docker file with flag * correcting if

Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…

3d2bd9b

…27_0.6.2

Fix Dockerfile.rocm (#215)

956b831

Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…

4f57e44

…27_0.6.2

sanyalington and others added 28 commits January 30, 2025 14:21

Using a more precise profiling on ROCm to properly account for weight…

22141e7

…s padding (#394)

Update Dockerfile.rocm

6852819

Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…

339ba27

…1_31

Remove redundant code paths

d47b834

Fix MLA and logic for using triton scaled_mm on ROCm as blockwise FP8…

2fa8a9d

… quant is not supported Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>

use MLA on rocm

3523ce5

Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>

pre-commit format

3930fdd

Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>

Aiter readme (#400)

4e7e709

* Aiter section * Aiter section in docker * Enablement * Only exposing a single knob * More details on env defaults

Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…

14a02be

…2_03

Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…

0e24b85

…2_03

Merge remote-tracking branch 'hongxia/enable_deepseek' into upstream_…

92b42cd

…merge_25_02_03

fix None dict (#402)

fdb06c3

Merge branch 'main' into upstream_merge_25_02_03

b59d8c3

New linters

8dbc899

Merge branch 'upstream_merge_25_02_03' of github.com:ROCm/vllm into u…

76b8163

…pstream_merge_25_02_03

Custom params for mla attention backend

c887bc9

Correct initial values

Merge pull request #403 from ROCm/upstream_merge_25_02_03

b43c8d1

Upstream merge 25 02 03

Test build to check processing by different K8 queues.

ea787b0

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Testing.

01dfdda

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Copying over the tests directory to enable CI testing.

7f80bf8

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Comparing with MI250 in the "mi250_8xGPU" queue.

14aaf35

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Building with "test" as a --target

a106489

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Fixing working directory property.

6acfc3a

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Dummy alternation to confirm trouble with simultaneous test execution.

172e0e8

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

Dummy alternation to trigger a re-build and re-test.

114e750

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

queue test

2bd2caf

gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de, September 9, 2025 16:43

dhonnappa-amd wants to merge 621 commits into main from split_gpu

dhonnappa-amd commented Feb 28, 2025 • edited by github-actions bot

20 participants