
Test Queues #456

Draft
dhonnappa-amd wants to merge 621 commits into main from split_gpu

Conversation

dhonnappa-amd (Collaborator) commented Feb 28, 2025

DO NOT MERGE.
Testing K8s agents with different GPU sizes.

Alexei-V-Ivanov-AMD and others added 30 commits September 17, 2024 17:07
* Restoring deleted .buildkite/test-template.j2

* Enabling agents for split HW

---------

Co-authored-by: Alexei Ivanov <alivanov@gpu9448.jax.cs.cpe.ice.amd.com>
#193)

* [Grok-1] 1. upload moe configuration file for moe kernel optimization 2. support "--num-scheduler-steps" in benchmark_latency.py

* [Grok-1] 1. upload moe configuration file for moe kernel optimization 2. add copy of benchmark_latency.py to support "--num-scheduler-steps"

* [Grok-1] add option num-scheduler-steps in benchmark_latency.py
* Removing the original text in reminder_comment.yml

Also testing the manually disabled GitHub action that launches this script.

* Delete .github/workflows/reminder_comment.yml
* update custom PA kernel with support for fp8 kv cache dtype; change custom PA partition size to 512 to prefer throughput scenarios at cost of latency

* Fix lint

* Fix BF16 with FP8 KV cache (scaled conversion incorrectly done in fp16)

* Fix custom PA tests

* Merge branch 'main' of git@github.com:ROCm/vllm.git into mawong/fix_custom_pa_tests

* Fix partition sizes for PAv2, PAcustom

* Fix linting

* Fix a few names and variable scopes

* Rename custom to rocm as per suggestion

---------

Co-authored-by: Shomy Sanyal <shomy.sanyal@amd.com>
* Adding P3L measurement to the benchmarks collection tools.

* .

* .

* .

* .
#199)

Adding P3L measurement to the benchmarks collection tools. A more beautiful version of the code with "Swapping the order of sampling operations in the conditional selector. (#199)"
* Fixing incompatibility with cython.

* Change type as per reviewer suggestions

Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>

---------

Co-authored-by: Matt Wong <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
* Adding bias to hipb_mm in gradlib. Expanding gradlib to tune based on bias and dtype captured from the actual capture run
* Adding bias to hipb_mm in gradlib. Expanding gradlib to tune based on bias and dtype captured from the actual capture run
…ero-sized tensor, on which skinny gemm fails (#204)
* extend moe padding to DUMMY weights
…e requirements-build

Changing back PromptType to PromptInputs following refactoring revert
* Enable RPD for single/multi gpu

Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>

* Add rpd build instructions to Dockerfile.rocm

* Handle env path

* Fix code errors

* Move RPD based profiling over to profiling folder

* use envs vs os.getenv

---------

Co-authored-by: AdrianAbeyta <adrian.abeyta@amd.com>
* adding cython into docker file with flag

* correcting if
sanyalington and others added 28 commits January 30, 2025 14:21
* integrate new cpa kernel, update tests and benchmark

* added comments to mfma4 kernel

* further comments for mfma16 kernel

* clang-format

* Lint

* add flag for logits rtz conversion and disable by default

* lint

* [Bugfix]: Fix paged attention unit tests of #372 (#389)

* [Bugfix]: fix paged attention tests based on the updated kernels in `csrc/attention/paged_attention_v1.cu`, `csrc/attention/paged_attention_v2.cu` and `csrc/rocm/attention.cu`.

* improve code documentation.

* lint

---------

Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>

---------

Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Joe Shajrawi <17753158+shajrawi@users.noreply.github.com>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
… quant is not supported

Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
* Aiter section

* Aiter section in docker

* Enablement

* Only exposing a single knob

* More details on env defaults
* Enabling P3L.py & P3L_mling.py tests to run with multiple batched
queries.

This alteration adds minimal measurement noise.

The underlying testing material is the same, so the resulting measurements
are comparable to the old (BS=1) testing runs.

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* Making linters happy.

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* Changed the device specification for the 'forced_sample' tensor.
The resulting implementation produces identical measurements and is
actually faster (3.21s/it vs 3.42s/it with the previous commit).

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

* Fixing reporting to reflect processed intervals.

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>

---------

Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de, on September 9, 2025 16:43
