Draft
Commits
621 commits
7d3690c
Enabling Splitting HW by Buildkite Agents (#191)
Alexei-V-Ivanov-AMD Sep 17, 2024
54e0441
Revert "remove redundant slice; match decode PA partition size with c…
gshtras Sep 18, 2024
40581f4
[Grok-1] 1. upload moe configuration file for moe kernel optimization…
kkHuang-amd Sep 18, 2024
d21cf99
Removing the original text in reminder_comment.yml (#195)
Alexei-V-Ivanov-AMD Sep 18, 2024
a67b65b
Fix PA custom and PA v2 tests and partition sizes (#196)
mawong-amd Sep 18, 2024
7094103
Adding P3L measurement to the benchmarks collection tools. (#197)
Alexei-V-Ivanov-AMD Sep 19, 2024
9d8035b
Swapping the order of sampling operations in the conditional selector…
Alexei-V-Ivanov-AMD Sep 19, 2024
0e80e85
remove redundant slice when chunked prefill feature is disabled (#201)
sanyalington Sep 20, 2024
bae9170
Fixing P3L incompatibility with cython. (#200)
Alexei-V-Ivanov-AMD Sep 20, 2024
87acddd
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_9_23
gshtras Sep 23, 2024
7e2ac48
isort
gshtras Sep 23, 2024
1f0d319
Bias and more metadata in gradlib and tuned gemm (#202)
gshtras Sep 23, 2024
6e370fc
Bias and more metadata in gradlib and tuned gemm (#202)
gshtras Sep 23, 2024
cebe70c
Merge remote-tracking branch 'origin/main' into upstream_merge_24_9_23
gshtras Sep 23, 2024
57ea101
Merge pull request #203 from ROCm/upstream_merge_24_9_23
gshtras Sep 23, 2024
48c0cb4
With chunked prefil, for large prompts, the sampler can encounter a z…
gshtras Sep 23, 2024
cc2039c
Revert "[Kernel] changing fused moe kernel chunk size default to 32k …
gshtras Sep 25, 2024
a5d87a1
re-enable avoid torch slice fix when chunked prefill is disabled (#209)
sanyalington Sep 26, 2024
5c50fca
add block_manager_v2.py into setup_cython: block_manager_v2 is used w…
sanyalington Sep 26, 2024
9858710
extend moe padding to DUMMY weights (#211)
divakar-amd Sep 26, 2024
c5b1012
Merge remote-tracking branch 'upstream/main' into main
gshtras Sep 27, 2024
1adaa9a
Add setuptools-scm requirement to requirements-rocm since we don't us…
gshtras Sep 27, 2024
b79f9f4
[Int4-AWQ] Fix AWQ Marlin check for ROCm (#206)
hegemanjw4amd Sep 27, 2024
aac2e0b
Merge branch 'main' into upstream_merge_24_09_27_0.6.2
gshtras Sep 27, 2024
a87da2b
RPD Profiling (#208)
dllehr-amd Sep 27, 2024
8850323
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Sep 27, 2024
0a5881d
Cythonize vllm build (#214)
maleksan85 Sep 27, 2024
3d2bd9b
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Sep 27, 2024
956b831
Fix Dockerfile.rocm (#215)
gshtras Sep 27, 2024
4f57e44
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Sep 27, 2024
2d7ab9e
fix dbrx weight loader (#212)
divakar-amd Oct 1, 2024
f49394a
Merge remote-tracking branch 'origin/main' into upstream_merge_24_09_…
gshtras Oct 2, 2024
030374b
Merge pull request #213 from ROCm/upstream_merge_24_09_27_0.6.2
gshtras Oct 2, 2024
47d6392
Make rpdtracer import only when required (#216)
Rohan138 Oct 3, 2024
4cb422f
Improve profiling setup and documentation, sync benchmarks with main …
AdrianAbeyta Oct 3, 2024
4075b35
Installing the requirements before invoking setup.py since it now imp…
gshtras Oct 3, 2024
2550f14
llama3.2 + cross attn test (#220)
maleksan85 Oct 4, 2024
1992aa8
Factor out common weight loading code
DarkLight1337 Oct 8, 2024
e81645d
Fix EAGLE model loading
DarkLight1337 Oct 8, 2024
4ef043b
Improve efficiency
DarkLight1337 Oct 8, 2024
e723680
Rename
DarkLight1337 Oct 8, 2024
c60e921
Update LLaVA-NeXT-Video
DarkLight1337 Oct 8, 2024
89bde53
Optimize CAR for ROCm (#225)
iotamudelta Oct 8, 2024
9f12890
Automatic loading and save memory
DarkLight1337 Oct 8, 2024
10b5b0e
Rename
DarkLight1337 Oct 8, 2024
ce08df5
Update docstring
DarkLight1337 Oct 8, 2024
df687ac
Simplify
DarkLight1337 Oct 8, 2024
98bf417
Cleanup
DarkLight1337 Oct 8, 2024
decc7a4
Fully enable recursive loading
DarkLight1337 Oct 8, 2024
e59201a
Clarify
DarkLight1337 Oct 8, 2024
b51fe69
Custom PA perf improvements (#222)
sanyalington Oct 8, 2024
f538ab9
Fix incorrect semantics
DarkLight1337 Oct 8, 2024
f077865
Move function
DarkLight1337 Oct 8, 2024
56e4a33
Update error message
DarkLight1337 Oct 8, 2024
85c63c8
Fix Ultravox loading
DarkLight1337 Oct 8, 2024
42a3253
spacing
DarkLight1337 Oct 8, 2024
b21ccdf
Merge remote-tracking branch 'upstream/main'
gshtras Oct 8, 2024
e5a7def
Merge remote-tracking branch 'upstream/main' into main
gshtras Oct 8, 2024
3e72cae
Merge remote-tracking branch 'upstream/fix-weight-loading' into main
gshtras Oct 8, 2024
674b2a5
Merge remote-tracking branch 'origin/main' into upstream_merge_24_10_08
gshtras Oct 8, 2024
390efcb
Fix server
gshtras Oct 8, 2024
8fa419f
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Oct 8, 2024
a466f09
Upstream merge 24 10 08 (#226)
gshtras Oct 9, 2024
968345a
customPA write fp8 small ctx fix; enable customPA write fp8 by defaul…
sanyalington Oct 9, 2024
1ec8aaf
Added sccache timeout for vllm build (#230)
maleksan85 Oct 11, 2024
0e0e968
Add fp8 for dbrx (#231)
charlifu Oct 14, 2024
35e2c54
Update Buildkite env variable (#232)
dhonnappa-amd Oct 14, 2024
82cfa5a
cuda graph + num-scheduler-steps bug fix (#236)
seungrokj Oct 16, 2024
1658370
[Model] [BUG] Fix code path logic to load mllama model (#234)
tjtanaa Oct 16, 2024
6e79dcf
Merge remote-tracking branch 'origin/main' into upstream_merge_24_10_21
gshtras Oct 21, 2024
b10dad1
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Oct 21, 2024
634d9b0
yapf
gshtras Oct 21, 2024
e0b6bb4
prefix-enabled FA perf issue (#239)
seungrokj Oct 22, 2024
af76c9d
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 22, 2024
1eefd1e
Custom PA Partition size 256 to improve performance (#238)
sanyalington Oct 22, 2024
a594c0c
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 22, 2024
16cedce
[Build/CI] Minor changes to fix internal CI process. (#235)
Alexei-V-Ivanov-AMD Oct 22, 2024
87e3970
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 22, 2024
69d5e1d
[BUGFIX] Restored handling of ROCM FA output as before adaptation of …
maleksan85 Oct 23, 2024
be448fb
Merge branch 'main' into upstream_merge_24_10_21
gshtras Oct 23, 2024
2a3f461
Merge pull request #240 from ROCm/upstream_merge_24_10_21
gshtras Oct 23, 2024
46aa3d2
Using the correct datatype on prefix prefill for fp8 kv cache (#242)
gshtras Oct 23, 2024
ca5e5d3
Update CMakeLists.txt (#244)
gshtras Oct 24, 2024
842ea55
update block manager (#243)
saienduri Oct 24, 2024
c9fc160
[Bugfix][Kernel][Misc] Basic support for SmoothQuant, symmetric case …
rasmith Oct 24, 2024
4bba092
Add fp8 support for llama model family on Navi4x (#245)
qli88 Oct 25, 2024
b20aa29
Merge remote-tracking branch 'upstream/main'
gshtras Oct 28, 2024
c0eb092
Fix for dynamic quantization of the vision part of llama 3.2; Fix for…
gshtras Oct 28, 2024
2454f4a
Fix support for non quantized visual layers in otherwise quantized ml…
gshtras Oct 29, 2024
5974cc3
Custom all reduce fix mi250 (#247)
omirosh Oct 29, 2024
a23a23c
Reorganize imports; Restrict additional supported tensors in _scaled_…
gshtras Oct 29, 2024
b0a8c5d
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Oct 29, 2024
ab3f100
Merge remote-tracking branch 'origin/partially_quantized_mllama_fix' …
gshtras Oct 29, 2024
64f51a5
Merge remote-tracking branch 'origin/main' into upstream_merge_24_10_28
gshtras Oct 29, 2024
cfd7388
fix is_hip
gshtras Oct 29, 2024
7aa6982
Merge pull request #248 from ROCm/upstream_merge_24_10_28
gshtras Oct 29, 2024
1ef171e
fp8 moe configs. Mixtral-8x(7B,22B) TP=1,2,4,8 (#250)
divakar-amd Oct 30, 2024
751c8fc
Sccache removal from Dockerfile.rocm (#253)
omirosh Oct 31, 2024
e19c415
Update Dockerfile.rocm (#254)
shajrawi Oct 31, 2024
6a844ee
Using the correct type hints (#256)
gshtras Oct 31, 2024
353cfeb
Revert "Update Dockerfile.rocm (#254)" (#257)
gshtras Oct 31, 2024
733f79a
Creating ROCm whl upon release (#259)
gshtras Nov 1, 2024
ba1844a
Using a ROCm6.2.2 base image with torch version that doesn't support …
gshtras Nov 4, 2024
65405fd
Merge remote-tracking branch 'upstream/main'
gshtras Nov 4, 2024
36fff16
Merge pull request #262 from ROCm/upstream_merge_24_11_04
gshtras Nov 5, 2024
c091eaf
Add gfx1201 to supported ARCH list (#264)
qli88 Nov 5, 2024
1c740db
Modifying the sampler to allow FORCED type of sampling. (#265)
Alexei-V-Ivanov-AMD Nov 5, 2024
4868a43
Eliminated -Wswitch-bool warning and a leftover incorrect import (#266)
gshtras Nov 6, 2024
aca6d2e
Navi correctness fix 1 to 300 count (#263)
maleksan85 Nov 7, 2024
2eabfbc
forgotten local commit (#267)
maleksan85 Nov 7, 2024
72ffb94
Update profiling benchmarks to take in new EngArgs method
Oct 31, 2024
15d17c5
Rpd build arg (#269)
gshtras Nov 8, 2024
1dcc5df
Making flash attention build happen after we built and installed the …
gshtras Nov 8, 2024
8f3bf8b
Update P3L.py (#271)
gshtras Nov 11, 2024
c8c6c7e
Merge remote-tracking branch 'upstream/main'
gshtras Nov 11, 2024
409a439
Merge remote-tracking branch 'origin/main' into upstream_merge_24_11_11
gshtras Nov 11, 2024
c0f2e87
Formatting
gshtras Nov 11, 2024
8de3a62
Merge pull request #272 from ROCm/upstream_merge_24_11_11
gshtras Nov 12, 2024
9a46e97
corrected types for strides in triton FA (#274)
maleksan85 Nov 13, 2024
a9caec4
Running linter actions on develop branch (#275)
gshtras Nov 13, 2024
3afc735
rocm support for moe tuning script (#251)
divakar-amd Nov 13, 2024
efb0432
corrected types for strides in triton FA (#274) (#276)
maleksan85 Nov 13, 2024
04aa1a7
mixtral8x22B moe configs mi300 TP=1,2,4,8 (#277)
divakar-amd Nov 14, 2024
5362727
Improve the heuristic logic for fp8 weight padding (#279)
charlifu Nov 14, 2024
48726bf
Gradlib torch extension cmake (#282)
gshtras Nov 15, 2024
9540837
Merge remote-tracking branch 'upstream/main' into develop
gshtras Nov 18, 2024
bd1cc77
Vision model ROCm fix
gshtras Nov 18, 2024
4a185d8
Update yapf.yml
gshtras Nov 19, 2024
8f7daff
Merge branch 'develop' into upstream_merge_24_11_18
gshtras Nov 19, 2024
62334b5
Merge pull request #286 from ROCm/upstream_merge_24_11_18
gshtras Nov 19, 2024
15c78e7
[OPT] improve rms_norm kernel (#258)
kkHuang-amd Nov 20, 2024
a8a8fe9
Cuda compile fix2 (#284)
hliuca Nov 20, 2024
8647e89
use CK FA for glm-4v on navi3 (#281)
jfactory07 Nov 20, 2024
2c60adc
Disable custom all-reduce on two Navi GPUs (#287)
hyoon1 Nov 20, 2024
3a4f14a
Base docker image (#290)
gshtras Nov 22, 2024
01deb43
Added --output-json parameter in the P3l script. Using arg_utils to s…
gshtras Nov 22, 2024
fb15a9e
devdocker README from https://github.com/powderluv/vllm-docs (#292)
gshtras Nov 25, 2024
2302ad6
Update README.md
gshtras Nov 25, 2024
529cefe
Run clang-format on develop (#296)
gshtras Nov 27, 2024
df43d6e
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Nov 27, 2024
5a35b87
Merge remote-tracking branch 'origin/develop' into upstream_merge_24_…
gshtras Nov 27, 2024
0537c66
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Nov 27, 2024
6cf8eb4
Fix correctness regression (from PR#258) in Llama-3.2-90B-Vision-Inst…
kkHuang-amd Nov 28, 2024
0191582
Merge remote-tracking branch 'origin/develop' into upstream_merge_24_…
gshtras Dec 2, 2024
1e93ebe
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Dec 2, 2024
a8b5334
Fix for the current state of platform specific quantizations
gshtras Dec 2, 2024
361c63d
Merge pull request #297 from ROCm/upstream_merge_24_11_25
gshtras Dec 2, 2024
bb14866
Fix type hints for cython (#299)
gshtras Dec 3, 2024
0cee60d
fused_moe configs for MI325X (#300)
JArnoldAMD Dec 3, 2024
18ef0a0
enable softcap and gemma2 (#288)
hliuca Dec 4, 2024
97fd542
[vllm] Add support for FP8 in Triton FA kernel (#301)
ilia-cher Dec 4, 2024
d291770
Update test-template.j2 (#283)
dhonnappa-amd Dec 4, 2024
1a17f0f
re-tune fp8 mixtral8x22B (#304)
divakar-amd Dec 4, 2024
68fdfc2
rm custom moe tune file. Add bash script for tuning reference (#305)
divakar-amd Dec 4, 2024
ccdb5b8
(temp workaround for Triton bug) (#306)
ilia-cher Dec 5, 2024
b414ae9
Always use 64 as the block size of moe_align kernel to avoid lds out …
charlifu Dec 5, 2024
da8f61a
Update Dockerfile.rocm (#307)
saienduri Dec 5, 2024
2b17421
Using ROCm6.3 release image as a base (#308)
gshtras Dec 6, 2024
8663822
Fix kernel cache miss and add RDNA configs (#246)
hyoon1 Dec 6, 2024
44212d7
Update README.md (#309)
t-parry Dec 6, 2024
679a15c
Fix max_seqlens_q/k initialization for Navi GPUs (#310)
hyoon1 Dec 9, 2024
fb82bf1
Merge remote-tracking branch 'origin/develop'
gshtras Dec 9, 2024
22f9066
Setting the value for the scpecilative decoding worker class on rocm …
gshtras Dec 9, 2024
7c61516
Merge remote-tracking branch 'upstream/main' into develop
gshtras Dec 9, 2024
401a541
format
gshtras Dec 9, 2024
c9f5c24
Merge remote-tracking branch 'origin/main' into upstream_merge_24_12_09
gshtras Dec 9, 2024
c324ea8
Merge remote-tracking branch 'upstream/main' into upstream_merge_24_1…
gshtras Dec 9, 2024
dcc3f45
Merge pull request #314 from ROCm/upstream_merge_24_12_09
gshtras Dec 9, 2024
84eeed2
Reverting triton commit to the one which showed a better performance …
gshtras Dec 11, 2024
eb4d191
Navi docker (#316)
gshtras Dec 11, 2024
e8e6ebf
fix GemmTuner import in gradlib (#319)
Rohan138 Dec 11, 2024
2bec3da
Storing the installed commit hashes and customizations in a file (#320)
gshtras Dec 11, 2024
a1aaa74
Option to override PYTORCH_ROCM_ARCH inherited from the base image (#…
gshtras Dec 11, 2024
7efa6e0
Update README.md (#322)
t-parry Dec 12, 2024
405e730
Disable auto enabling chunked prefill on ROCm platform on long contex…
gshtras Dec 12, 2024
14b11f5
Fix logging of the vLLM Config (#11143) (#325)
gshtras Dec 12, 2024
1a8e549
Merge remote-tracking branch 'upstream/main'
gshtras Dec 16, 2024
78440dc
Deprecating sync_openai
gshtras Dec 16, 2024
ddec133
Remove new irrelevant action
gshtras Dec 16, 2024
1f0f4c6
Merge pull request #330 from ROCm/upstream_merge_24_12_16
gshtras Dec 16, 2024
d09f1ce
Fix regression from #246 (#332)
gshtras Dec 16, 2024
d9fed26
Dynamic Scale Factor Calculations for Key/Value Scales With FP8 KV Ca…
micah-wil Dec 17, 2024
27f53a2
Fixed the new condition for fp8 type (#333)
gshtras Dec 18, 2024
fa1ff83
Mllama kv scale fix (#335)
gshtras Dec 18, 2024
399016d
Using the generic base image created by the vllm-ci pipeline (#336)
gshtras Dec 18, 2024
d08b78b
Properly initializing the new field in the attn metadata (#337)
gshtras Dec 18, 2024
1dcd9fe
Ingest FP8 attn scales and use them in ROCm FlashAttention (#338)
mawong-amd Dec 20, 2024
ca4d670
Library versions bump (#343)
gshtras Dec 20, 2024
a264693
Updated fused_moe configs for MI325X with Triton 3.2 (#345)
JArnoldAMD Dec 21, 2024
4773c29
Merge remote-tracking branch 'upstream/main'
gshtras Jan 6, 2025
267c1a1
format
gshtras Jan 6, 2025
2053351
deepseek overflow fix (#349)
Concurrensee Jan 6, 2025
97067c0
Merge branch 'main' into upstream_merge_25_1_6
gshtras Jan 8, 2025
88e020d
Merge pull request #350 from ROCm/upstream_merge_25_1_6
gshtras Jan 8, 2025
c040f0e
Revert nccl changes (#351)
gshtras Jan 8, 2025
3efdd2b
fp8 support (#352)
Concurrensee Jan 9, 2025
ce53f46
Merge remote-tracking branch 'upstream/main'
gshtras Jan 13, 2025
5a51290
Using list
gshtras Jan 13, 2025
079750e
Revert "[misc] improve memory profiling (#11809)"
gshtras Jan 13, 2025
113274a
Multi-lingual P3L (#356)
Alexei-V-Ivanov-AMD Jan 13, 2025
043c93d
Trying to make scales work with compileable attention
gshtras Jan 13, 2025
16f8680
Docs lint
gshtras Jan 14, 2025
eb4abfd
Merge remote-tracking branch 'origin/main' into upstream_merge_25_01_13
gshtras Jan 14, 2025
5976f48
Merge pull request #358 from ROCm/upstream_merge_25_01_13
gshtras Jan 14, 2025
8bd76fb
Enable user marker for vllm profiling (#357)
Lzy17 Jan 16, 2025
c5a9406
Deepseek V3 support (#364)
gshtras Jan 16, 2025
031e6eb
Merge remote-tracking branch 'upstream/main'
gshtras Jan 20, 2025
3e1cadb
Merge pull request #368 from ROCm/upstream_merge_25_01_20
gshtras Jan 20, 2025
faa1815
Using ROCm6.3.1 base docker and building hipblas-common (#366)
gshtras Jan 20, 2025
78d7d30
Update pre-commit.yml (#374)
gshtras Jan 21, 2025
b5839a1
Skip tokenize/detokenize when it is disabled by arg --skip-tokenizer-…
maleksan85 Jan 22, 2025
a600e9f
FP8 FA fixes (#381)
ilia-cher Jan 23, 2025
5f9b40b
Returning the use of the proper stream in allreduce (#382)
gshtras Jan 23, 2025
84f5d47
Using pytorch commit past the point when rowwise PR (https://github.c…
gshtras Jan 23, 2025
8e87b08
Applying scales rename to fp8 config (#387)
gshtras Jan 24, 2025
28b1ad9
Dev-docker Documentation Updates (#378)
JArnoldAMD Jan 25, 2025
8e6d987
Merge remote-tracking branch 'upstream/main'
gshtras Jan 27, 2025
6b2147f
Support FP8 FA from Quark format (#388)
BowenBao Jan 28, 2025
a892ecc
Merge remote-tracking branch 'origin/main' into upstream_merge_25_01_27
gshtras Jan 28, 2025
c8b8654
Direct call on ROCm
gshtras Jan 28, 2025
b2c3b22
Merge pull request #391 from ROCm/upstream_merge_25_01_27
gshtras Jan 28, 2025
7a292f9
20250127 docs update (#392)
arakowsk-amd Jan 29, 2025
273c949
Faster Custom Paged Attention kernels (#372)
sanyalington Jan 30, 2025
22141e7
Using a more precise profiling on ROCm to properly account for weight…
gshtras Jan 30, 2025
6852819
Update Dockerfile.rocm
gshtras Jan 30, 2025
339ba27
Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…
gshtras Jan 31, 2025
d47b834
Remove redundant code paths
gshtras Jan 31, 2025
2fa8a9d
Fix MLA and logic for using triton scaled_mm on ROCm as blockwise FP8…
mawong-amd Feb 1, 2025
3523ce5
use MLA on rocm
hongxiayang Feb 2, 2025
3930fdd
pre-commit format
hongxiayang Feb 2, 2025
4e7e709
Aiter readme (#400)
gshtras Feb 3, 2025
14a02be
Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…
gshtras Feb 3, 2025
0e24b85
Merge remote-tracking branch 'upstream/main' into upstream_merge_25_0…
gshtras Feb 3, 2025
92b42cd
Merge remote-tracking branch 'hongxia/enable_deepseek' into upstream_…
gshtras Feb 3, 2025
fdb06c3
fix None dict (#402)
hliuca Feb 3, 2025
b59d8c3
Merge branch 'main' into upstream_merge_25_02_03
gshtras Feb 3, 2025
8dbc899
New linters
gshtras Feb 3, 2025
76b8163
Merge branch 'upstream_merge_25_02_03' of github.com:ROCm/vllm into u…
gshtras Feb 3, 2025
c887bc9
Custom params for mla attention backend
gshtras Feb 3, 2025
b43c8d1
Merge pull request #403 from ROCm/upstream_merge_25_02_03
gshtras Feb 3, 2025
d586c39
Mbatch p3l (#401)
Alexei-V-Ivanov-AMD Feb 4, 2025
ea787b0
Test build to check processing by different K8 queues.
Alexei-V-Ivanov-AMD Feb 4, 2025
01dfdda
Testing.
Alexei-V-Ivanov-AMD Feb 5, 2025
7f80bf8
Copying over the tests directory to enable CI testing.
Alexei-V-Ivanov-AMD Feb 5, 2025
14aaf35
Comparing with MI250 in the "mi250_8xGPU" queue.
Alexei-V-Ivanov-AMD Feb 5, 2025
a106489
Building with "test" as a --target
Alexei-V-Ivanov-AMD Feb 5, 2025
6acfc3a
Fixing working directory property.
Alexei-V-Ivanov-AMD Feb 5, 2025
172e0e8
Dummy alternation to confirm trouble with simultaneous test execution.
Alexei-V-Ivanov-AMD Feb 5, 2025
114e750
Dummy alternation to trigger a re-build and re-test.
Alexei-V-Ivanov-AMD Feb 6, 2025
2bd2caf
queue test
dhonnappa-amd Feb 28, 2025
9 changes: 5 additions & 4 deletions .buildkite/run-amd-test.sh
@@ -7,7 +7,7 @@ set -o pipefail
echo "--- Confirming Clean Initial State"
while true; do
sleep 3
-if grep -q clean /opt/amdgpu/etc/gpu_state; then
+if grep -q clean ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}; then
echo "GPUs state is \"clean\""
break
fi
@@ -46,11 +46,11 @@ cleanup_docker

echo "--- Resetting GPUs"

-echo "reset" > /opt/amdgpu/etc/gpu_state
+echo "reset" > ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}

while true; do
sleep 3
-if grep -q clean /opt/amdgpu/etc/gpu_state; then
+if grep -q clean ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}; then
echo "GPUs state is \"clean\""
break
fi
@@ -141,8 +141,9 @@ if [[ $commands == *"--shard-id="* ]]; then
fi
done
else
+echo "Render devices: $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES"
docker run \
---device /dev/kfd --device /dev/dri \
+--device /dev/kfd $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES \
--network host \
--shm-size=16gb \
--rm \
Expand Down
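The two polling loops in this script wait in an unbounded `while true` for the file named by `BUILDKITE_AGENT_META_DATA_RESET_TARGET` to contain `clean`. A minimal sketch of the same wait with a bounded number of attempts and a quoted path might look like the following. The function name `wait_for_clean` and its `max_attempts` and `interval` parameters are hypothetical and not part of this PR; unlike the script, the sketch checks before it sleeps.

```shell
# Hypothetical helper, not part of the PR: poll a GPU state file until it
# contains "clean", giving up after a bounded number of attempts.
wait_for_clean() {
  local state_file="$1"           # e.g. "$BUILDKITE_AGENT_META_DATA_RESET_TARGET"
  local max_attempts="${2:-100}"  # assumed default; the script itself never gives up
  local interval="${3:-3}"        # seconds between polls (the script sleeps 3)
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Quoting the path keeps grep from reading stdin if the variable is empty.
    if grep -q clean "$state_file" 2>/dev/null; then
      echo "GPUs state is \"clean\""
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for ${state_file} to report clean" >&2
  return 1
}
```

Under these assumptions, a call such as `wait_for_clean "$BUILDKITE_AGENT_META_DATA_RESET_TARGET" 200 3 || exit 1` would fail the step instead of hanging when an agent never reports a clean state.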
19 changes: 19 additions & 0 deletions .buildkite/test-pipeline.yaml
@@ -92,7 +92,9 @@ steps:
- VLLM_ATTENTION_BACKEND=FLASH_ATTN pytest -v -s basic_correctness/test_chunked_prefill.py

- label: Core Test # 10min
working_dir: "/vllm-workspace/tests"
mirror_hardwares: [amd]
amd_gpus: 4 # Just for the sake of queue testing
fast_check: true
source_file_dependencies:
- vllm/core
@@ -105,6 +107,7 @@ steps:
working_dir: "/vllm-workspace/tests"
fast_check: true
mirror_hardwares: [amd]
amd_gpus: 1 # Just for the sake of queue testing
source_file_dependencies:
- vllm/
commands:
@@ -158,6 +161,7 @@ steps:

- label: Regression Test # 5min
mirror_hardwares: [amd]
amd_gpus: 1
source_file_dependencies:
- vllm/
- tests/test_regression
@@ -168,6 +172,7 @@ steps:

- label: Engine Test # 10min
mirror_hardwares: [amd]
amd_gpus: 1
source_file_dependencies:
- vllm/
- tests/engine
@@ -176,6 +181,7 @@ steps:
- pytest -v -s engine test_sequence.py test_config.py test_logger.py
# OOM in the CI unless we run this separately
- pytest -v -s tokenization
working_dir: "/vllm-workspace/tests" # optional

- label: V1 Test
#mirror_hardwares: [amd]
@@ -217,7 +223,9 @@ steps:
- python3 offline_inference/profiling.py --model facebook/opt-125m run_num_steps --num-steps 2

- label: Prefix Caching Test # 9min
working_dir: "/vllm-workspace/tests"
mirror_hardwares: [amd]
amd_gpus: 1
source_file_dependencies:
- vllm/
- tests/prefix_caching
@@ -235,7 +243,9 @@ steps:
- VLLM_USE_FLASHINFER_SAMPLER=1 pytest -v -s samplers

- label: LogitsProcessor Test # 5min
working_dir: "/vllm-workspace/tests"
mirror_hardwares: [amd]
amd_gpus: 1
source_file_dependencies:
- vllm/model_executor/layers
- vllm/model_executor/guided_decoding
@@ -256,7 +266,9 @@ steps:
- pytest -v -s spec_decode/e2e/test_eagle_correctness.py

- label: LoRA Test %N # 15min each
working_dir: "/vllm-workspace/tests"
mirror_hardwares: [amd]
amd_gpus: 8
source_file_dependencies:
- vllm/lora
- tests/lora
@@ -282,7 +294,9 @@ steps:
- pytest -v -s compile/test_full_graph.py

- label: Kernels Test %N # 1h each
working_dir: "/vllm-workspace/tests"
mirror_hardwares: [amd]
amd_gpus: 8
source_file_dependencies:
- csrc/
- vllm/attention
@@ -292,8 +306,10 @@ steps:
parallelism: 4

- label: Tensorizer Test # 11min
working_dir: "/vllm-workspace/tests"
mirror_hardwares: [amd]
soft_fail: true
amd_gpus: 1
source_file_dependencies:
- vllm/model_executor/model_loader
- tests/tensorizer_loader
@@ -305,6 +321,7 @@ steps:
- label: Benchmarks # 9min
working_dir: "/vllm-workspace/.buildkite"
mirror_hardwares: [amd]
amd_gpus: 1
source_file_dependencies:
- benchmarks/
commands:
@@ -334,8 +351,10 @@ steps:
- pytest -v -s encoder_decoder

- label: OpenAI-Compatible Tool Use # 20 min
working_dir: "/vllm-workspace/tests"
fast_check: false
mirror_hardwares: [ amd ]
amd_gpus: 1
source_file_dependencies:
- vllm/
- tests/tool_use
46 changes: 46 additions & 0 deletions .buildkite/test-template.j2
@@ -0,0 +1,46 @@
{% set docker_image = "public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" %}
{% set docker_image_amd = "rocm/vllm-ci:$BUILDKITE_COMMIT" %}
{% set default_working_dir = "vllm/tests" %}
{% set hf_home = "/root/.cache/huggingface" %}

steps:
- label: ":docker: build image"
depends_on: ~
commands:
- "docker build --build-arg max_jobs=16 --tag {{ docker_image_amd }} -f Dockerfile.rocm --target test --progress plain ."
- "docker push {{ docker_image_amd }}"
key: "amd-build"
env:
DOCKER_BUILDKIT: "1"
retry:
automatic:
- exit_status: -1 # Agent was lost
limit: 5
- exit_status: -10 # Agent was lost
limit: 5
agents:
queue: amd-cpu

{% for step in steps %}
{% if step.mirror_hardwares and "amd" in step.mirror_hardwares %}
- label: "AMD: {{ step.label }}"
depends_on:
- "amd-build"
agents:
{% if step.amd_gpus and step.amd_gpus==8%}
queue: amd_gpu_8
{% elif step.amd_gpus and step.amd_gpus==4%}
queue: amd_gpu_4
{% elif step.amd_gpus and step.amd_gpus==2%}
queue: amd_gpu_4
{% else%}
queue: amd_gpu_1
{% endif%}
commands:
- bash .buildkite/run-amd-test.sh "cd {{ (step.working_dir or default_working_dir) | safe }} ; {{ step.command or (step.commands | join(" && ")) | safe }}"
env:
DOCKER_BUILDKIT: "1"
priority: 100
soft_fail: true
{% endif %}
{% endfor %}
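The `agents:` block in the template picks a Buildkite queue from each step's `amd_gpus` value. As a rough illustration of that mapping, the branch logic can be restated as a shell function. The name `select_queue` is hypothetical and not part of this PR; the queue names are the ones in the template. Note that a 2-GPU step is routed to the 4-GPU queue, and any unset or unlisted value falls through to `amd_gpu_1`.

```shell
# Hypothetical helper mirroring the template's queue choice; not part of the PR.
select_queue() {
  case "${1:-}" in
    8) echo "amd_gpu_8" ;;
    4 | 2) echo "amd_gpu_4" ;;  # the template sends 2-GPU steps to the 4-GPU queue
    *) echo "amd_gpu_1" ;;      # unset or any other value
  esac
}

# Print the queue chosen for each value used in test-pipeline.yaml, plus unset.
for gpus in 8 4 2 1 ""; do
  printf 'amd_gpus=%s -> %s\n' "${gpus:-unset}" "$(select_queue "$gpus")"
done
```

Writing the mapping out this way makes the fallthrough visible: a step that sets `amd_gpus: 3`, for example, would land on `amd_gpu_1` under the template as written.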
82 changes: 0 additions & 82 deletions .github/workflows/lint-and-deploy.yaml

This file was deleted.

105 changes: 38 additions & 67 deletions .github/workflows/publish.yml
@@ -16,7 +16,9 @@ jobs:
release:
# Retrieve tag and create release
name: Create Release
-runs-on: ubuntu-latest
+runs-on: self-hosted
container:
image: rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0
outputs:
upload_url: ${{ steps.create_release.outputs.upload_url }}
steps:
@@ -39,73 +41,42 @@ jobs:
const script = require('.github/workflows/scripts/create_release.js')
await script(github, context, core)

# NOTE(simon): No longer build wheel using Github Actions. See buildkite's release workflow.
# wheel:
# name: Build Wheel
# runs-on: ${{ matrix.os }}
# needs: release
wheel:
name: Build Wheel
runs-on: self-hosted
container:
image: rocm/pytorch:rocm6.2_ubuntu20.04_py3.9_pytorch_release_2.3.0
needs: release

# strategy:
# fail-fast: false
# matrix:
# os: ['ubuntu-20.04']
# python-version: ['3.9', '3.10', '3.11', '3.12']
# pytorch-version: ['2.4.0'] # Must be the most recent version that meets requirements-cuda.txt.
# cuda-version: ['11.8', '12.1']
strategy:
fail-fast: false

# steps:
# - name: Checkout
# uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

# - name: Setup ccache
# uses: hendrikmuhs/ccache-action@ed74d11c0b343532753ecead8a951bb09bb34bc9 # v1.2.14
# with:
# create-symlink: true
# key: ${{ github.job }}-${{ matrix.python-version }}-${{ matrix.cuda-version }}

# - name: Set up Linux Env
# if: ${{ runner.os == 'Linux' }}
# run: |
# bash -x .github/workflows/scripts/env.sh

# - name: Set up Python
# uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
# with:
# python-version: ${{ matrix.python-version }}

# - name: Install CUDA ${{ matrix.cuda-version }}
# run: |
# bash -x .github/workflows/scripts/cuda-install.sh ${{ matrix.cuda-version }} ${{ matrix.os }}

# - name: Install PyTorch ${{ matrix.pytorch-version }} with CUDA ${{ matrix.cuda-version }}
# run: |
# bash -x .github/workflows/scripts/pytorch-install.sh ${{ matrix.python-version }} ${{ matrix.pytorch-version }} ${{ matrix.cuda-version }}

# - name: Build wheel
# shell: bash
# env:
# CMAKE_BUILD_TYPE: Release # do not compile with debug symbol to reduce wheel size
# run: |
# bash -x .github/workflows/scripts/build.sh ${{ matrix.python-version }} ${{ matrix.cuda-version }}
# wheel_name=$(find dist -name "*whl" -print0 | xargs -0 -n 1 basename)
# asset_name=${wheel_name//"linux"/"manylinux1"}
# echo "wheel_name=${wheel_name}" >> "$GITHUB_ENV"
# echo "asset_name=${asset_name}" >> "$GITHUB_ENV"
steps:
- name: Prepare
run: |
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
pip3 install -U triton

# - name: Upload Release Asset
# uses: actions/upload-release-asset@e8f9f06c4b078e705bd2ea027f0926603fc9b4d5 # v1.0.2
# env:
# GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# with:
# upload_url: ${{ needs.release.outputs.upload_url }}
# asset_path: ./dist/${{ env.wheel_name }}
# asset_name: ${{ env.asset_name }}
# asset_content_type: application/*
- name: Checkout
uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

# (Danielkinz): This last step will publish the .whl to pypi. Warning: untested
# - name: Publish package
# uses: pypa/gh-action-pypi-publish@release/v1.8
# with:
# repository-url: https://test.pypi.org/legacy/
# password: ${{ secrets.PYPI_API_TOKEN }}
# skip-existing: true
- name: Build wheel
shell: bash
env:
CMAKE_BUILD_TYPE: Release # do not compile with debug symbol to reduce wheel size
run: |
bash -x .github/workflows/scripts/build.sh
wheel_name=$(find dist -name "*whl" -print0 | xargs -0 -n 1 basename)
asset_name=${wheel_name//"linux"/"manylinux1"}
echo "wheel_name=${wheel_name}" >> "$GITHUB_ENV"
echo "asset_name=${asset_name}" >> "$GITHUB_ENV"

- name: Upload vllm Release Asset
uses: actions/upload-release-asset@e8f9f06c4b078e705bd2ea027f0926603fc9b4d5 # v1.0.2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
upload_url: ${{ needs.release.outputs.upload_url }}
asset_path: ./dist/${{ env.wheel_name }}
asset_name: ${{ env.asset_name }}
asset_content_type: application/*