Add support for AMD MI300X/MI325X/MI355X to the Ernie 4.5 recipe #227
ChangLiu0709 wants to merge 5 commits into vllm-project:main
Conversation
ChangLiu0709
commented
Jan 29, 2026
- Update Ernie 4.5 docs for AMD GPUs
- Add `VLLM_ROCM_USE_AITER=1` to enable the AITER backend
Summary of Changes: Hello @ChangLiu0709, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request expands the hardware compatibility of the Ernie 4.5 recipe by adding support for AMD's MI300X, MI325X, and MI355X GPUs. The changes update the existing documentation to guide users through the Docker environment setup, vLLM server configuration, and benchmarking procedures for these AMD platforms, including the environment variable settings needed for good performance.
Code Review
This pull request adds documentation for running Ernie 4.5 models on AMD GPUs. The changes are clear and provide step-by-step instructions. I've added a couple of suggestions to improve the readability of a long command and to fix a potential typo in a model identifier that could cause an error.
Ernie/Ernie4.5.md (Outdated)

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5 vllm/vllm-openai-rocm:v0.14.1
```
For better readability, consider splitting this long docker run command into multiple lines using backslashes. This makes the command easier to read and understand, similar to how other multi-line commands are formatted in this document.
Suggested change:

```shell
docker run -it \
    --ipc=host \
    --network=host \
    --privileged \
    --cap-add=CAP_SYS_ADMIN \
    --device=/dev/kfd \
    --device=/dev/dri \
    --device=/dev/mem \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    -v $(pwd):/work \
    -e SHELL=/bin/bash \
    --name Ernie-4.5 \
    vllm/vllm-openai-rocm:v0.14.1
```
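As an aside, another way to keep a long flag list readable is to hold it in a variable; this is only a sketch (the `DOCKER_ARGS` name is made up, and it echoes the command instead of executing it, so it is safe to run anywhere):

```shell
# Illustrative only: collect the docker flags in one variable so the long
# command stays readable; DOCKER_ARGS is a made-up name for this sketch.
DOCKER_ARGS="-it --ipc=host --network=host --privileged \
--cap-add=CAP_SYS_ADMIN --cap-add=SYS_PTRACE \
--device=/dev/kfd --device=/dev/dri --device=/dev/mem \
--group-add video --security-opt seccomp=unconfined \
-e SHELL=/bin/bash --name Ernie-4.5"
# echo instead of executing, so the sketch runs without Docker installed:
echo "docker run $DOCKER_ARGS -v \$(pwd):/work vllm/vllm-openai-rocm:v0.14.1"
```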
Ernie/Ernie4.5.md (Outdated)

```shell
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
vllm serve baidu/ERNIE-4.5-21B-A3B-PT/ \
```
The model identifier baidu/ERNIE-4.5-21B-A3B-PT/ includes a trailing slash. This is likely a typo and could cause the command to fail if vllm interprets it as a local path instead of a Hugging Face model ID. For consistency with the benchmark command and standard practice, the trailing slash should be removed.
Suggested change:

```shell
vllm serve baidu/ERNIE-4.5-21B-A3B-PT \
```
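The failure mode the review describes can be sketched with a hypothetical helper (the function name is made up; the relevant rule is that Hugging Face repo ids look like `org/name` with no trailing slash, while anything else may be treated as a local path):

```shell
# Hypothetical helper illustrating the review comment: a trailing slash
# makes the argument look like a filesystem path, not an "org/name" id.
classify_model_arg() {
  case "$1" in
    */)  echo "invalid: trailing slash" ;;
    */*) echo "hub repo id" ;;
    *)   echo "local name or path" ;;
  esac
}
classify_model_arg "baidu/ERNIE-4.5-21B-A3B-PT"   # hub repo id
classify_model_arg "baidu/ERNIE-4.5-21B-A3B-PT/"  # invalid: trailing slash
```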
Force-pushed from 6034b64 to 92eb6fe
Hi @tjtanaa, I have updated the recipe with the `uv pip` installation of vLLM. Please take a look :)
Force-pushed from 31dddcc to fc809d5
Hi @tjtanaa, I updated the content to follow the same requirements as in the Glyph doc. Please take a look.
Force-pushed from 0172ef0 to fa7698e
Hi @tjtanaa, I just removed the docker bench command; please take a look!
Force-pushed from 8ddf4c0 to 7c747d2
Hi @tjtanaa, wondering if this can be merged :)
Ernie/Ernie4.5.md (Outdated)

## Installing vLLM (For AMD ROCm: MI300x/MI325x/MI355x)

```bash
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
```
We already have the latest vLLM version, v0.16.0: https://wheels.vllm.ai/rocm/0.16.0/rocm700
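Assuming the wheel index URLs follow the pattern seen in this thread (`https://wheels.vllm.ai/rocm/<version>/<rocm-tag>`), the version can be pinned in one place; this sketch only echoes the install command rather than running it:

```shell
# Sketch: derive the install command from a single version variable,
# assuming the URL pattern from this thread holds for other releases.
VLLM_VERSION=0.16.0
ROCM_TAG=rocm700
INDEX_URL="https://wheels.vllm.ai/rocm/${VLLM_VERSION}/${ROCM_TAG}"
echo "uv pip install vllm==${VLLM_VERSION} --extra-index-url ${INDEX_URL}"
```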
```shell
--speculative-config '{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}'
```
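Since `--speculative-config` takes a JSON string, it is worth validating the payload before launching; a quick check using `python3 -m json.tool`, which exits non-zero on malformed input:

```shell
# Validate the --speculative-config payload from the command above;
# python3 -m json.tool fails loudly if the JSON is malformed.
SPEC_CFG='{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}'
echo "$SPEC_CFG" | python3 -m json.tool
```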
Can you remove this unnecessary line change?
For benchmarking, only use the first `vllm bench serve` run after service startup, to ensure the results are not affected by the prefix cache.
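A toy illustration (nothing here is vLLM code) of why repeat runs understate prefill work once the prefix cache is warm:

```shell
# Toy model, not vLLM internals: once a prompt's prefix is cached, an
# identical repeat run needs no prefill work, skewing later benchmarks.
CACHED_PROMPT=""
prefill_tokens() {
  if [ "$1" = "$CACHED_PROMPT" ]; then
    echo 0                               # cache hit: nothing to prefill
  else
    CACHED_PROMPT="$1"
    echo "$1" | wc -w | tr -d ' '        # cache miss: prefill every token
  fi
}
prefill_tokens "the quick brown fox"     # first run: 4
prefill_tokens "the quick brown fox"     # repeat run: 0
```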
Can you remove this unnecessary line change?
### Expected Output
Can you remove this unnecessary line change?
```
P99 ITL (ms): 20.69
==================================================
```
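To track these numbers across runs, the value can be pulled out of the summary line; a sketch assuming the line format shown in the snippet above:

```shell
# Illustrative: extract the numeric value from a summary line like the
# one above; the exact label and spacing are assumed from that snippet.
LINE="P99 ITL (ms): 20.69"
P99_ITL=$(echo "$LINE" | sed -n 's/^P99 ITL (ms):[[:space:]]*//p')
echo "$P99_ITL"   # 20.69
```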
Can you remove this unnecessary line change?
Ernie/Ernie4.5.md (Outdated)

```shell
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--disable-log-requests \
--no-enable-prefix-caching \
```
As discussed, vLLM recipes are for application users; we should not disable prefix caching in an actual deployment.
We'll try to get this PR merged first, before reviewing the other PRs.
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
…sion and the ROCm command Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Hi @tjtanaa, I just updated the content according to your feedback. Please take a look :)
Hi @tjtanaa, just a kind reminder: wondering if we can get this PR merged :)