Add the support of AMD MI300X/MI325X/MI355X of Ernie 4.5-VL recipe #228

ChangLiu0709 wants to merge 4 commits into vllm-project:main
Conversation

ChangLiu0709 commented on Jan 29, 2026
- Update the ERNIE-4.5-VL docs for AMD GPUs
- Add `VLLM_ROCM_USE_AITER=1` to enable the AITER backend
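For illustration only (the commented-out serve line uses placeholders, not values from the recipe), enabling AITER on ROCm amounts to exporting the variable before launching the server:

```shell
#!/bin/sh
# Sketch: enable the AITER backend for vLLM on ROCm. The `vllm serve` line is
# commented out because the model name and flags here are placeholders.
export VLLM_ROCM_USE_AITER=1
echo "VLLM_ROCM_USE_AITER=$VLLM_ROCM_USE_AITER"   # prints VLLM_ROCM_USE_AITER=1
# vllm serve <model> --tensor-parallel-size 4
```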
Summary of Changes

Hello @ChangLiu0709, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the ERNIE-4.5-VL recipe documentation by adding detailed instructions for deploying and benchmarking the model on AMD's MI300X, MI325X, and MI355X GPUs. The changes give users a clear guide to running on AMD hardware, including specific Docker commands and vLLM server configurations, thereby broadening the accessibility and utility of the ERNIE-4.5-VL recipe across hardware platforms.
Code Review
The pull request successfully adds comprehensive documentation for running ERNIE-4.5-VL on AMD MI300X/MI325X/MI355X GPUs. This includes detailed steps for Docker environment setup, Hugging Face login, vLLM server startup, and benchmarking. The capitalization of 'Ernie' to 'ERNIE' has also been corrected for consistency.
Ernie/Ernie4.5-VL.md (Outdated)

        --dataset-name random \
        --random-input-len 8000 \
        --random-output-len 1000 \
        --request-rate 10000 \
The --request-rate for the AMD benchmark is set to 10000, which is significantly higher than the 10 used in the general benchmarking section (line 62). This large discrepancy could lead to the benchmark failing or producing unrealistic results if the hardware cannot sustain such a high rate. Please verify if 10000 is the intended value or if it should be adjusted to a more realistic number, possibly aligning with the other benchmark examples or providing context for this high rate.
Suggested change:

    -    --request-rate 10000 \
    +    --request-rate 10 \
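A rough sanity check on the reviewer's point (numbers assumed for illustration: 1000 prompts at the suggested rate of 10 req/s): the offered load is spread over about 100 seconds, which a single node can plausibly sustain, whereas 10000 req/s effectively fires all requests at once:

```shell
#!/bin/sh
# Back-of-the-envelope: how long the request-issuing phase lasts at a given rate.
NUM_PROMPTS=1000          # assumed benchmark size, for illustration
REQUEST_RATE=10           # the suggested, sustainable rate
SECONDS_OF_LOAD=$((NUM_PROMPTS / REQUEST_RATE))
echo "$SECONDS_OF_LOAD"   # prints 100
```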
Ernie/Ernie4.5-VL.md (Outdated)

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5-VL vllm/vllm-openai-rocm:v0.14.1
```
The docker run command uses --privileged and --cap-add=CAP_SYS_ADMIN. While these might be necessary for ROCm environments, they grant extensive permissions to the container. It's generally recommended to use the most restrictive permissions possible. Consider adding a note about the security implications or exploring if a more granular set of capabilities can achieve the same functionality.
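A minimal sketch of a tighter launch along the lines the reviewer suggests. Assumptions: `/dev/kfd` and `/dev/dri` plus the `video`/`render` groups are what ROCm user-space normally needs; whether dropping `--privileged` actually works depends on the image and kernel, so treat this as a starting point, not a verified recipe. The command is built as a string so the sketch can be inspected without Docker installed:

```shell
#!/bin/sh
# Sketch: a docker run line without --privileged or CAP_SYS_ADMIN, keeping
# only the device access ROCm normally requires.
CMD="docker run -it --network=host --ipc=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --group-add render \
  --security-opt seccomp=unconfined \
  -v \$(pwd):/work --name Ernie-4.5-VL \
  vllm/vllm-openai-rocm:latest"
if echo "$CMD" | grep -q -- '--privileged'; then
  echo "privileged"
else
  echo "unprivileged"   # prints unprivileged
fi
```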
Ernie/Ernie4.5-VL.md (Outdated)

        --tensor-parallel-size 4 \
        --gpu-memory-utilization 0.9 \
        --disable-log-requests \
        --no-enable-prefix-caching \
The --no-enable-prefix-caching flag is used in the AMD GPU server startup command, but it's not present in the general benchmarking section's server command (lines 19-27). This inconsistency might lead to different performance characteristics between the two setups. If this flag is crucial for AMD GPUs or specific to this benchmark, it should be explained, or its absence in other sections should be justified for clarity.
ChangLiu0709 force-pushed from 86a6790 to 043b8a7
Hi @tjtanaa, I have updated the recipe with the `uv pip` installation of vLLM. Please take a look :)
Ernie/Ernie4.5-VL.md (Outdated)

### Step 4: Run Benchmark
Open a new terminal and run the following command to execute the benchmark script inside the container.

```shell
docker exec -it Ernie-4.5-VL vllm bench serve \
```
We prioritize the pip install approach, so we should also include the command without the `docker exec -it Ernie-4.5-VL` prefix.
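A sketch of the distinction the reviewer is asking for (the command fragment is taken from the hunk above; the full flag list is elided): with vLLM installed via `uv pip` on the host, the benchmark is the same command minus the `docker exec` prefix:

```shell
#!/bin/sh
# Sketch: derive the host-side (pip install) command from the containerized
# one by stripping the `docker exec -it Ernie-4.5-VL ` prefix.
CONTAINER_CMD="docker exec -it Ernie-4.5-VL vllm bench serve --dataset-name random"
HOST_CMD=${CONTAINER_CMD#docker exec -it Ernie-4.5-VL }
echo "$HOST_CMD"   # prints: vllm bench serve --dataset-name random
```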
Ernie/Ernie4.5-VL.md (Outdated)

Pull the latest vllm docker:

```shell
docker pull vllm/vllm-openai-rocm:v0.15.1
```
Let's just use `docker pull vllm/vllm-openai-rocm:latest` so that we don't need to keep updating the doc.
Ernie/Ernie4.5-VL.md (Outdated)

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5-VL vllm/vllm-openai-rocm:v0.15.1
```
Let's just use `vllm/vllm-openai-rocm:latest` so that we don't need to keep updating the doc.
Ernie/Ernie4.5-VL.md (Outdated)

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5-VL vllm/vllm-openai-rocm:v0.15.1
```
Let's make it a multiline command, consistent with the existing format.
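For illustration, the single-line command from the hunk above reflowed with backslash continuations (flags unchanged; the `docker` stub below only echoes the assembled command so the sketch runs without Docker):

```shell
#!/bin/sh
# Reflowed version of the one-line docker run command; purely cosmetic.
docker() { echo "docker $*"; }   # stub: print instead of launching a container
docker run -it --ipc=host --network=host --privileged \
  --cap-add=CAP_SYS_ADMIN --cap-add=SYS_PTRACE \
  --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
  --group-add video --security-opt seccomp=unconfined \
  -v "$(pwd)":/work -e SHELL=/bin/bash \
  --name Ernie-4.5-VL vllm/vllm-openai-rocm:latest
```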
ChangLiu0709 force-pushed from 4082bc9 to 48c4650
Hi @tjtanaa, thanks for all the feedback! I have just modified the content accordingly. Please re-check :)
ChangLiu0709 force-pushed from c24d0ad to 80c4780
Hi @tjtanaa, I just removed the unneeded docker command; please take a look.
ChangLiu0709 force-pushed from 03e8f32 to 77e7e8e
Wondering if this can be merged, @tjtanaa?
Ernie/Ernie4.5-VL.md (Outdated)

     ## Installing vLLM
    -Ernie4.5-VL support was recently added to vLLM main branch and is not yet available in any official release:
    +ERNIE-4.5-VL support was recently added to vLLM main branch and is not yet available in any official release:
Let's add subheaders called `### CUDA` and `### AMD ROCm: MI300x/MI325x/MI355x`.
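A sketch of the split the reviewer is suggesting (the body text under each subheader is a placeholder, not the recipe's actual wording):

```markdown
## Installing vLLM

### CUDA
<CUDA install instructions, e.g. uv pip install from the main branch>

### AMD ROCm: MI300x/MI325x/MI355x
<ROCm docker pull / docker run instructions>
```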
Signed-off-by: seungrokj <seungrok.jung@amd.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
ChangLiu0709 force-pushed from 6a8f37d to 437d5bf
Hi @tjtanaa, please take a look at this PR. I have updated the content according to your previous feedback.