
Add the support of AMD MI300X/MI325X/MI355X of Ernie 4.5-VL recipe #228

Open
ChangLiu0709 wants to merge 4 commits into vllm-project:main from ChangLiu0709:Ernie4.5-VL

Conversation

@ChangLiu0709
Contributor

  • Update Ernie 4.5-VL docs for AMD GPUs
  • Add VLLM_ROCM_USE_AITER=1 to enable the AITER backend
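As a rough sketch of how that environment variable would be applied (the model id and flag values below are illustrative assumptions, not quoted from the recipe):

```shell
# Hypothetical sketch: enable the AITER backend on ROCm when starting the
# vLLM server. Model id and flags are assumptions for illustration only.
VLLM_ROCM_USE_AITER=1 vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9
```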

@gemini-code-assist
Contributor

Summary of Changes

Hello @ChangLiu0709, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ERNIE-4.5-VL recipe documentation by integrating detailed instructions for deploying and benchmarking the model on AMD's latest MI300X, MI325X, and MI355X GPUs. The changes provide users with a clear guide to leverage AMD hardware, including specific Docker commands and vLLM server configurations, thereby broadening the accessibility and utility of the ERNIE-4.5-VL model across different hardware platforms.

Highlights

  • AMD GPU Support: Added comprehensive documentation for running the ERNIE-4.5-VL model on AMD MI300X, MI325X, and MI355X GPUs, including Docker setup, Hugging Face login, vLLM server configuration, and benchmarking instructions.
  • AITER Backend Activation: Included the VLLM_ROCM_USE_AITER=1 environment variable in the vLLM server command to enable the AITER backend for improved performance on ROCm-enabled systems.
  • Documentation Refinement: Corrected the capitalization of 'ERNIE-4.5-VL' in the vLLM installation section for consistency.
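The setup flow the highlights describe can be outlined roughly as follows; the image tag and token handling are assumptions, not commands quoted from the recipe:

```shell
# Rough outline of the documented flow (assumed commands, not the recipe verbatim):
docker pull vllm/vllm-openai-rocm:latest      # 1. pull the ROCm vLLM image
huggingface-cli login --token "$HF_TOKEN"     # 2. authenticate with Hugging Face
# 3. launch the vLLM server inside the container, then
# 4. run `vllm bench serve` against it to benchmark
```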



@gemini-code-assist (bot) left a comment


Code Review

The pull request successfully adds comprehensive documentation for running ERNIE-4.5-VL on AMD MI300X/MI325X/MI355X GPUs. This includes detailed steps for Docker environment setup, Hugging Face login, vLLM server startup, and benchmarking. The capitalization of 'Ernie' to 'ERNIE' has also been corrected for consistency.

```shell
--dataset-name random \
--random-input-len 8000 \
--random-output-len 1000 \
--request-rate 10000 \
```


Severity: high

The --request-rate for the AMD benchmark is set to 10000, which is significantly higher than the 10 used in the general benchmarking section (line 62). This large discrepancy could lead to the benchmark failing or producing unrealistic results if the hardware cannot sustain such a high rate. Please verify if 10000 is the intended value or if it should be adjusted to a more realistic number, possibly aligning with the other benchmark examples or providing context for this high rate.

Suggested change:

```diff
---request-rate 10000 \
+--request-rate 10 \
```
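Applied to the quoted flags, the suggested invocation would look like this sketch (the surrounding flags are assumed to match the recipe's benchmark command):

```shell
# Sketch: benchmark command with the reviewer's suggested request rate.
vllm bench serve \
  --dataset-name random \
  --random-input-len 8000 \
  --random-output-len 1000 \
  --request-rate 10
```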

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5-VL vllm/vllm-openai-rocm:v0.14.1
```

Severity: medium

The docker run command uses --privileged and --cap-add=CAP_SYS_ADMIN. While these might be necessary for ROCm environments, they grant extensive permissions to the container. It's generally recommended to use the most restrictive permissions possible. Consider adding a note about the security implications or exploring if a more granular set of capabilities can achieve the same functionality.
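One possible narrowing, as a sketch only (whether it suffices depends on the host and ROCm version, and is not verified here):

```shell
# Assumed minimal-permission variant: drop --privileged and CAP_SYS_ADMIN,
# keeping only the device mounts and group membership ROCm typically needs.
docker run -it --ipc=host --network=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video \
  --security-opt seccomp=unconfined \
  -v "$(pwd)":/work \
  --name Ernie-4.5-VL vllm/vllm-openai-rocm:latest
```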

```shell
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--disable-log-requests \
--no-enable-prefix-caching \
```

Severity: medium

The --no-enable-prefix-caching flag is used in the AMD GPU server startup command, but it's not present in the general benchmarking section's server command (lines 19-27). This inconsistency might lead to different performance characteristics between the two setups. If this flag is crucial for AMD GPUs or specific to this benchmark, it should be explained, or its absence in other sections should be justified for clarity.

@ChangLiu0709
Contributor Author

Hi @tjtanaa, I have updated the recipe with the uv pip installation of vLLM. Please take a look : )

### Step 4: Run Benchmark
Open a new terminal and run the following command to execute the benchmark script inside the container.
```shell
docker exec -it Ernie-4.5-VL vllm bench serve \
```

We prioritize the pip install approach, so we should also include the command without the `docker exec -it Ernie-4.5-VL` prefix.
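A host-side variant might look like the following sketch (assuming vLLM is installed on the host via uv, as mentioned earlier in the thread; the flags are placeholders, not the recipe's exact command):

```shell
# Assumed host-side invocation (vLLM installed with `uv pip install vllm`),
# without the `docker exec -it Ernie-4.5-VL` prefix:
uv pip install vllm
vllm bench serve --dataset-name random --request-rate 10
```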

Pull the latest vllm docker:

```shell
docker pull vllm/vllm-openai-rocm:v0.15.1
```

Let's just use `docker pull vllm/vllm-openai-rocm:latest` so that we don't need to keep updating the doc.

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5-VL vllm/vllm-openai-rocm:v0.15.1
```
@tjtanaa Feb 13, 2026


let's just use vllm/vllm-openai-rocm:latest so that we don't need to keep on updating the doc.

Launch the ROCm vLLM docker:

```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5-VL vllm/vllm-openai-rocm:v0.15.1
```

Let's make it a multiline command, consistent with the existing format.
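One possible multiline layout for the quoted one-liner (the content is unchanged; only the line breaks are added):

```shell
docker run -it --ipc=host --network=host --privileged \
  --cap-add=CAP_SYS_ADMIN --cap-add=SYS_PTRACE \
  --device=/dev/kfd --device=/dev/dri --device=/dev/mem \
  --group-add video \
  --security-opt seccomp=unconfined \
  -v $(pwd):/work -e SHELL=/bin/bash \
  --name Ernie-4.5-VL \
  vllm/vllm-openai-rocm:v0.15.1
```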

@ChangLiu0709 force-pushed the Ernie4.5-VL branch 7 times, most recently from 4082bc9 to 48c4650 on February 24, 2026 at 17:41.
@ChangLiu0709
Contributor Author

Hi @tjtanaa, thanks for all the feedback! I have just modified the content accordingly. Please take another look : ))

@ChangLiu0709
Contributor Author

Hi @tjtanaa, I just removed the unneeded docker command; please take a look.

@ChangLiu0709
Contributor Author

Wondering if this can be merged, @tjtanaa?


## Installing vLLM

```diff
-Ernie4.5-VL support was recently added to vLLM main branch and is not yet available in any official release:
+ERNIE-4.5-VL support was recently added to vLLM main branch and is not yet available in any official release:
```

Let's add subheaders called `### CUDA` and `### AMD ROCm: MI300x/MI325x/MI355x`.

Contributor Author


Updated : )

Signed-off-by: seungrokj <seungrok.jung@amd.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
@ChangLiu0709
Contributor Author

Hi @tjtanaa, please take a look at this PR. I have updated the content according to your previous feedback.

