
Add the support of AMD MI300X/MI325X/MI355X of Ernie 4.5 recipe#227

Open
ChangLiu0709 wants to merge 5 commits into vllm-project:main from ChangLiu0709:Ernie4.5

Conversation

@ChangLiu0709
Contributor

  • Update Ernie 4.5 docs for AMD GPUs
  • Add VLLM_ROCM_USE_AITER=1 to enable the AITER backend

@gemini-code-assist
Contributor

Summary of Changes

Hello @ChangLiu0709, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the hardware compatibility of the Ernie 4.5 recipe by integrating support for AMD's MI300X, MI325X, and MI355X GPUs. The changes primarily involve updating the existing documentation to guide users through the necessary Docker environment setup, vLLM server configuration, and benchmarking procedures tailored for these AMD platforms, ensuring optimal performance through specific environment variable settings.

Highlights

  • AMD GPU Support: Introduces comprehensive support for AMD MI300X, MI325X, and MI355X GPUs within the Ernie 4.5 recipe.
  • Documentation Update: Adds a new section to the Ernie/Ernie4.5.md file detailing the setup and execution steps for AMD GPUs.
  • AITER Backend Activation: Includes the VLLM_ROCM_USE_AITER=1 environment variable in the vLLM server startup command to enable the AITER backend for AMD GPUs.
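The AITER activation in the highlights above amounts to a single environment variable on the serve command. A minimal sketch, assuming the 21B model variant from the recipe (the recipe's other serve flags are omitted here):

```shell
# Enable vLLM's ROCm AITER backend when launching the Ernie 4.5 server.
# Flags beyond the env var and model ID are left out; see the recipe for the full command.
VLLM_ROCM_USE_AITER=1 \
vllm serve baidu/ERNIE-4.5-21B-A3B-PT
```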



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds documentation for running Ernie 4.5 models on AMD GPUs. The changes are clear and provide step-by-step instructions. I've added a couple of suggestions to improve the readability of a long command and to fix a potential typo in a model identifier that could cause an error.


Launch the ROCm vLLM docker:
```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5 vllm/vllm-openai-rocm:v0.14.1
```


Severity: medium

For better readability, consider splitting this long docker run command into multiple lines using backslashes. This makes the command easier to read and understand, similar to how other multi-line commands are formatted in this document.

Suggested change
```shell
docker run -it \
  --ipc=host \
  --network=host \
  --privileged \
  --cap-add=CAP_SYS_ADMIN \
  --device=/dev/kfd \
  --device=/dev/dri \
  --device=/dev/mem \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -v $(pwd):/work \
  -e SHELL=/bin/bash \
  --name Ernie-4.5 \
  vllm/vllm-openai-rocm:v0.14.1
```

```shell
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
vllm serve baidu/ERNIE-4.5-21B-A3B-PT/ \
```


Severity: medium

The model identifier baidu/ERNIE-4.5-21B-A3B-PT/ includes a trailing slash. This is likely a typo and could cause the command to fail if vllm interprets it as a local path instead of a Hugging Face model ID. For consistency with the benchmark command and standard practice, the trailing slash should be removed.

Suggested change
```shell
vllm serve baidu/ERNIE-4.5-21B-A3B-PT \
```

@ChangLiu0709
Contributor Author

Hi @tjtanaa, I have updated the recipe with the uv pip installation of vLLM. Please take a look :)

@ChangLiu0709 force-pushed the Ernie4.5 branch 3 times, most recently from 31dddcc to fc809d5 on February 24, 2026 at 17:11
@ChangLiu0709
Contributor Author

Hi @tjtanaa, I updated the content according to the same requirements as in the Glyph doc. Please take a look.

@ChangLiu0709 force-pushed the Ernie4.5 branch 3 times, most recently from 0172ef0 to fa7698e on February 27, 2026 at 15:54
@ChangLiu0709
Contributor Author

Hi @tjtanaa, I just removed the docker bench command; please take a look!

@ChangLiu0709
Contributor Author

Hi @tjtanaa wondering if this can be merged : ))


## Installing vLLM (For AMD ROCm: MI300x/MI325x/MI355x)
```bash
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
```


We already have the latest vLLM version, v0.16.0:

https://wheels.vllm.ai/rocm/0.16.0/rocm700
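If the recipe is bumped to the wheel index mentioned above, the install line would presumably become the following (version and URL taken from this comment; not verified against the published index):

```shell
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.16.0/rocm700
```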

```shell
--speculative-config '{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}'
```
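Since the speculative-decoding config is passed as an inline JSON string, a malformed quote or comma only fails at server startup. A quick sanity check before launching (a convenience step, not part of the recipe):

```shell
# Validate the --speculative-config JSON before passing it to vllm serve.
# python3 -m json.tool exits non-zero on malformed JSON, so a typo is caught early.
SPEC_CONFIG='{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}'
echo "$SPEC_CONFIG" | python3 -m json.tool
```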



can you remove this unnecessary line change?


For benchmarking, use only the first `vllm bench serve` run after service startup, so that results are not affected by the prefix cache
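A first cold-cache benchmark invocation might look like the following; the flag names reflect my assumption about `vllm bench serve`'s random-dataset options in recent vLLM releases, not text from this recipe, so check `vllm bench serve --help` on your installed version:

```shell
# Run the benchmark once, immediately after server startup, while the prefix cache is cold.
vllm bench serve \
  --model baidu/ERNIE-4.5-21B-A3B-PT \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 128 \
  --num-prompts 200
```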



can you remove this unnecessary line change?


### Expected Output



can you remove this unnecessary line change?

```
P99 ITL (ms): 20.69
==================================================
```


can you remove this unnecessary line change?

```shell
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--disable-log-requests \
--no-enable-prefix-caching \
```


As discussed, vLLM recipes are for application users; we should not disable prefix caching in actual deployment.
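Following this review point, a deployment-oriented variant of the quoted flags simply drops `--no-enable-prefix-caching`. This is a sketch of the reviewer's suggestion, not final recipe text; the model ID is the 21B variant from earlier in the thread, so substitute as appropriate:

```shell
# Prefix caching stays enabled (vLLM's default) for real deployments;
# disable it only for cold-cache benchmarking runs.
vllm serve baidu/ERNIE-4.5-21B-A3B-PT \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9 \
  --disable-log-requests
```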

@tjtanaa

tjtanaa commented Mar 4, 2026

We try to get this PR merged first before reviewing the other PRs.

Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
…sion and the ROCm command

Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
@ChangLiu0709
Contributor Author

Hi @tjtanaa, I just updated the content according to your feedback. Please take a look :))

@ChangLiu0709
Contributor Author

Hi @tjtanaa just a kind reminder. Wondering if we can get this PR merged : ))
