
Add the support of AMD MI300X/MI325X/MI355X of Ernie 4.5 recipe#227

Open
ChangLiu0709 wants to merge 5 commits into vllm-project:main from ChangLiu0709:Ernie4.5

Conversation

@ChangLiu0709
Contributor

  • Update Ernie 4.5 docs for AMD GPUs
  • Add VLLM_ROCM_USE_AITER=1 to enable the AITER backend

@gemini-code-assist
Contributor

Summary of Changes

Hello @ChangLiu0709, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the hardware compatibility of the Ernie 4.5 recipe by integrating support for AMD's MI300X, MI325X, and MI355X GPUs. The changes primarily involve updating the existing documentation to guide users through the necessary Docker environment setup, vLLM server configuration, and benchmarking procedures tailored for these AMD platforms, ensuring optimal performance through specific environment variable settings.

Highlights

  • AMD GPU Support: Introduces comprehensive support for AMD MI300X, MI325X, and MI355X GPUs within the Ernie 4.5 recipe.
  • Documentation Update: Adds a new section to the Ernie/Ernie4.5.md file detailing the setup and execution steps for AMD GPUs.
  • AITER Backend Activation: Includes the VLLM_ROCM_USE_AITER=1 environment variable in the vLLM server startup command to enable the AITER backend for AMD GPUs.
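The AITER activation in the highlights above amounts to a single environment variable on the serve command. A minimal sketch, assuming the 21B model variant from the recipe (the recipe's other serve flags are omitted here):

```shell
# Enable vLLM's ROCm AITER backend when launching the Ernie 4.5 server.
# Flags beyond the env var and model ID are left out; see the recipe for the full command.
VLLM_ROCM_USE_AITER=1 \
vllm serve baidu/ERNIE-4.5-21B-A3B-PT
```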



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds documentation for running Ernie 4.5 models on AMD GPUs. The changes are clear and provide step-by-step instructions. I've added a couple of suggestions to improve the readability of a long command and to fix a potential typo in a model identifier that could cause an error.


Launch the ROCm vLLM docker:
```shell
docker run -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -e SHELL=/bin/bash --name Ernie-4.5 vllm/vllm-openai-rocm:v0.14.1
```


Severity: medium

For better readability, consider splitting this long docker run command into multiple lines using backslashes. This makes the command easier to read and understand, similar to how other multi-line commands are formatted in this document.

Suggested change
```shell
docker run -it \
  --ipc=host \
  --network=host \
  --privileged \
  --cap-add=CAP_SYS_ADMIN \
  --device=/dev/kfd \
  --device=/dev/dri \
  --device=/dev/mem \
  --group-add video \
  --cap-add=SYS_PTRACE \
  --security-opt seccomp=unconfined \
  -v $(pwd):/work \
  -e SHELL=/bin/bash \
  --name Ernie-4.5 \
  vllm/vllm-openai-rocm:v0.14.1
```

```shell
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
vllm serve baidu/ERNIE-4.5-21B-A3B-PT/ \
```


Severity: medium

The model identifier baidu/ERNIE-4.5-21B-A3B-PT/ includes a trailing slash. This is likely a typo and could cause the command to fail if vllm interprets it as a local path instead of a Hugging Face model ID. For consistency with the benchmark command and standard practice, the trailing slash should be removed.

Suggested change
```shell
vllm serve baidu/ERNIE-4.5-21B-A3B-PT \
```

@ChangLiu0709
Contributor Author

Hi @tjtanaa, I have updated the recipe with the uv pip installation of vLLM. Please take a look :)

@ChangLiu0709 force-pushed the Ernie4.5 branch 3 times, most recently from 31dddcc to fc809d5 on February 24, 2026 at 17:11
@ChangLiu0709
Contributor Author

Hi @tjtanaa, I updated the content according to the same requirements as in the Glyph doc. Please take a look.

@ChangLiu0709 force-pushed the Ernie4.5 branch 3 times, most recently from 0172ef0 to fa7698e on February 27, 2026 at 15:54
@ChangLiu0709
Contributor Author

Hi @tjtanaa, I just removed the docker bench command; please take a look!

@ChangLiu0709
Contributor Author

Hi @tjtanaa wondering if this can be merged : ))


## Installing vLLM (For AMD ROCm: MI300x/MI325x/MI355x)
```bash
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
```


We already have the latest vLLM version, v0.16.0:

https://wheels.vllm.ai/rocm/0.16.0/rocm700
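If the recipe is bumped to the wheel index mentioned above, the install line would presumably become the following (version and URL taken from this comment; not verified against the published index):

```shell
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.16.0/rocm700
```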

```shell
--speculative-config '{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}'
```
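Since the speculative-decoding config is passed as an inline JSON string, a malformed quote or comma only fails at server startup. A quick sanity check before launching (a convenience step, not part of the recipe):

```shell
# Validate the --speculative-config JSON before passing it to vllm serve.
# python3 -m json.tool exits non-zero on malformed JSON, so a typo is caught early.
SPEC_CONFIG='{"method": "ernie_mtp","model": "baidu/ERNIE-4.5-300B-A47B-PT","num_speculative_tokens": 1}'
echo "$SPEC_CONFIG" | python3 -m json.tool
```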



can you remove this unnecessary line change?


For benchmarking, use only the first `vllm bench serve` run after service startup, so that results are not affected by the prefix cache
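A first cold-cache benchmark invocation might look like the following; the flag names reflect my assumption about `vllm bench serve`'s random-dataset options in recent vLLM releases, not text from this recipe, so check `vllm bench serve --help` on your installed version:

```shell
# Run the benchmark once, immediately after server startup, while the prefix cache is cold.
vllm bench serve \
  --model baidu/ERNIE-4.5-21B-A3B-PT \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 128 \
  --num-prompts 200
```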



can you remove this unnecessary line change?


### Expected Output



can you remove this unnecessary line change?

```
P99 ITL (ms): 20.69
==================================================
```


can you remove this unnecessary line change?

```shell
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--disable-log-requests \
--no-enable-prefix-caching \
```


As discussed, vLLM recipes are for application users; we should not disable prefix caching in actual deployment.
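Following this review point, a deployment-oriented variant of the quoted flags simply drops `--no-enable-prefix-caching`. This is a sketch of the reviewer's suggestion, not final recipe text; the model ID is the 21B variant from earlier in the thread, so substitute as appropriate:

```shell
# Prefix caching stays enabled (vLLM's default) for real deployments;
# disable it only for cold-cache benchmarking runs.
vllm serve baidu/ERNIE-4.5-21B-A3B-PT \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9 \
  --disable-log-requests
```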

@tjtanaa

tjtanaa commented Mar 4, 2026

We try to get this PR merged first before reviewing the other PRs.

Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
…sion and the ROCm command

Signed-off-by: ChangLiu0709 <ChangLiu0709@users.noreply.github.com>
@ChangLiu0709
Contributor Author

Hi @tjtanaa, I just updated the content according to your feedback. Please take a look :))

@ChangLiu0709
Contributor Author

Hi @tjtanaa just a kind reminder. Wondering if we can get this PR merged : ))
