From 0f065ac8ad59230bca8c23ff9b5dfbdbbdc64578 Mon Sep 17 00:00:00 2001
From: ChangLiu0709
Date: Tue, 24 Feb 2026 17:40:17 +0000
Subject: [PATCH 1/4] Add Ernie4.5-VL recipe with AMD MI300X/MI325X/MI355X
 support

Signed-off-by: seungrokj
Signed-off-by: ChangLiu0709
---
 Ernie/Ernie4.5-VL.md | 100 ++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 1 deletion(-)

diff --git a/Ernie/Ernie4.5-VL.md b/Ernie/Ernie4.5-VL.md
index 87a37b09..a4ed8ff8 100644
--- a/Ernie/Ernie4.5-VL.md
+++ b/Ernie/Ernie4.5-VL.md
@@ -4,7 +4,7 @@ This guide describes how to run [ERNIE-4.5-VL-28B-A3B-PT](https://huggingface.co

 ## Installing vLLM

-Ernie4.5-VL support was recently added to vLLM main branch and is not yet available in any official release:
+ERNIE-4.5-VL support was recently added to the vLLM main branch and is not yet available in any official release:
 ```bash
 uv venv --python 3.12 --seed
 source .venv/bin/activate
 uv pip install -U vllm --torch-backend auto
 ```
@@ -101,3 +101,101 @@
 Median ITL (ms): 36.35
 P99 ITL (ms): 236.49
 ==================================================
+
+
+## AMD GPU Support
+
+Please follow the steps here to install and run the ERNIE-4.5-VL model on AMD MI300X, MI325X, MI355X GPUs.
+
+### Step 1: Prepare Environment
+#### Option 1: Installation from pre-built wheels (For AMD ROCm: MI300x/MI325x/MI355x)
+We recommend using the official package for AMD GPUs (MI300x/MI325x/MI355x).
+```bash
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm
+```
+⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, please use the Docker flow in [vLLM](https://vllm.ai/).
+
+#### Option 2: Docker image
+Pull the latest vLLM Docker image:
+
+```bash
+docker pull vllm/vllm-openai-rocm:latest
+```
+
+Launch the ROCm vLLM Docker container:
+
+```bash
+docker run -it \
+  --ipc=host \
+  --network=host \
+  --privileged \
+  --cap-add=CAP_SYS_ADMIN \
+  --device=/dev/kfd \
+  --device=/dev/dri \
+  --device=/dev/mem \
+  --group-add video \
+  --cap-add=SYS_PTRACE \
+  --security-opt seccomp=unconfined \
+  -v $(pwd):/work \
+  -e SHELL=/bin/bash \
+  --name Ernie-4.5-VL \
+  vllm/vllm-openai-rocm:latest
+```
+
+After running the command above, you are already inside the container. Proceed to Step 2 in that shell. If you detached from the container or started it in detached mode, attach to the container with:
+
+```bash
+docker attach Ernie-4.5-VL
+```
+
+### Step 2: Log in to Hugging Face
+Hugging Face login:
+
+```bash
+huggingface-cli login
+```
+
+### Step 3: Start the vLLM server
+
+Run the vLLM online serving.
+Sample command:
+```bash
+VLLM_ROCM_USE_AITER=1 \
+SAFETENSORS_FAST_GPU=1 \
+vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT \
+  --tensor-parallel-size 4 \
+  --gpu-memory-utilization 0.9 \
+  --disable-log-requests \
+  --no-enable-prefix-caching \
+  --trust-remote-code
+```
+
+
+### Step 4: Run Benchmark
+Open a new terminal and run the following command to execute the benchmark script:
+
+```bash
+vllm bench serve \
+  --model baidu/ERNIE-4.5-VL-28B-A3B-PT \
+  --dataset-name random \
+  --random-input-len 8000 \
+  --random-output-len 1000 \
+  --request-rate 10000 \
+  --num-prompts 16 \
+  --trust-remote-code \
+  --ignore-eos
+```
+
+If you are using a Docker environment, open a new terminal and run the benchmark inside the container with:
+
+```bash
+docker exec -it Ernie-4.5-VL vllm bench serve \
+  --model baidu/ERNIE-4.5-VL-28B-A3B-PT \
+  --dataset-name random \
+  --random-input-len 8000 \
+  --random-output-len 1000 \
+  --request-rate 10000 \
+  --num-prompts 16 \
+  --trust-remote-code \
+  --ignore-eos
+```
\ No newline at end of file

From b751ce5c176edbf0b65a0e9b3a85b6cabf265000 Mon Sep 17 00:00:00 2001
From: ChangLiu0709
Date: Fri, 27 Feb 2026 15:55:47 +0000
Subject: [PATCH 2/4] Remove docker benchmark command

Signed-off-by: ChangLiu0709
---
 Ernie/Ernie4.5-VL.md | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/Ernie/Ernie4.5-VL.md b/Ernie/Ernie4.5-VL.md
index a4ed8ff8..70c2a178 100644
--- a/Ernie/Ernie4.5-VL.md
+++ b/Ernie/Ernie4.5-VL.md
@@ -185,17 +185,3 @@ vllm bench serve \
   --model baidu/ERNIE-4.5-VL-28B-A3B-PT \
   --dataset-name random \
   --random-input-len 8000 \
   --random-output-len 1000 \
   --request-rate 10000 \
   --num-prompts 16 \
   --trust-remote-code \
   --ignore-eos
 ```
-
-If you are using a Docker environment, open a new terminal and run the benchmark inside the container with:
-
-```bash
-docker exec -it Ernie-4.5-VL vllm bench serve \
-  --model baidu/ERNIE-4.5-VL-28B-A3B-PT \
-  --dataset-name random \
-  --random-input-len 8000 \
-  --random-output-len 1000 \
-  --request-rate 10000 \
-  --num-prompts 16 \
-  --trust-remote-code \
-  --ignore-eos
-```
\ No newline at end of file

From 81e69b6644341f0a911a8abf8957bb84330b3ebf Mon Sep 17 00:00:00 2001
From: ChangLiu0709
Date: Fri, 27 Feb 2026 16:52:55 +0000
Subject: [PATCH 3/4] Reformat the content merging AMD and NVIDIA settings
 together

Signed-off-by: ChangLiu0709
---
 Ernie/Ernie4.5-VL.md | 106 ++++++++++-------------------------------------------------------
 1 file changed, 21 insertions(+), 85 deletions(-)

diff --git a/Ernie/Ernie4.5-VL.md b/Ernie/Ernie4.5-VL.md
index 70c2a178..40fe44c7 100644
--- a/Ernie/Ernie4.5-VL.md
+++ b/Ernie/Ernie4.5-VL.md
@@ -11,8 +11,15 @@ source .venv/bin/activate
 uv pip install -U vllm --torch-backend auto
 ```

+## Installing vLLM (For AMD ROCm: MI300x/MI325x/MI355x)
+```bash
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+```
+⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, please use the Docker flow in [vLLM](https://vllm.ai/).
+
 ## Running Ernie4.5-VL
+### Serving Ernie4.5-VL Model on H100 GPUs
 NOTE: torch.compile and CUDA graph are not supported due to the heterogeneous expert architecture (vision and text experts).
 ```bash
 # 28B model 80G*1 GPU
@@ -37,7 +44,6 @@ vllm serve baidu/ERNIE-4.5-VL-424B-A47B-PT \
   --cpu-offload-gb 50
 ```
-
 If your single node's GPU memory is insufficient, native BF16 deployment may require multiple nodes; for multi-node deployment, refer to the [vLLM doc](https://docs.vllm.ai/en/latest/serving/parallelism_scaling.html?#multi-node-deployment) to start a Ray cluster. Then run vLLM on the master node:
 ```bash
 # 424B model 80G*16 GPU with native BF16
@@ -46,6 +52,20 @@ vllm serve baidu/ERNIE-4.5-VL-424B-A47B-PT \
   --tensor-parallel-size 16
 ```

+### Serving Ernie4.5-VL Model on MI300X/MI325X/MI355X GPUs
+
+Run the vLLM online serving on AMD GPUs using the command below:
+```bash
+VLLM_ROCM_USE_AITER=1 \
+SAFETENSORS_FAST_GPU=1 \
+vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT \
+  --tensor-parallel-size 4 \
+  --gpu-memory-utilization 0.9 \
+  --disable-log-requests \
+  --no-enable-prefix-caching \
+  --trust-remote-code
+```
+
 ## Benchmarking

 For benchmarking, use only the first `vllm bench serve` run after service startup to ensure it is not affected by the prefix cache
@@ -101,87 +121,3 @@
 Median ITL (ms): 36.35
 P99 ITL (ms): 236.49
 ==================================================
-
-
-## AMD GPU Support
-
-Please follow the steps here to install and run the ERNIE-4.5-VL model on AMD MI300X, MI325X, MI355X GPUs.
-
-### Step 1: Prepare Environment
-#### Option 1: Installation from pre-built wheels (For AMD ROCm: MI300x/MI325x/MI355x)
-We recommend using the official package for AMD GPUs (MI300x/MI325x/MI355x).
-```bash
-uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm
-```
-⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, please use the Docker flow in [vLLM](https://vllm.ai/).
-
-#### Option 2: Docker image
-Pull the latest vLLM Docker image:
-
-```bash
-docker pull vllm/vllm-openai-rocm:latest
-```
-
-Launch the ROCm vLLM Docker container:
-
-```bash
-docker run -it \
-  --ipc=host \
-  --network=host \
-  --privileged \
-  --cap-add=CAP_SYS_ADMIN \
-  --device=/dev/kfd \
-  --device=/dev/dri \
-  --device=/dev/mem \
-  --group-add video \
-  --cap-add=SYS_PTRACE \
-  --security-opt seccomp=unconfined \
-  -v $(pwd):/work \
-  -e SHELL=/bin/bash \
-  --name Ernie-4.5-VL \
-  vllm/vllm-openai-rocm:latest
-```
-
-After running the command above, you are already inside the container. Proceed to Step 2 in that shell. If you detached from the container or started it in detached mode, attach to the container with:
-
-```bash
-docker attach Ernie-4.5-VL
-```
-
-### Step 2: Log in to Hugging Face
-Hugging Face login:
-
-```bash
-huggingface-cli login
-```
-
-### Step 3: Start the vLLM server
-
-Run the vLLM online serving.
-Sample command:
-```bash
-VLLM_ROCM_USE_AITER=1 \
-SAFETENSORS_FAST_GPU=1 \
-vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT \
-  --tensor-parallel-size 4 \
-  --gpu-memory-utilization 0.9 \
-  --disable-log-requests \
-  --no-enable-prefix-caching \
-  --trust-remote-code
-```
-
-
-### Step 4: Run Benchmark
-Open a new terminal and run the following command to execute the benchmark script:
-
-```bash
-vllm bench serve \
-  --model baidu/ERNIE-4.5-VL-28B-A3B-PT \
-  --dataset-name random \
-  --random-input-len 8000 \
-  --random-output-len 1000 \
-  --request-rate 10000 \
-  --num-prompts 16 \
-  --trust-remote-code \
-  --ignore-eos
-```

From 437d5bfc31498ce28166a702db577405a08c9fc9 Mon Sep 17 00:00:00 2001
From: ChangLiu0709
Date: Thu, 5 Mar 2026 12:35:09 +0000
Subject: [PATCH 4/4] Add sections for CUDA and ROCm

Signed-off-by: ChangLiu0709
---
 Ernie/Ernie4.5-VL.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Ernie/Ernie4.5-VL.md b/Ernie/Ernie4.5-VL.md
index 40fe44c7..2c38a1b2 100644
--- a/Ernie/Ernie4.5-VL.md
+++ b/Ernie/Ernie4.5-VL.md
@@ -4,6 +4,7 @@ This guide describes how to run [ERNIE-4.5-VL-28B-A3B-PT](https://huggingface.co

 ## Installing vLLM

+### CUDA
 ERNIE-4.5-VL support was recently added to the vLLM main branch and is not yet available in any official release:
 ```bash
 uv venv --python 3.12 --seed
 source .venv/bin/activate
@@ -11,9 +12,9 @@ source .venv/bin/activate
 uv pip install -U vllm --torch-backend auto
 ```

-## Installing vLLM (For AMD ROCm: MI300x/MI325x/MI355x)
+### AMD ROCm: MI300X/MI325X/MI355X
 ```bash
-uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.16.0/rocm700
 ```
 ⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, please use the Docker flow in [vLLM](https://vllm.ai/).
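Once a server from one of the serving commands in this recipe is up, the endpoint can be smoke-tested with an OpenAI-compatible chat request. The sketch below only builds and prints such a payload; the `http://localhost:8000/v1` address is the vLLM default (an assumption, adjust if you changed host or port), and the image URL is a placeholder, not part of the recipe.

```python
import json
import urllib.request

# vLLM exposes an OpenAI-compatible API; localhost:8000 is its default
# bind address (assumption: the server was started with default --host/--port).
BASE_URL = "http://localhost:8000/v1"
MODEL = "baidu/ERNIE-4.5-VL-28B-A3B-PT"


def build_vl_request(image_url: str, question: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Placeholder image URL; replace with a real, reachable image.
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            }
        ],
        "max_tokens": 128,
    }


payload = build_vl_request("https://example.com/cat.png", "What is in this image?")
print(json.dumps(payload, indent=2))

# With the server running, send it (commented out so the sketch runs offline):
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same payload works against both the CUDA and the ROCm deployments, since only the launch flags differ, not the serving API.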