This guide describes how to run [ERNIE-4.5-VL-28B-A3B-PT](https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT) with vLLM.


## Installing vLLM
### CUDA
ERNIE-4.5-VL support was recently added to the vLLM main branch and is not yet available in any official release:
```bash
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```
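
Because the feature currently only exists on main, the release wheel that uv resolves may still lack ERNIE-4.5-VL support. In that case, a nightly build from vLLM's pre-release wheel index can be installed instead (a sketch; the nightly index is not part of this guide's original flow):

```bash
# Assumption: install a pre-release wheel built from main via vLLM's nightly index
uv pip install -U vllm \
  --torch-backend auto \
  --extra-index-url https://wheels.vllm.ai/nightly
```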

### AMD ROCm: MI300X/MI325X/MI355X
```bash
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.16.0/rocm700
```
⚠️ The vLLM wheel for ROCm is compatible with Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment is incompatible, use the Docker flow described in the [vLLM docs](https://vllm.ai/) instead.
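
To check these constraints up front, something like the following can be run (the ROCm version file path assumes a default install under /opt/rocm):

```bash
# Verify interpreter, ROCm, and glibc versions against the wheel's requirements
python3 --version                # expect 3.12.x
cat /opt/rocm/.info/version      # expect 7.0.x; path assumes a default ROCm install
ldd --version | head -n 1        # expect glibc >= 2.35
```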

## Running Ernie4.5-VL

### Serving Ernie4.5-VL Model on H100 GPUs
NOTE: torch.compile and CUDA graphs are not supported due to the heterogeneous expert architecture (separate vision and text experts).
```bash
# 28B model 80G*1 GPU
# …
vllm serve baidu/ERNIE-4.5-VL-424B-A47B-PT \
--cpu-offload-gb 50
```
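
Since CUDA graphs are unsupported for this architecture, it may help to force eager execution explicitly; a minimal sketch for the 28B model (passing `--enforce-eager` here is an assumption, not part of the original commands):

```bash
# Sketch: serve the 28B model with CUDA graph capture explicitly disabled
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT \
  --trust-remote-code \
  --enforce-eager  # assumption: skip graph capture, which this model cannot use
```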


If a single node's GPU memory is insufficient, native BF16 deployment may require multiple nodes. Follow the [vLLM doc](https://docs.vllm.ai/en/latest/serving/parallelism_scaling.html?#multi-node-deployment) to start a Ray cluster, then run vLLM on the master node:
```bash
# 424B model 80G*16 GPU with native BF16
vllm serve baidu/ERNIE-4.5-VL-424B-A47B-PT \
--tensor-parallel-size 16
```
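
As a rough sketch of the Ray setup step (node addresses are placeholders and per-node GPU counts are assumed; the linked vLLM doc is authoritative), a two-node cluster for tensor parallel size 16 might be brought up like this:

```bash
# On the head node (assuming 8 GPUs per node, so two nodes cover TP 16)
ray start --head --port=6379

# On each worker node, join the cluster (replace with the head node's address)
ray start --address=<HEAD_NODE_IP>:6379

# Back on the head node: confirm both nodes and all 16 GPUs are registered
ray status
```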

### Serving Ernie4.5-VL Model on MI300X/MI325X/MI355X GPUs

Run vLLM online serving on AMD GPUs using the command below:
```bash
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-PT \
--tensor-parallel-size 4 \
--gpu-memory-utilization 0.9 \
--disable-log-requests \
--no-enable-prefix-caching \
--trust-remote-code
```
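
Once the server is up, it can be exercised through vLLM's OpenAI-compatible chat endpoint. A sketch, assuming the default port 8000 and using a placeholder image URL:

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "baidu/ERNIE-4.5-VL-28B-A3B-PT",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image."},
        {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}}
      ]
    }]
  }'
```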

## Benchmarking

For benchmarking, use only the first `vllm bench serve` run after service startup, so that results are not affected by the prefix cache.
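
A minimal sketch of such a first run (the dataset choice, request count, and sequence lengths below are illustrative, and the server is assumed to be on the default port):

```bash
# First benchmark after startup, while the prefix cache is still cold
vllm bench serve \
  --model baidu/ERNIE-4.5-VL-28B-A3B-PT \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 256 \
  --num-prompts 100
```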