From 3e56070a1224d3af124e4d9495b8cc18256aebb8 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Thu, 29 Jan 2026 11:49:19 +0800
Subject: [PATCH 1/5] Update Qwen3-Coder-480B-A35B.md for AMD

Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com>
---
 Qwen/Qwen3-Coder-480B-A35B.md | 65 +++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 37ccbf62..24e39d8f 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -132,3 +132,68 @@ ERROR [multiproc_executor.py:511] ValueError: The output_size of gate's and up's
 - [EvalPlus](https://github.com/evalplus/evalplus)
 - [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder)
 - [vLLM Documentation](https://docs.vllm.ai/)
+
+
+## AMD GPU Support
+Recommended approaches by hardware type are:
+
+
+MI300X/MI325X/MI355X with fp8: Use FP8 checkpoint for optimal memory efficiency.
+
+- **MI300X/MI325X/MI355X with `fp8`**: Use FP8 checkpoint for optimal memory efficiency.
+- **MI300X/MI325X/MI355X with `bfloat16`**
+
+
+Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X/MI325X/MI355X GPU.
+
+### Step 1: Prepare Docker Environment
+Pull the latest vllm docker:
+```shell
+docker pull vllm/vllm-openai-rocm:v0.14.1
+```
+Launch the ROCm vLLM docker:
+```shell
+docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
+```
+### Step 2: Log in to Hugging Face
+Log in to your Hugging Face account:
+```shell
+hf auth login
+```
+
+### Step 3: Start the vLLM server
+
+Run the vllm online serving
+```shell
+docker exec -it Qwen3-Coder /bin/bash
+```
+
+### BF16
+
+
+```shell
+VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct --trust-remote-code --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
+```
+
+### FP8
+
+```shell
+
+VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust-remote-code --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
+
+```
+
+
+### Step 4: Run Benchmark
+Open a new terminal and run the following command to execute the benchmark script inside the container.
+```shell
+docker exec -it Qwen3-Coder vllm bench serve \
+ --model "Qwen/Qwen3-Coder-480B-A35B-Instruct" \
+ --dataset-name random \
+ --random-input-len 8192 \
+ --random-output-len 1024 \
+ --request-rate 10000 \
+ --num-prompts 16 \
+ --ignore-eos \
+ --trust-remote-code
+```

From 5a0e67d785f9d79d9afa5ec7b1d13afe2ddf32a3 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Sat, 31 Jan 2026 16:04:44 +0800
Subject: [PATCH 2/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 24e39d8f..88e31046 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -153,7 +153,7 @@ docker pull vllm/vllm-openai-rocm:v0.14.1
 ```
 Launch the ROCm vLLM docker:
 ```shell
-docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
+docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
 ```
 ### Step 2: Log in to Hugging Face
 Log in to your Hugging Face account:

From 724c2b453159c7643d739ee24c0d73f1d52e21b3 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Tue, 3 Feb 2026 23:09:23 +0800
Subject: [PATCH 3/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 36 +++++++++++++----------------------
 1 file changed, 13 insertions(+),
 23 deletions(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 88e31046..d2077ee8 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -134,39 +134,29 @@ ERROR [multiproc_executor.py:511] ValueError: The output_size of gate's and up's
 - [vLLM Documentation](https://docs.vllm.ai/)
+
 ## AMD GPU Support
 Recommended approaches by hardware type are:
 
-MI300X/MI325X/MI355X with fp8: Use FP8 checkpoint for optimal memory efficiency.
+MI300X/MI325X/MI355X
 
-- **MI300X/MI325X/MI355X with `fp8`**: Use FP8 checkpoint for optimal memory efficiency.
-- **MI300X/MI325X/MI355X with `bfloat16`**
 
+Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X/MI325X/MI355X GPU.
+### Step 1: Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
+
+> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
-Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X/MI325X/MI355X GPU.
-### Step 1: Prepare Docker Environment
-Pull the latest vllm docker:
-```shell
-docker pull vllm/vllm-openai-rocm:v0.14.1
-```
-Launch the ROCm vLLM docker:
-```shell
-docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
-```
-### Step 2: Log in to Hugging Face
-Log in to your Hugging Face account:
-```shell
-hf auth login
-```
+ ```bash
+ uv venv
+ source .venv/bin/activate
+ uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+ ```
-### Step 3: Start the vLLM server
+
+### Step 2: Start the vLLM server
 
 Run the vllm online serving
-```shell
-docker exec -it Qwen3-Coder /bin/bash
-```
 
 ### BF16
@@ -187,7 +177,7 @@ VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust
 
 ### Step 4: Run Benchmark
 Open a new terminal and run the following command to execute the benchmark script inside the container.
 ```shell
-docker exec -it Qwen3-Coder vllm bench serve \
+ vllm bench serve \
  --model "Qwen/Qwen3-Coder-480B-A35B-Instruct" \
  --dataset-name random \
  --random-input-len 8192 \

From d2849c9286b3269634d77fad738250ccf5e5e155 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Fri, 6 Feb 2026 16:52:58 +0800
Subject: [PATCH 4/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index d2077ee8..510ff3f2 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -150,7 +150,7 @@ Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X
 ```bash
 uv venv
 source .venv/bin/activate
- uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+ uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
 ```

From f1650b85c3d7c7103936c67e76584964889d77f6 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Fri, 6 Feb 2026 17:53:20 +0800
Subject: [PATCH 5/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 510ff3f2..3ce3051c 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -176,14 +176,15 @@ VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust
 
 ### Step 4: Run Benchmark
 Open a new terminal and run the following command to execute the benchmark script inside the container.
+
 ```shell
- vllm bench serve \
- --model "Qwen/Qwen3-Coder-480B-A35B-Instruct" \
+vllm bench serve \
+ --backend vllm \
+ --model Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
+ --endpoint /v1/completions \
 --dataset-name random \
- --random-input-len 8192 \
- --random-output-len 1024 \
- --request-rate 10000 \
- --num-prompts 16 \
- --ignore-eos \
- --trust-remote-code
+ --random-input 2048 \
+ --random-output 1024 \
+ --max-concurrency 10 \
+ --num-prompt 100
 ```
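For a quick sanity check of the benchmark configuration that patch 5/5 settles on, the token volume implied by its `vllm bench serve` flags can be computed directly. This is a back-of-the-envelope sketch; the variable names are illustrative and not part of the patches:

```shell
# Token volume implied by the final benchmark flags:
#   --num-prompt 100, --random-input 2048, --random-output 1024
num_prompts=100
input_len=2048
output_len=1024

total_input=$((num_prompts * input_len))    # tokens to prefill
total_output=$((num_prompts * output_len))  # tokens to decode

echo "prefill tokens: $total_input"   # 204800
echo "decode tokens:  $total_output"  # 102400
```

`vllm bench serve` reports output token throughput as total generated tokens divided by wall-clock duration, so these totals (204,800 prefill and 102,400 decode tokens, issued at `--max-concurrency 10`) set the scale of a single run.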