From 3e56070a1224d3af124e4d9495b8cc18256aebb8 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Thu, 29 Jan 2026 11:49:19 +0800
Subject: [PATCH 1/5] Update Qwen3-Coder-480B-A35B.md for AMD

Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com>
---
 Qwen/Qwen3-Coder-480B-A35B.md | 65 +++++++++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 37ccbf62..24e39d8f 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -132,3 +132,68 @@ ERROR [multiproc_executor.py:511] ValueError: The output_size of gate's and up's
 - [EvalPlus](https://github.com/evalplus/evalplus)
 - [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder)
 - [vLLM Documentation](https://docs.vllm.ai/)
+
+
+## AMD GPU Support
+Recommended approaches by hardware type are:
+
+
+MI300X/MI325X/MI355X with fp8: Use FP8 checkpoint for optimal memory efficiency.
+
+- **MI300X/MI325X/MI355X with `fp8`**: Use FP8 checkpoint for optimal memory efficiency.
+- **MI300X/MI325X/MI355X with `bfloat16`**
+
+
+Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X/MI325X/MI355X GPU.
+
+### Step 1: Prepare Docker Environment
+Pull the latest vllm docker:
+```shell
+docker pull vllm/vllm-openai-rocm:v0.14.1
+```
+Launch the ROCm vLLM docker:
+```shell
+docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
+```
+### Step 2: Log in to Hugging Face
+Log in to your Hugging Face account:
+```shell
+hf auth login
+```
+
+### Step 3: Start the vLLM server
+
+Run the vllm online serving
+```shell
+docker exec -it Qwen3-Coder /bin/bash
+```
+
+### BF16
+
+
+```shell
+VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct --trust-remote-code --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
+```
+
+### FP8
+
+```shell
+
+VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust-remote-code --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
+
+```
+
+
+### Step 4: Run Benchmark
+Open a new terminal and run the following command to execute the benchmark script inside the container.
+```shell
+docker exec -it Qwen3-Coder vllm bench serve \
+ --model "Qwen/Qwen3-Coder-480B-A35B-Instruct" \
+ --dataset-name random \
+ --random-input-len 8192 \
+ --random-output-len 1024 \
+ --request-rate 10000 \
+ --num-prompts 16 \
+ --ignore-eos \
+ --trust-remote-code
+```

From 5a0e67d785f9d79d9afa5ec7b1d13afe2ddf32a3 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Sat, 31 Jan 2026 16:04:44 +0800
Subject: [PATCH 2/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 24e39d8f..88e31046 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -153,7 +153,7 @@ docker pull vllm/vllm-openai-rocm:v0.14.1
 ```
 Launch the ROCm vLLM docker:
 ```shell
-docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v $(pwd):/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
+docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
 ```
 ### Step 2: Log in to Hugging Face
 Log in to your Hugging Face account:

From 724c2b453159c7643d739ee24c0d73f1d52e21b3 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Tue, 3 Feb 2026 23:09:23 +0800
Subject: [PATCH 3/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 36 +++++++++++++----------------------
 1 file changed, 13 insertions(+),
 23 deletions(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 88e31046..d2077ee8 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -134,39 +134,29 @@ ERROR [multiproc_executor.py:511] ValueError: The output_size of gate's and up's
 - [vLLM Documentation](https://docs.vllm.ai/)
+
 ## AMD GPU Support
 Recommended approaches by hardware type are:
 
-MI300X/MI325X/MI355X with fp8: Use FP8 checkpoint for optimal memory efficiency.
+MI300X/MI325X/MI355X
 
-- **MI300X/MI325X/MI355X with `fp8`**: Use FP8 checkpoint for optimal memory efficiency.
-- **MI300X/MI325X/MI355X with `bfloat16`**
 
+Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X/MI325X/MI355X GPU.
+### Step 1: Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
+
+> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
-Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X/MI325X/MI355X GPU.
-### Step 1: Prepare Docker Environment
-Pull the latest vllm docker:
-```shell
-docker pull vllm/vllm-openai-rocm:v0.14.1
-```
-Launch the ROCm vLLM docker:
-```shell
-docker run -d -it --entrypoint /bin/bash --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --name Qwen3-Coder vllm/vllm-openai-rocm:v0.14.1
-```
-### Step 2: Log in to Hugging Face
-Log in to your Hugging Face account:
-```shell
-hf auth login
-```
+ ```bash
+ uv venv
+ source .venv/bin/activate
+ uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+ ```
-### Step 3: Start the vLLM server
+
+### Step 2: Start the vLLM server
 
 Run the vllm online serving
-```shell
-docker exec -it Qwen3-Coder /bin/bash
-```
 
 ### BF16
@@ -187,7 +177,7 @@ VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust
 
 ### Step 4: Run Benchmark
 Open a new terminal and run the following command to execute the benchmark script inside the container.
 ```shell
-docker exec -it Qwen3-Coder vllm bench serve \
+ vllm bench serve \
  --model "Qwen/Qwen3-Coder-480B-A35B-Instruct" \
  --dataset-name random \
  --random-input-len 8192 \

From d2849c9286b3269634d77fad738250ccf5e5e155 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Fri, 6 Feb 2026 16:52:58 +0800
Subject: [PATCH 4/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index d2077ee8..510ff3f2 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -150,7 +150,7 @@ Please follow the steps here to install and run Qwen3-Coder models on AMD MI300X
 ```bash
 uv venv
 source .venv/bin/activate
- uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/0.14.1/rocm700
+ uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
 ```

From f1650b85c3d7c7103936c67e76584964889d77f6 Mon Sep 17 00:00:00 2001
From: haic0 <149741444+haic0@users.noreply.github.com>
Date: Fri, 6 Feb 2026 17:53:20 +0800
Subject: [PATCH 5/5] Update Qwen3-Coder-480B-A35B.md for AMD

---
 Qwen/Qwen3-Coder-480B-A35B.md | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 510ff3f2..3ce3051c 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -176,14 +176,15 @@ VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust
 
 ### Step 4: Run Benchmark
 Open a new terminal and run the following command to execute the benchmark script inside the container.
+
 ```shell
- vllm bench serve \
- --model "Qwen/Qwen3-Coder-480B-A35B-Instruct" \
+vllm bench serve \
+ --backend vllm \
+ --model Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
+ --endpoint /v1/completions \
 --dataset-name random \
- --random-input-len 8192 \
- --random-output-len 1024 \
- --request-rate 10000 \
- --num-prompts 16 \
- --ignore-eos \
- --trust-remote-code
+ --random-input 2048 \
+ --random-output 1024 \
+ --max-concurrency 10 \
+ --num-prompt 100
 ```
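For a quick sanity check of the benchmark configuration that patch 5/5 settles on, the token volume implied by its `vllm bench serve` flags can be computed directly. This is a back-of-the-envelope sketch; the variable names are illustrative and not part of the patches:

```shell
# Token volume implied by the final benchmark flags:
#   --num-prompt 100, --random-input 2048, --random-output 1024
num_prompts=100
input_len=2048
output_len=1024

total_input=$((num_prompts * input_len))    # tokens to prefill
total_output=$((num_prompts * output_len))  # tokens to decode

echo "prefill tokens: $total_input"   # 204800
echo "decode tokens:  $total_output"  # 102400
```

`vllm bench serve` reports output token throughput as total generated tokens divided by wall-clock duration, so these totals (204,800 prefill and 102,400 decode tokens, issued at `--max-concurrency 10`) set the scale of a single run.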