diff --git a/Qwen/Qwen3-Coder-480B-A35B.md b/Qwen/Qwen3-Coder-480B-A35B.md
index 37ccbf62..3ce3051c 100644
--- a/Qwen/Qwen3-Coder-480B-A35B.md
+++ b/Qwen/Qwen3-Coder-480B-A35B.md
@@ -132,3 +132,59 @@ ERROR [multiproc_executor.py:511] ValueError: The output_size of gate's and up's
 - [EvalPlus](https://github.com/evalplus/evalplus)
 - [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder)
 - [vLLM Documentation](https://docs.vllm.ai/)
+
+## AMD GPU Support
+
+Recommended approaches by hardware type:
+
+**MI300X/MI325X/MI355X**
+
+Follow the steps below to install vLLM and run Qwen3-Coder models on AMD MI300X, MI325X, or MI355X GPUs.
+
+### Step 1: Install vLLM (AMD ROCm backend: MI300X, MI325X, MI355X)
+
+> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, use the Docker-based setup described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
+
+```bash
+uv venv
+source .venv/bin/activate
+uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
+```
+
+### Step 2: Start the vLLM server
+
+Start vLLM online serving with one of the following commands.
+
+#### BF16
+
+```shell
+VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct --trust-remote-code --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
+```
+
+#### FP8
+
+```shell
+VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 --trust-remote-code --max-model-len 131072 --enable-expert-parallel --data-parallel-size 8 --enable-auto-tool-choice --tool-call-parser qwen3_coder
+```
+
+### Step 3: Run the benchmark
+
+Open a new terminal and run the following command to execute the benchmark script inside the container.
+
+```shell
+vllm bench serve \
+  --backend vllm \
+  --model Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
+  --endpoint /v1/completions \
+  --dataset-name random \
+  --random-input-len 2048 \
+  --random-output-len 1024 \
+  --max-concurrency 10 \
+  --num-prompts 100
+```
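
Once the server is up, it can also be sanity-checked with a single request to its OpenAI-compatible completions endpoint. The sketch below is a minimal stdlib-only example; the host and port (`localhost:8000`) are the vLLM defaults and are an assumption here, and the `build_request` helper is illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumption: vLLM serves its OpenAI-compatible API on localhost:8000 by default.
URL = "http://localhost:8000/v1/completions"


def build_request(url, model, prompt, max_tokens=128):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(
        URL,
        model="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
        prompt="Write a Python function that reverses a string.",
    )
    # Send the request and print the first completion returned by the server.
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["text"])
```

Use the BF16 model name instead if that is the variant being served; the request shape is identical.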