- [EvalPlus](https://github.com/evalplus/evalplus)
- [Qwen3-Coder](https://github.com/QwenLM/Qwen3-Coder)
- [vLLM Documentation](https://docs.vllm.ai/)



## AMD GPU Support
Follow the steps below to install and run Qwen3-Coder models on AMD MI300X, MI325X, and MI355X GPUs.

### Step 1: Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
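
Before installing, the note's requirements can be spot-checked. A minimal sketch, covering Python and glibc only (verifying the ROCm 7.0 install itself is environment-specific):

```shell
# Report the interpreter and glibc versions against the wheel's
# requirements (Python 3.12, glibc >= 2.35); ROCm is not checked here.
py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
glibc_ver=$(ldd --version | head -n 1 | awk '{print $NF}')
echo "python ${py_ver}, glibc ${glibc_ver}"
```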


```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```


### Step 2: Start the vLLM server

Start vLLM online serving with one of the following commands, depending on the checkpoint precision:

### BF16


```shell
VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct \
  --trust-remote-code \
  --max-model-len 131072 \
  --enable-expert-parallel \
  --data-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```

### FP8

```shell
VLLM_ROCM_USE_AITER=1 vllm serve Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  --trust-remote-code \
  --max-model-len 131072 \
  --enable-expert-parallel \
  --data-parallel-size 8 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder
```
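
Once a server from either command above is running, it exposes vLLM's OpenAI-compatible API (port 8000 by default). A quick smoke-test sketch — the payload below is illustrative (the model name must match the checkpoint you served; the prompt and `max_tokens` are arbitrary):

```shell
# Write a sample chat-completions payload and validate it locally.
cat > /tmp/qwen3_coder_request.json <<'EOF'
{
  "model": "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
  "messages": [
    {"role": "user", "content": "Write a Python function that reverses a linked list."}
  ],
  "max_tokens": 256
}
EOF
python3 -m json.tool /tmp/qwen3_coder_request.json > /dev/null && echo "payload OK"

# Send it to the running server:
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d @/tmp/qwen3_coder_request.json
```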


### Step 3: Run Benchmark
Open a new terminal and run the following command to benchmark the server started in Step 2.
```shell
vllm bench serve \
  --backend vllm \
  --model Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 \
  --endpoint /v1/completions \
  --dataset-name random \
  --random-input-len 2048 \
  --random-output-len 1024 \
  --max-concurrency 10 \
  --num-prompts 100
```