48 changes: 48 additions & 0 deletions DeepSeek/DeepSeek-V3_1.md
@@ -135,3 +135,51 @@ curl http://localhost:8000/v1/chat/completions \
}
}'
```


## AMD GPU Support
Supported hardware: MI300X, MI325X, and MI355X.

Follow the steps below to install and run DeepSeek-V3.1 models on AMD MI300X/MI325X/MI355X GPUs.

### Step 1: Installing vLLM (AMD ROCm Backend: MI300X, MI325X, MI355X)
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```
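Before installing the wheel, it can help to confirm that the environment matches the note above. The following is a minimal sketch, assuming GNU `sort` (for `-V`) and a glibc-based system where `getconf GNU_LIBC_VERSION` is available; `version_ge` is a hypothetical helper written for this example, not part of vLLM, and the sketch does not check the ROCm version.

```bash
# Hypothetical helper (not part of vLLM): succeeds when dotted version $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# major.minor of the python3 found on PATH; empty if python3 is missing.
py_ver="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null || true)"
# glibc version such as "2.35"; stays empty on non-glibc systems.
glibc_ver="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}' || true)"

if [ "$py_ver" = "3.12" ]; then
  echo "Python $py_ver: OK"
else
  echo "Python ${py_ver:-not found}: the ROCm wheel expects 3.12"
fi

if [ -n "$glibc_ver" ] && version_ge "$glibc_ver" 2.35; then
  echo "glibc $glibc_ver: OK"
else
  echo "glibc ${glibc_ver:-unknown}: the ROCm wheel expects >= 2.35"
fi
```

If either check fails, the Docker-based setup mentioned in the note is likely the simpler route.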


### Step 2: Start the vLLM server

Start the vLLM online serving endpoint:


```bash
SAFETENSORS_FAST_GPU=1 \
VLLM_USE_TRITON_FLASH_ATTN=0 \
VLLM_ROCM_USE_AITER=1 \
vllm serve deepseek-ai/DeepSeek-V3.1 \
--tensor-parallel-size 8 \
--enable-expert-parallel \
--served-model-name ds31
```

Review comment on `--served-model-name ds31`:

We have to remove this `--served-model-name` argument; it conflicts with the benchmark command.

Moreover, `vllm serve --model <model path>` expects `<model path>` to be a valid path (either a Hugging Face Hub name or a local path) from which it can load the tokenizer, so we cannot use a custom served model name.
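Loading the model can take a while, so it may be useful to wait for the server and then check which model name it actually exposes before benchmarking. The sketch below assumes the server listens on `localhost:8000` (the address used by the `curl` example earlier in this file) and that `curl` and `python3` are installed; `wait_for_server` and `list_model_ids` are hypothetical helpers, and the 1800-second default timeout is an arbitrary choice.

```bash
# Hypothetical helper: poll a URL until it returns an HTTP success status,
# giving up after roughly $2 seconds of waiting (default 1800, an arbitrary choice).
wait_for_server() {
  local url="$1" timeout="${2:-1800}" waited=0
  until curl -sf --max-time 5 -o /dev/null "$url"; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
}

# Hypothetical helper: read a /v1/models JSON response on stdin
# and print one served model id per line.
list_model_ids() {
  python3 -c 'import json, sys
for model in json.load(sys.stdin)["data"]:
    print(model["id"])'
}

# Example (assumes the default port 8000 on localhost):
#   wait_for_server http://localhost:8000/health \
#     && curl -s http://localhost:8000/v1/models | list_model_ids
```

With `--served-model-name ds31` the listing would be expected to show `ds31`, while the benchmark command passes `deepseek-ai/DeepSeek-V3.1` as the model name; that mismatch appears to be the conflict raised in the review comment.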


### Step 3: Run Benchmark
Open a new terminal and run the following command to execute the benchmark script.
```bash
vllm bench serve \
--model "deepseek-ai/DeepSeek-V3.1" \
--dataset-name random \
--random-input-len 8192 \
--random-output-len 1024 \
--request-rate 10000 \
--num-prompts 16 \
--ignore-eos \
--trust-remote-code
```
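To get a feel for the size of this run, the token budget implied by the flags above can be worked out directly. This is only a rough sketch: it assumes every random prompt has exactly `--random-input-len` tokens and that `--ignore-eos` makes each request generate the full `--random-output-len` tokens.

```bash
# Values copied from the benchmark command above.
num_prompts=16
input_len=8192
output_len=1024

total_input=$((num_prompts * input_len))
total_output=$((num_prompts * output_len))

echo "input tokens:  $total_input"                      # 131072
echo "output tokens: $total_output"                     # 16384
echo "total tokens:  $((total_input + total_output))"   # 147456
```

Because `--request-rate 10000` is far higher than the 16 prompts being sent, the requests should arrive almost simultaneously, so the run behaves much like a single burst of 16 concurrent requests.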