Update DeepSeek-V3_1.md for AMD GPU #158

Open
haic0 wants to merge 3 commits into vllm-project:main from haic0:patch-16
Conversation

@haic0 (Contributor) commented Dec 11, 2025

No description provided.

Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com>
@gemini-code-assist (Contributor)

Summary of Changes

Hello @haic0, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the documentation for the DeepSeek-V3.1 model by providing comprehensive instructions for deploying and benchmarking it on AMD MI300X GPUs. It outlines the necessary steps for setting up the Docker environment, configuring the vLLM server, and executing performance benchmarks, thereby extending the model's usability to AMD hardware.

Highlights

  • AMD GPU Support Documentation: Added a new section to DeepSeek-V3_1.md detailing the steps required to install and run DeepSeek-V3.1 models on AMD MI300X GPUs.
  • Docker Environment Setup: Provided instructions for pulling and launching the rocm/vllm-dev:nightly Docker image, essential for AMD GPU compatibility.
  • vLLM Server Configuration: Included a sample command for starting the vLLM server with specific environment variables and parameters optimized for DeepSeek-V3.1 on AMD GPUs, such as enabling expert parallelism and setting tensor parallel size.
  • Benchmark Execution Guide: Detailed steps for running a benchmark script within the Docker container to evaluate performance on AMD hardware.
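The benchmark step in the last highlight might look like the following sketch, using vLLM's built-in benchmarking CLI against an already-running server. The dataset choice and request count here are assumptions for illustration, not taken from the PR (the `--num-prompts 16` value only mirrors the "16/16 requests" result reported later in this conversation):

```shell
# Hypothetical benchmark invocation; assumes a vLLM server is already
# serving deepseek-ai/DeepSeek-V3.1 on the default port.
vllm bench serve \
  --model deepseek-ai/DeepSeek-V3.1 \
  --dataset-name random \
  --num-prompts 16
```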
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds documentation for running DeepSeek-V3.1 on AMD GPUs. The instructions are clear and cover the necessary steps from environment setup to benchmarking. I've added a couple of suggestions to improve the documentation. One is a high-severity recommendation to add a security warning about the docker run command, which uses privileged flags. The other is a medium-severity suggestion to improve the readability of the instructions for starting the vLLM server. Overall, this is a valuable addition.

```shell
docker pull rocm/vllm-dev:nightly
```
Launch the ROCm vLLM docker:


Severity: high

The docker run command uses --privileged and --security-opt seccomp=unconfined, which grant the container extensive permissions on the host system. This poses a significant security risk. It's important to warn users about this. Consider adding a warning note.

Suggested change:

```diff
 Launch the ROCm vLLM docker:
+> [!WARNING]
+> The following command includes `--privileged` and `--security-opt seccomp=unconfined` flags, which grant extensive permissions to the container. Only use this configuration in a trusted development environment.
```
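For context, a container launch using these flags might look like the following sketch. The device mounts, shared-memory size, and volume mapping are common ROCm conventions, not taken from this PR, so treat every flag value here as an assumption:

```shell
# Illustrative ROCm container launch for MI300X (not the PR's exact command).
# --privileged and seccomp=unconfined grant broad host access; use only in a
# trusted development environment, as the warning above notes.
docker run -it --rm \
  --privileged \
  --security-opt seccomp=unconfined \
  --network host \
  --ipc host \
  --device /dev/kfd \
  --device /dev/dri \
  --group-add video \
  --shm-size 16G \
  -v "$PWD":/workspace \
  rocm/vllm-dev:nightly
```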

Comment on lines +158 to +162

Run the vllm online serving


Sample Command


Severity: medium

These lines contain extra newlines and redundant text. For better readability, you can make this section more concise by combining the information into a single sentence.

Suggested change:

```diff
-Run the vllm online serving
-Sample Command
+Run the following sample command to start the vLLM server:
```

…1 AMD GPU support

Tested and verified on 8x MI300X:
- vLLM server starts successfully
- Benchmark passed: 16/16 requests, 110 tok/s output throughput

Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: yueyuan <yueyuan@amd.com>
Signed-off-by: Yuan Yue <yueyuan@amd.com>
- Replace Docker-based installation with uv venv approach
- Simplify to essential AMD-specific environment variables only
- Use latest stable ROCm wheel: https://wheels.vllm.ai/rocm/
- Follow Qwen3-Next.md format with Step 1, 2, 3 structure
- Remove unnecessary flags: gpu-memory-utilization, disable-log-requests, etc.

Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: yueyuan <yueyuan@amd.com>
Signed-off-by: Yuan Yue <yueyuan@amd.com>
```shell
vllm serve deepseek-ai/DeepSeek-V3.1 \
    --tensor-parallel-size 8 \
    --enable-expert-parallel \
    --served-model-name ds31
```


We have to remove the `--served-model-name` argument; it conflicts with the benchmark command.

Moreover, `vllm serve --model <model path>` expects `<model path>` to be a valid path (either a Hugging Face Hub name or a local path) from which it can get the tokenizer files, so we cannot use a custom served-model-name.
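Applying the reviewer's suggestion, the serve command would simply drop the conflicting flag, so clients address the model by its full Hub name. A sketch:

```shell
# Same command as in the PR, minus --served-model-name, per the review above.
vllm serve deepseek-ai/DeepSeek-V3.1 \
  --tensor-parallel-size 8 \
  --enable-expert-parallel
```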
