Minimax-M2 update for AMD GPU #215

Open
hyukjlee wants to merge 3 commits into vllm-project:main from hyukjlee:minimax-m2

Conversation

@hyukjlee

No description provided.

Signed-off-by: hyukjlee <hyukjlee@amd.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @hyukjlee, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a guide for deploying the MiniMax M2 model on AMD MI300X/MI355X GPUs using the vLLM framework. The new documentation covers the prerequisites, step-by-step deployment, and procedures for both inference and performance benchmarking tailored to AMD hardware, making it easier to adopt and evaluate the model on these accelerators.

Highlights

  • New Documentation for AMD GPU Deployment: A new Markdown file, MiniMax/Minimax-M2_AMD.md, has been added, providing a quick start guide for running the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.
  • Detailed Deployment Steps: The guide includes comprehensive instructions for setting up the vLLM docker image, starting the vLLM online server with AMD-specific configurations (e.g., VLLM_ROCM_USE_AITER=1, tensor-parallel-size 4), and running inference.
  • Performance Benchmarking: Instructions and a script for performance benchmarking using vllm bench serve are provided, allowing users to evaluate the model's throughput and latency on AMD hardware.
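For orientation, a server-launch command of the kind the guide describes could look roughly like the sketch below. This is not the recipe's exact command: the model name, port, tensor-parallel size, and VLLM_ROCM_USE_AITER setting are taken from elsewhere in this PR, while everything else is an assumption.

```bash
# Rough sketch of an AMD-oriented vLLM server launch; flags not named in the
# guide are illustrative assumptions, not the recipe's exact invocation.
VLLM_ROCM_USE_AITER=1 vllm serve MiniMaxAI/MiniMax-M2 \
  --tensor-parallel-size 4 \
  --port 8007 &
```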



@gemini-code-assist bot left a comment

Code Review

This pull request introduces a helpful guide for running the MiniMax-M2 model on AMD GPUs. The instructions are generally clear and easy to follow. My review includes a few suggestions to enhance accuracy and clarity, such as correcting a Docker image tag, improving consistency in the documentation, and clarifying section headers and script variables. Additionally, to ensure users can find this new guide, please consider adding a link to it in the main README.md file.


```bash
# Convenience alias: launch an interactive ROCm container with GPU device access.
alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 32G -v /data:/data -v $HOME:/myhome -w /myhome --entrypoint /bin/bash'
# Start a shell in the vLLM OpenAI-compatible ROCm image.
drun vllm/vllm-openai-rocm:v0.14.1
```
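Once inside the container, a quick sanity check that the accelerators are visible can be helpful. A minimal sketch, assuming rocm-smi is available in the image (it ships with ROCm-based images):

```bash
# Confirm the MI300X/MI325X/MI355X GPUs are visible to the ROCm stack.
rocm-smi --showproductname
```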

Severity: high

The Docker image tag vllm/vllm-openai-rocm:v0.14.1 appears to be incorrect as it doesn't exist on Docker Hub. This will cause the command to fail. Please use a valid tag. The latest available tag at the time of this review is v0.5.1.

Suggested change
drun vllm/vllm-openai-rocm:v0.14.1
drun vllm/vllm-openai-rocm:v0.5.1
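If reviewers want to verify which tags are actually published for this image, one option is to query the Docker Hub API directly. A minimal sketch, assuming the repository path vllm/vllm-openai-rocm from the command above:

```bash
# List the tags currently published for vllm/vllm-openai-rocm on Docker Hub.
curl -s "https://hub.docker.com/v2/repositories/vllm/vllm-openai-rocm/tags?page_size=100" \
  | python3 -c 'import json,sys; [print(t["name"]) for t in json.load(sys.stdin)["results"]]'
```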


## Introduction

This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.

Severity: medium

The introduction mentions support for MI300X/MI355X GPUs, but the prerequisites section on line 24 also lists MI325X. For consistency, please update the introduction to include all supported GPU models.

Suggested change
This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.
This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.

```bash
--port 8007 &
```

### 3. Running Inference using benchmark script

Severity: medium

The section title "Running Inference using benchmark script" is a bit misleading, as this section demonstrates a single inference request with curl, while the next section covers benchmarking. Consider renaming it to "Running Inference" for better clarity.

Suggested change
### 3. Running Inference using benchmark script
### 3. Running Inference
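For reference, the single-request check this section describes is typically done against vLLM's OpenAI-compatible API. A hedged sketch (the port and model name come from elsewhere in this PR; the prompt and token limit are illustrative, and the guide's exact curl command may differ):

```bash
# Send one chat-completion request to the vLLM server started in the previous step.
curl -s http://localhost:8007/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [{"role": "user", "content": "Summarize ROCm in one sentence."}],
        "max_tokens": 64
      }'
```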

Comment on lines +68 to +72
export MODEL="MiniMaxAI/MiniMax-M2"
export ISL=1024
export OSL=1024
export REQ=10
export CONC=10

Severity: medium

To improve the clarity of the benchmark script, it would be beneficial to add a brief explanation for the environment variables ISL, OSL, REQ, and CONC before the code block. For example:

- `ISL`: Input sequence length
- `OSL`: Output sequence length
- `REQ`: Number of prompts
- `CONC`: Maximum concurrency
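
To make the connection concrete, here is a rough sketch of how these variables could feed the vllm bench serve run the guide uses for benchmarking; the flag names follow vLLM's serving-benchmark options, but the recipe's actual command may differ:

```bash
# Hypothetical benchmark invocation assembled from the variables above
# (the recipe's real command may use different flags or defaults).
vllm bench serve \
  --model "$MODEL" \
  --dataset-name random \
  --random-input-len "$ISL" \
  --random-output-len "$OSL" \
  --num-prompts "$REQ" \
  --max-concurrency "$CONC" \
  --port 8007
```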

Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>
Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>
@tjtanaa

tjtanaa commented Feb 13, 2026

@hyukjlee we should update https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md instead of opening a new file.

Create subsections like this https://github.com/vllm-project/recipes/pull/219/changes#r2804656457 for each header. Also, is there a particular need for the benchmark configuration you used? Would it be just as representative to use the benchmark workload from https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md instead of proposing a new scenario?
