Minimax-M2 update for AMD GPU by hyukjlee · Pull Request #215 · vllm-project/recipes

hyukjlee · 2026-01-28T03:37:25Z

No description provided.

Signed-off-by: hyukjlee <hyukjlee@amd.com>

gemini-code-assist · 2026-01-28T03:37:37Z

Summary of Changes

Hello @hyukjlee, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive guide for deploying the MiniMax M2 model on AMD MI300X/MI355X GPUs using the vLLM framework. The new documentation outlines the necessary prerequisites, step-by-step deployment procedures, and methods for both inference and performance benchmarking, specifically tailored for AMD hardware, aiming to facilitate easier adoption and evaluation of the model on these accelerators.

Highlights

New Documentation for AMD GPU Deployment: A new Markdown file, MiniMax/Minimax-M2_AMD.md, has been added, providing a quick start guide for running the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.
Detailed Deployment Steps: The guide includes comprehensive instructions for setting up the vLLM docker image, starting the vLLM online server with AMD-specific configurations (e.g., VLLM_ROCM_USE_AITER=1, tensor-parallel-size 4), and running inference.
Performance Benchmarking: Instructions and a script for performance benchmarking using vllm bench serve are provided, allowing users to evaluate the model's throughput and latency on AMD hardware.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a helpful guide for running the MiniMax-M2 model on AMD GPUs. The instructions are generally clear and easy to follow. My review includes a few suggestions to enhance accuracy and clarity, such as correcting a Docker image tag, improving consistency in the documentation, and clarifying section headers and script variables. Additionally, to ensure users can find this new guide, please consider adding a link to it in the main README.md file.

gemini-code-assist · 2026-01-28T03:38:35Z

MiniMax/Minimax-M2_AMD.md

+
+```bash
+alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 32G -v /data:/data -v $HOME:/myhome -w /myhome --entrypoint /bin/bash'
+drun vllm/vllm-openai-rocm:v0.14.1


The Docker image tag vllm/vllm-openai-rocm:v0.14.1 appears to be incorrect as it doesn't exist on Docker Hub. This will cause the command to fail. Please use a valid tag. The latest available tag at the time of this review is v0.5.1.

Suggested change

drun vllm/vllm-openai-rocm:v0.14.1

drun vllm/vllm-openai-rocm:v0.5.1

gemini-code-assist · 2026-01-28T03:38:35Z

MiniMax/Minimax-M2_AMD.md

+
+## Introduction
+
+This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.


The introduction mentions support for MI300X/MI355X GPUs, but the prerequisites section on line 24 also lists MI325X. For consistency, please update the introduction to include all supported GPU models.

Suggested change

This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.

This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.

gemini-code-assist · 2026-01-28T03:38:35Z

MiniMax/Minimax-M2_AMD.md

+  --port 8007 &
+```
+
+### 3. Running Inference using benchmark script


The section title "Running Inference using benchmark script" is a bit misleading, as this section demonstrates a single inference request with curl, while the next section covers benchmarking. Consider renaming it to "Running Inference" for better clarity.

Suggested change

### 3. Running Inference using benchmark script

### 3. Running Inference

gemini-code-assist · 2026-01-28T03:38:35Z

MiniMax/Minimax-M2_AMD.md

+export MODEL="MiniMaxAI/MiniMax-M2"
+export ISL=1024
+export OSL=1024
+export REQ=10
+export CONC=10


To improve the clarity of the benchmark script, it would be beneficial to add a brief explanation for the environment variables ISL, OSL, REQ, and CONC before the code block. For example:

- `ISL`: Input sequence length - `OSL`: Output sequence length - `REQ`: Number of prompts - `CONC`: Maximum concurrency

Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>

tjtanaa · 2026-02-13T15:11:55Z

@hyukjlee we should update https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md instead of opening a new file.

Create a subsections like this https://github.com/vllm-project/recipes/pull/219/changes#r2804656457 for each header. And is there a special need to use the benchmark configuration that you used? Is it also representative to use the benchmark workload from https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md instead of proposing a new scenario?

Minimax-M2 update for AMD GPU

53ec7f1

Signed-off-by: hyukjlee <hyukjlee@amd.com>

gemini-code-assist bot reviewed Jan 28, 2026

View reviewed changes

hyukjlee added 2 commits February 9, 2026 16:55

Update Minimax-M2_AMD.md

a0b043b

Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>

Update Minimax-M2_AMD.md

612e9c2

Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>

tjtanaa mentioned this pull request Feb 13, 2026

Mistral-3-Instruct update for AMD GPU #198

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Minimax-M2 update for AMD GPU#215

Minimax-M2 update for AMD GPU#215
hyukjlee wants to merge 3 commits intovllm-project:mainfrom
hyukjlee:minimax-m2

hyukjlee commented Jan 28, 2026

Uh oh!

gemini-code-assist bot commented Jan 28, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Jan 28, 2026

Uh oh!

gemini-code-assist bot Jan 28, 2026

Uh oh!

gemini-code-assist bot Jan 28, 2026

Uh oh!

gemini-code-assist bot Jan 28, 2026

Uh oh!

tjtanaa commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	drun vllm/vllm-openai-rocm:v0.14.1
	drun vllm/vllm-openai-rocm:v0.5.1


		## Introduction

		This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM.

	### 3. Running Inference using benchmark script
	### 3. Running Inference

Conversation

hyukjlee commented Jan 28, 2026

Uh oh!

gemini-code-assist bot commented Jan 28, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

tjtanaa commented Feb 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants