Conversation
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
**Summary of Changes** (Gemini Code Assist): This pull request introduces a new, detailed usage guide for integrating the MiniMax-M2.5 model with vLLM. The guide provides the essential steps for setting up the environment, deploying the model with various configurations, and evaluating its performance through benchmarking, aiming to streamline adoption of the model.
Code Review
The pull request introduces a new usage guide for MiniMax-M2.5. However, the current draft overlaps significantly with the existing MiniMax/MiniMax-M2.md and lacks the 'detailed configs' promised in the description: it duplicates the same Docker command and omits the hardware-specific optimizations for B200. I recommend consolidating this information into the existing comprehensive guide, or expanding this file with unique, optimized deployment configurations and verified benchmark data.
> # MiniMax-M2.5 Usage Guide
>
> This guide describes how to run [MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) with vLLM.
This new guide significantly overlaps with the existing MiniMax/MiniMax-M2.md, which already covers MiniMax-M2.5 and provides more comprehensive details such as system requirements, advanced parallelism (DP/EP), and verified benchmarks. Consider merging any unique M2.5-specific information into the existing guide instead of creating a separate file to avoid documentation fragmentation and maintenance overhead.
> ## Running MiniMax-M2.5
>
> MiniMax-M2.5 can be run on different GPU configurations. The recommended setup uses 4x H200/H20 or 4x A100/A800 GPUs with tensor parallelism.
To fulfill the goal of providing 'detailed configs for different deployments', it would be beneficial to include examples for Data Parallelism (DP) and Expert Parallelism (EP). Since pure TP8 is not supported for this model, providing the DP8+EP or TP+EP commands is crucial for users scaling beyond 4 GPUs.
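As a starting point for such an addition, here is a sketch of what a DP8 + EP deployment command could look like, extrapolated from the TP4 command in this PR. The `--data-parallel-size` and `--enable-expert-parallel` flags exist in recent vLLM releases, but their exact behavior for MiniMax-M2.5 has not been verified here:

```shell
# Hypothetical 8-GPU DP8 + expert-parallel deployment, extrapolated from
# the TP4 command in this guide; unverified for MiniMax-M2.5.
docker run --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:nightly MiniMaxAI/MiniMax-M2.5 \
  --data-parallel-size 8 \
  --enable-expert-parallel \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --enable-auto-tool-choice \
  --trust-remote-code
```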
```shell
docker run --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:nightly MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 4 \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --enable-auto-tool-choice \
  --trust-remote-code
```
This Docker command is identical to the one provided in the installation section (lines 19-29). For a specific 'B200 (FP8)' deployment, it should include the necessary environment variables (e.g., VLLM_USE_FLASHINFER_MOE_FP8=0) to address known compatibility issues on this hardware, as documented in the general MiniMax guide.
Suggested change:

```diff
 docker run --gpus all \
+  -e VLLM_USE_FLASHINFER_MOE_FP8=0 \
   -p 8000:8000 \
   --ipc=host \
   -v ~/.cache/huggingface:/root/.cache/huggingface \
   vllm/vllm-openai:nightly MiniMaxAI/MiniMax-M2.5 \
   --tensor-parallel-size 4 \
   --tool-call-parser minimax_m2 \
   --reasoning-parser minimax_m2_append_think \
   --enable-auto-tool-choice \
   --trust-remote-code
```
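Once either server variant is up, it can be smoke-tested through the OpenAI-compatible API. A minimal sketch, assuming the server is reachable on `localhost:8000` as in the commands above:

```shell
# Send a single chat completion to the running server; assumes the
# default port mapping (-p 8000:8000) from the docker commands above.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M2.5",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 256
      }'
```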
```text
============ Serving Benchmark Result ============
Successful requests:                     xxx
Failed requests:                         xxx
Maximum request concurrency:             xxx
Benchmark duration (s):                  xxx
Total input tokens:                      xxx
Total generated tokens:                  xxx
Request throughput (req/s):              xxx
Output token throughput (tok/s):         xxx
Peak output token throughput (tok/s):    xxx
Peak concurrent requests:                xxx
Total Token throughput (tok/s):          xxx
---------------Time to First Token----------------
Mean TTFT (ms):                          xxx
Median TTFT (ms):                        xxx
P99 TTFT (ms):                           xxx
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          xxx
Median TPOT (ms):                        xxx
P99 TPOT (ms):                           xxx
---------------Inter-token Latency----------------
Mean ITL (ms):                           xxx
Median ITL (ms):                         xxx
P99 ITL (ms):                            xxx
```
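The placeholder table above matches the output format of vLLM's serving benchmark. A sketch of a command that would produce it, assuming a server is already running on the default port (flag names follow recent vLLM; verify against the installed version):

```shell
# Random-prompt load test against a local server; the input/output
# lengths and concurrency here are illustrative, not values from this PR.
vllm bench serve \
  --model MiniMaxAI/MiniMax-M2.5 \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 512 \
  --num-prompts 256 \
  --max-concurrency 64
```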
We have added this model in https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md
hi @kedarpotdar-nv @faradawn, for H100 we are seeing that TP4EP4 is the best recipe versus TP8EP8.
Can you add this to your PR instead of me starting another PR?
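For reference, the TP4EP4 recipe mentioned above would presumably be the TP4 command from this PR with expert parallelism enabled. A hedged sketch, unverified for H100:

```shell
# Hypothetical TP4 + EP4 deployment for H100: the TP4 command from this
# guide plus --enable-expert-parallel; not verified here.
docker run --gpus all \
  -p 8000:8000 \
  --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:nightly MiniMaxAI/MiniMax-M2.5 \
  --tensor-parallel-size 4 \
  --enable-expert-parallel \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --enable-auto-tool-choice \
  --trust-remote-code
```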
Add detailed configs for different deployments