# MiniMax M2 on vLLM - AMD Hardware

## Introduction

This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.
## Key benefits of AMD GPUs for large models and developers

AMD Instinct GPU accelerators are purpose-built to handle the demands of next-generation models like MiniMax M2:
- Large HBM capacity enables long-context inference and larger batch sizes.
- Optimized Triton and AITER kernels provide best-in-class performance and total cost of ownership (TCO) for production deployments.

## Access & Licensing

### License and Model parameters

To use the MiniMax M2 model, first confirm that you have access to it on Hugging Face:
- [MiniMax M2](https://huggingface.co/MiniMaxAI/MiniMax-M2)

## Prerequisites

- OS: Linux
- Drivers: ROCm 7.0 or above
- GPU: AMD MI300X, MI325X, and MI355X

## Deployment Steps

### 1. Using the vLLM Docker image (for AMD users)

```bash
docker run -it \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--shm-size 32G \
-v /data:/data \
-v $HOME:/myhome \
-w /myhome \
--entrypoint /bin/bash \
vllm/vllm-openai-rocm:latest
```
Alternatively, you can use a `uv` environment.
> Note: The vLLM wheel for ROCm requires Python 3.12, ROCm 7.0, and glibc >= 2.35. If your environment does not meet these requirements, please use the Docker-based setup as described in the [documentation](https://docs.vllm.ai/en/latest/getting_started/installation/gpu/#pre-built-images).
```bash
uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```


### 2. Start the vLLM online server (run in background)

```bash
export VLLM_ROCM_USE_AITER=1
vllm serve MiniMaxAI/MiniMax-M2 \
--tensor-parallel-size 4 \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
--enable-auto-tool-choice \
--trust-remote-code \
--disable-log-requests &
```
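Once the server is up, you can sanity-check it with an OpenAI-compatible request. This is a minimal sketch: the default port 8000 and the `/v1/chat/completions` path assume vLLM's standard OpenAI-compatible server settings, and the prompt is just an example.

```bash
# Query the OpenAI-compatible chat endpoint (vLLM listens on port 8000 by default).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```

The response is a standard chat-completion JSON object; the generated text is under `choices[0].message.content`.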

### 3. Performance benchmark

The environment variables below define the benchmark workload:

- `ISL`: Input sequence length
- `OSL`: Output sequence length
- `REQ`: Number of prompts
- `CONC`: Maximum concurrency

```bash
export MODEL="MiniMaxAI/MiniMax-M2"
export ISL=1024
export OSL=1024
export REQ=10
export CONC=10

vllm bench serve \
--backend vllm \
--model $MODEL \
--dataset-name random \
--random-input-len $ISL \
--random-output-len $OSL \
--num-prompts $REQ \
--ignore-eos \
--max-concurrency $CONC \
--percentile-metrics ttft,tpot,itl,e2el
```
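As a quick sanity check on the reported percentile metrics, end-to-end latency (`e2el`) should roughly decompose into time to first token (`ttft`) plus per-output-token time (`tpot`) for the remaining tokens. This is a back-of-the-envelope sketch assuming steady decode; the function name is illustrative and not part of `vllm bench`.

```python
def estimate_e2el_ms(ttft_ms: float, tpot_ms: float, output_len: int) -> float:
    """Rough end-to-end latency: first token, then (output_len - 1) decode steps."""
    return ttft_ms + tpot_ms * (output_len - 1)

# Example: 300 ms TTFT and 15 ms/token over 1024 output tokens.
print(estimate_e2el_ms(300.0, 15.0, 1024))  # 300 + 15 * 1023 = 15645.0 ms
```

If the measured `e2el` is far above this estimate at low concurrency, it usually indicates queuing or scheduling overhead rather than raw kernel performance.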