Conversation
Signed-off-by: hyukjlee <hyukjlee@amd.com>
Summary of ChangesHello @hyukjlee, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a comprehensive guide for deploying the MiniMax M2 model on AMD MI300X/MI355X GPUs using the vLLM framework. The new documentation outlines the necessary prerequisites, step-by-step deployment procedures, and methods for both inference and performance benchmarking, specifically tailored for AMD hardware, aiming to facilitate easier adoption and evaluation of the model on these accelerators. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a helpful guide for running the MiniMax-M2 model on AMD GPUs. The instructions are generally clear and easy to follow. My review includes a few suggestions to enhance accuracy and clarity, such as correcting a Docker image tag, improving consistency in the documentation, and clarifying section headers and script variables. Additionally, to ensure users can find this new guide, please consider adding a link to it in the main README.md file.
MiniMax/Minimax-M2_AMD.md
Outdated
|
|
||
| ```bash | ||
| alias drun='sudo docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --shm-size 32G -v /data:/data -v $HOME:/myhome -w /myhome --entrypoint /bin/bash' | ||
| drun vllm/vllm-openai-rocm:v0.14.1 |
There was a problem hiding this comment.
The Docker image tag vllm/vllm-openai-rocm:v0.14.1 appears to be incorrect as it doesn't exist on Docker Hub. This will cause the command to fail. Please use a valid tag. The latest available tag at the time of this review is v0.5.1.
| drun vllm/vllm-openai-rocm:v0.14.1 | |
| drun vllm/vllm-openai-rocm:v0.5.1 |
|
|
||
| ## Introduction | ||
|
|
||
| This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM. |
There was a problem hiding this comment.
The introduction mentions support for MI300X/MI355X GPUs, but the prerequisites section on line 24 also lists MI325X. For consistency, please update the introduction to include all supported GPU models.
| This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X/MI355X GPUs using vLLM. | |
| This quick start recipe explains how to run the MiniMax M2 model on AMD MI300X, MI325X, and MI355X GPUs using vLLM. |
MiniMax/Minimax-M2_AMD.md
Outdated
| --port 8007 & | ||
| ``` | ||
|
|
||
| ### 3. Running Inference using benchmark script |
There was a problem hiding this comment.
The section title "Running Inference using benchmark script" is a bit misleading, as this section demonstrates a single inference request with curl, while the next section covers benchmarking. Consider renaming it to "Running Inference" for better clarity.
| ### 3. Running Inference using benchmark script | |
| ### 3. Running Inference |
| export MODEL="MiniMaxAI/MiniMax-M2" | ||
| export ISL=1024 | ||
| export OSL=1024 | ||
| export REQ=10 | ||
| export CONC=10 |
There was a problem hiding this comment.
To improve the clarity of the benchmark script, it would be beneficial to add a brief explanation for the environment variables ISL, OSL, REQ, and CONC before the code block. For example:
- `ISL`: Input sequence length
- `OSL`: Output sequence length
- `REQ`: Number of prompts
- `CONC`: Maximum concurrencySigned-off-by: Hyukjoon Lee <hyukjlee@amd.com>
Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>
|
@hyukjlee we should update https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md instead of opening a new file. Create a subsections like this https://github.com/vllm-project/recipes/pull/219/changes#r2804656457 for each header. And is there a special need to use the benchmark configuration that you used? Is it also representative to use the benchmark workload from https://github.com/vllm-project/recipes/blob/main/MiniMax/MiniMax-M2.md instead of proposing a new scenario? |
No description provided.