
Update Kimi-Linear.md for AMD GPU#155

Open
haic0 wants to merge 2 commits into vllm-project:main from haic0:patch-13

Conversation

Contributor

@haic0 haic0 commented Dec 10, 2025

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @haic0, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the documentation for the Kimi-Linear model by adding a comprehensive guide for its deployment and benchmarking on AMD MI300X GPUs. It covers the entire setup process, from preparing the Docker environment with ROCm vLLM to initiating the vLLM server and running performance tests, thereby broadening the model's accessibility and operational scope for users with AMD hardware.

Highlights

  • AMD GPU Support Documentation: A new section titled 'AMD GPU Support' has been added to the Kimi-Linear.md file, providing instructions for running the model on AMD MI300X GPUs.
  • Docker Environment Setup: Detailed steps are included for preparing the Docker environment, specifically pulling the rocm/vllm-dev:nightly image and launching a container with appropriate ROCm and vLLM configurations.
  • vLLM Server Configuration: Instructions for logging into Hugging Face and starting the vLLM server are provided, including specific environment variables (SAFETENSORS_FAST_GPU=1, VLLM_USE_V1=1, VLLM_USE_TRITON_FLASH_ATTN=0) and model parameters for moonshotai/Kimi-Linear-48B-A3B-Instruct.
  • Benchmark Execution: A command to run a benchmark script inside the Docker container is included, allowing users to test the performance of the Kimi-Linear model on their AMD GPU setup.
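The setup described in the highlights can be sketched as a shell session. This is a minimal illustration assembled from the PR summary: the image tag, environment variables, and model name are taken from the text above, while the `docker run` device flags and the `--trust-remote-code` flag are typical-usage assumptions, not details stated in this PR.

```shell
# Pull the ROCm vLLM development image named in the PR summary.
docker pull rocm/vllm-dev:nightly

# Launch a container with GPU access. The device and IPC flags below are
# the usual ones for ROCm containers; adjust them to your MI300X host.
docker run -it --network=host --ipc=host \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --shm-size 16G \
  rocm/vllm-dev:nightly

# Inside the container: log in to Hugging Face, then start the server
# with the environment variables listed in the highlights.
huggingface-cli login
SAFETENSORS_FAST_GPU=1 VLLM_USE_V1=1 VLLM_USE_TRITON_FLASH_ATTN=0 \
  vllm serve moonshotai/Kimi-Linear-48B-A3B-Instruct \
  --trust-remote-code   # assumption: commonly required for this model family
```

Once the server is up, the benchmark script mentioned in the highlights can be run against it from inside the same container.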
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds instructions for running the Kimi-Linear model on AMD GPUs. The new documentation is clear and provides step-by-step guidance. My review includes a few suggestions to improve the readability and formatting of the commands and text, such as breaking up long commands, removing unnecessary blank lines, and clarifying instructions. These changes will help ensure the documentation is easy for users to follow.

Comment on lines +65 to +66
Run the vllm online serving
Sample Command
Contributor


medium

The section heading is already 'Start the vLLM server'. The text here is a bit redundant and could be more concise.

Suggested change
Run the vllm online serving
Sample Command
Run the following command to start the vLLM server:

Comment on lines +94 to +96



Contributor


medium

These trailing blank lines and whitespace at the end of the file can be removed for better formatting.

@jeejeelee
Collaborator

TBH, I'm questioning whether it's truly necessary to differentiate between AMD and NVIDIA GPUs when running these models on vLLM. Looking at most of your pull requests, aside from the Docker image, there doesn't appear to be anything particularly distinctive.

Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com>

Update moonshotai/Kimi-Linear.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com>

Update moonshotai/Kimi-Linear.md

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: haic0 <149741444+haic0@users.noreply.github.com>

Update vLLM ROCm Docker image and run commands

Signed-off-by: jiacao-amd <jiahui.cao@amd.com>

add uv launch support

Signed-off-by: jiacao-amd <jiahui.cao@amd.com>
@jiacao-amd jiacao-amd force-pushed the patch-13 branch 3 times, most recently from cbf2886 to f37c30a on February 6, 2026 at 07:01
Merged ROCm installation and running instructions from separate AMD GPU
Support section into main content with CUDA/ROCm subheaders for better
organization and consistency.

Signed-off-by: jiacao-amd <jiahui.cao@amd.com>