Summary of Changes

Hello @hyukjlee, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces new documentation to facilitate the deployment and benchmarking of Meta's Llama 3.1 8B Instruct and Llama 3.3 70B Instruct models on AMD MI300X/MI355X GPUs. These guides provide step-by-step instructions for leveraging vLLM to serve these large language models, enhancing accessibility and performance insights for users with AMD hardware.
Code Review
This pull request adds documentation for running Llama 3.1 8B and Llama 3.3 70B models on AMD hardware. The changes are well-structured and provide useful command-line examples. I've provided a few suggestions to improve clarity and consistency in the new markdown files. Specifically, I've pointed out some minor inconsistencies in the listed hardware and suggested improvements to phrasing for better readability.
Llama/Llama3.1_AMD.md (outdated)

> ## Introduction
>
> This quick start recipe explains how to run the Llama 3.1 8B Instruct model on AMD MI300X/MI355X GPUs using vLLM.

For consistency, please consider including the MI325X GPU in this introductory sentence, as it is mentioned in the 'Prerequisites' section below.

```diff
- This quick start recipe explains how to run the Llama 3.1 8B Instruct model on AMD MI300X/MI355X GPUs using vLLM.
+ This quick start recipe explains how to run the Llama 3.1 8B Instruct model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.
```
Llama/Llama3.1_AMD.md (outdated)

> ## Key benefits of AMD GPUs on large models and developers
>
> The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.1:

The phrase 'GPUs accelerators' is redundant. Please consider rephrasing to either 'GPU accelerators' or simply 'GPUs' for conciseness and clarity.

```diff
- The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.1:
+ The AMD Instinct GPU accelerators are purpose-built to handle the demands of next-gen models like Llama 3.1:
```
Llama/Llama3.1_AMD.md (outdated)

> -tp $TP &
>
> ### 3. Running Inference using benchmark script

The title 'Running Inference using benchmark script' is a bit misleading, as this section demonstrates a single inference request rather than running a benchmark script. A title like 'Running a Test Inference' or 'Running a Sample Inference' would be more accurate.

```diff
- ### 3. Running Inference using benchmark script
+ ### 3. Running a Test Inference
```
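For reference, a single test inference against a vLLM server typically goes through its OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below only builds and prints such a request; the model name, port, and URL are illustrative assumptions, not values taken from the recipe under review:

```python
import json
import urllib.request

def build_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build a single chat-completion payload for an OpenAI-compatible API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

# Placeholder model name -- substitute whatever the recipe actually serves.
payload = build_request("meta-llama/Llama-3.1-8B-Instruct", "What is vLLM?")
print(json.dumps(payload, indent=2))

# To actually send it against a locally running server, something like:
# req = urllib.request.Request(
#     "http://localhost:8000/v1/chat/completions",  # assumed default port
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
```

The network call is left commented out so the sketch runs without a live server; it is a shape reference, not a drop-in replacement for the recipe's own command.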
> ## Introduction
>
> This quick start recipe explains how to run the Llama 3.3 70B Instruct model on AMD MI300X/MI355X GPUs using vLLM.

For consistency, please consider including the MI325X GPU in this introductory sentence, as it is mentioned in the 'Prerequisites' section below.

```diff
- This quick start recipe explains how to run the Llama 3.3 70B Instruct model on AMD MI300X/MI355X GPUs using vLLM.
+ This quick start recipe explains how to run the Llama 3.3 70B Instruct model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.
```
> ## Key benefits of AMD GPUs on large models and developers
>
> The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:

The phrase 'GPUs accelerators' is redundant. Please consider rephrasing to either 'GPU accelerators' or simply 'GPUs' for conciseness and clarity.

```diff
- The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:
+ The AMD Instinct GPU accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:
```
> The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:
> - Can run large 70B-parameter models with strong throughput on a single node.
> - Massive HBM memory capacity enables support for extended context lengths and larger batch sizes.
> - Using Optimized Triton and AITER kernels provide best-in-class performance and TCO for production deployment.

The phrasing 'Using Optimized Triton...' is a bit awkward for a list item. To improve readability, consider rephrasing to start with a noun or adjective, similar to the other items in the list.

```diff
- - Using Optimized Triton and AITER kernels provide best-in-class performance and TCO for production deployment.
+ - Optimized Triton and AITER kernels provide best-in-class performance and TCO for production deployment.
```
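As a rough sanity check on the single-node claim in that list, here is a back-of-envelope sketch. The parameter count, bf16 precision, per-GPU HBM capacity, and GPU count below are illustrative assumptions, not vendor-verified figures from the recipe:

```python
# Back-of-envelope: do ~70B bf16 weights fit in one 8-GPU MI300X-class node?
PARAMS = 70e9           # ~70B parameters (assumed)
BYTES_PER_PARAM = 2     # bf16 weights
HBM_PER_GPU_GB = 192    # MI300X-class HBM capacity (assumed)
GPUS_PER_NODE = 8

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9   # total weight footprint
node_hbm_gb = HBM_PER_GPU_GB * GPUS_PER_NODE  # aggregate node HBM

print(f"weights: ~{weights_gb:.0f} GB, node HBM: {node_hbm_gb} GB")
# The large remaining headroom is what allows KV cache for extended
# context lengths and larger batch sizes, as the list above claims.
assert weights_gb < node_hbm_gb
```

Under these assumptions the weights take roughly 140 GB against 1536 GB of aggregate HBM, which is consistent with the recipe's claim that the 70B model runs comfortably on a single node.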
Signed-off-by: hyukjlee <hyukjlee@amd.com>
Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>
Requesting review for the following PR