
Llama 3.3-70B update for AMD GPU#212

Open
hyukjlee wants to merge 3 commits into vllm-project:main from hyukjlee:llama_3_3

Conversation

@hyukjlee

Requesting review for the following PR

@gemini-code-assist
Contributor

Summary of Changes

Hello @hyukjlee, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces new documentation to facilitate the deployment and benchmarking of Meta's Llama 3.1 8B Instruct and Llama 3.3 70B Instruct models on AMD MI300X/MI355X GPUs. These guides provide step-by-step instructions for leveraging vLLM to serve these large language models, enhancing accessibility and performance insights for users with AMD hardware.

Highlights

  • Llama 3.1 8B Instruct Guide: Introduced a new quick start guide for deploying the Llama 3.1 8B Instruct model on AMD MI300X/MI355X GPUs using vLLM.
  • Llama 3.3 70B Instruct Guide: Added a comprehensive quick start guide for running the Llama 3.3 70B Instruct model on AMD MI300X/MI355X GPUs, also leveraging vLLM.
  • Standardized Deployment Workflow: Both guides outline a clear four-step deployment process: using a vLLM Docker image, starting the vLLM online server, running inference with a 'curl' command, and performing performance benchmarks.
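As a rough illustration of the inference step in that workflow, the `curl` command in the guides targets vLLM's OpenAI-compatible server. The sketch below builds the JSON body such a request would carry; the model name, port, and prompt are illustrative assumptions, not taken from this PR.

```python
import json

# Hypothetical request body for vLLM's OpenAI-compatible chat endpoint.
# The model identifier and parameters below are assumptions for illustration.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 128,
    "temperature": 0.0,
}

# Serialize to the JSON string that curl would POST (typically to
# http://localhost:8000/v1/chat/completions with Content-Type: application/json).
body = json.dumps(payload)
print(body)
```

The same body works with any OpenAI-compatible client; only the base URL and model name need to match the running server.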


Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor) left a comment


Code Review

This pull request adds documentation for running Llama 3.1 8B and Llama 3.3 70B models on AMD hardware. The changes are well-structured and provide useful command-line examples. I've provided a few suggestions to improve clarity and consistency in the new markdown files. Specifically, I've pointed out some minor inconsistencies in the listed hardware and suggested improvements to phrasing for better readability.


## Introduction

This quick start recipe explains how to run the Llama 3.1 8B Instruct model on AMD MI300X/MI355X GPUs using vLLM.

Severity: medium

For consistency, please consider including the MI325X GPU in this introductory sentence, as it is mentioned in the 'Prerequisites' section below.

Suggested change
This quick start recipe explains how to run the Llama 3.1 8B Instruct model on AMD MI300X/MI355X GPUs using vLLM.
This quick start recipe explains how to run the Llama 3.1 8B Instruct model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.


## Key benefits of AMD GPUs on large models and developers

The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.1:

Severity: medium

The phrase 'GPUs accelerators' is redundant. Please consider rephrasing to either 'GPU accelerators' or simply 'GPUs' for conciseness and clarity.

Suggested change
The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.1:
The AMD Instinct GPU accelerators are purpose-built to handle the demands of next-gen models like Llama 3.1:

-tp $TP &
```

### 3. Running Inference using benchmark script

Severity: medium

The title 'Running Inference using benchmark script' is a bit misleading, as this section demonstrates a single inference request rather than running a benchmark script. A title like 'Running a Test Inference' or 'Running a Sample Inference' would be more accurate.

Suggested change
### 3. Running Inference using benchmark script
### 3. Running a Test Inference


## Introduction

This quick start recipe explains how to run the Llama 3.3 70B Instruct model on AMD MI300X/MI355X GPUs using vLLM.

Severity: medium

For consistency, please consider including the MI325X GPU in this introductory sentence, as it is mentioned in the 'Prerequisites' section below.

Suggested change
This quick start recipe explains how to run the Llama 3.3 70B Instruct model on AMD MI300X/MI355X GPUs using vLLM.
This quick start recipe explains how to run the Llama 3.3 70B Instruct model on AMD MI300X, MI325X, and MI355X GPUs using vLLM.


## Key benefits of AMD GPUs on large models and developers

The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:

Severity: medium

The phrase 'GPUs accelerators' is redundant. Please consider rephrasing to either 'GPU accelerators' or simply 'GPUs' for conciseness and clarity.

Suggested change
The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:
The AMD Instinct GPU accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:

The AMD Instinct GPUs accelerators are purpose-built to handle the demands of next-gen models like Llama 3.3:
- Can run large 70B-parameter models with strong throughput on a single node.
- Massive HBM memory capacity enables support for extended context lengths and larger batch sizes.
- Using Optimized Triton and AITER kernels provide best-in-class performance and TCO for production deployment.

Severity: medium

The phrasing 'Using Optimized Triton...' is a bit awkward for a list item. To improve readability, consider rephrasing to start with a noun or adjective, similar to the other items in the list.

Suggested change
- Using Optimized Triton and AITER kernels provide best-in-class performance and TCO for production deployment.
- Optimized Triton and AITER kernels provide best-in-class performance and TCO for production deployment.

Signed-off-by: hyukjlee <hyukjlee@amd.com>
Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>
Signed-off-by: Hyukjoon Lee <hyukjlee@amd.com>
