@Gastron Gastron commented Jul 15, 2025

No description provided.

eliecerecology and others added 3 commits July 10, 2025 18:01
…oad for inference on Llama 3.1 8B and 70B

* added the newest version of llm-inference-megatron-lm combined

* Update workloads/llm-inference-megatron-lm/helm/mount/run_megatron.sh

* precommit fixed

* Resolve the last comments in the code

* resolving and testing tokenizer path

* adding the kaiwo comments

* precommits made

---------

Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* Initial commit, skeleton, data prep

Signed-off-by: Robert Talling <rtalling@amd.com>

* Add Megatron checkpoint conversion details to the tutorial; Add override file for llama-3.1-8B in workloads/download-huggingface-model-to-bucket

Signed-off-by: Robert Talling <rtalling@amd.com>

* Rename tutorial and fix

Signed-off-by: Robert Talling <rtalling@amd.com>

* Fix line splits

Signed-off-by: Robert Talling <rtalling@amd.com>

* fixes and llama 70 overrides for model delivery

Signed-off-by: Robert Talling <rtalling@amd.com>

* Update readme

Signed-off-by: Robert Talling <rtalling@amd.com>

* Add multinode instructions + values

Signed-off-by: Robert Talling <rtalling@amd.com>

* Refine inference workload instructions for Llama-3.1-8B model and update k9s commands

Signed-off-by: Robert Talling <rtalling@amd.com>

* Revert places where values.yaml was adjusted for testing

Signed-off-by: Robert Talling <rtalling@amd.com>

* Update workloads/download-huggingface-model-to-bucket/helm/values.yaml

Signed-off-by: Robert Talling <rtalling@amd.com>

* Update docs/tutorials/tutorial-03-deliver-resources-and-run-megatron-cpt.md

Signed-off-by: Robert Talling <rtalling@amd.com>

* Add 16ddp template for multinode, remove explicit namespace mentions

Signed-off-by: Robert Talling <rtalling@amd.com>

* Update tutorial template for llama 8b inference

Signed-off-by: Robert Talling <rtalling@amd.com>

* Fix pre-commit errors

Signed-off-by: Robert Talling <rtalling@amd.com>

* Remove unused overrides and update readme

Signed-off-by: Robert Talling <rtalling@amd.com>

* Update readme

Signed-off-by: Robert Talling <rtalling@amd.com>

* Replace set up section with tutorial-0-prerequisites.

* Correct the helm template path for llm-inference-megatron-lm

* resolved override path inside helm, and update README

* Update docs/tutorials/tutorial-03-deliver-resources-and-run-megatron-cpt.md

---------

Signed-off-by: Robert Talling <rtalling@amd.com>
Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>
Co-authored-by: Robert Talling <rtalling@amd.com>
Co-authored-by: eliecer diaz <eliecerecology@gmail.com>
* jupyterlab: improve documentation with finding correct url and reminding of namespace

* jupyterlab docs: fix namespace
@Gastron Gastron requested review from Brednas and aivanni July 15, 2025 13:10
@alexander-aurell-amd alexander-aurell-amd self-requested a review July 15, 2025 13:12
* vllm0.9 best-known config update

* update api benchmarking scripts to match vllm version

* Add benchmark configuration options for input/output lengths and QPS

* Update container environment variable handling to properly support secret references

* Remove deprecated image references from model configuration files in the LLM inference Helm overrides for DeepSeek, Google Gemma, Meta Llama, Mistral, and Qwen models.

* Revert NUMA config because it is read-only

* Remove extra sleep

---------

Co-authored-by: Aku Rouhe <akurouhe@gmail.com>

@Brednas Brednas left a comment

Approving

@Gastron Gastron merged commit 7844f4d into main Jul 15, 2025
1 check passed

7 participants