Update main #2
Closed
* Silogen finetuning through helm
* Remove imagePullSecret from example as it's not needed
* Refactor values.yaml and example configuration for clarity and updated paths
---------
Co-authored-by: Mark van Heeswijk <mark.vanheeswijk@amd.com>

* parameterise model download url
* flux sdxl overrides
* convert entrypoint to template
* rename overrides
* update readme
* add flux1-schnell and sdxl-base overlays
* fix: rename
* update model config
* Add ingress/http_route templates
* improve model tag config
* one more model config
* readme update ingress
---------
Co-authored-by: Mark van Heeswijk <mark.vanheeswijk@amd.com>

* Add HTTP route and ingress configuration with schema updates
* Add support for configurable replicas in deployment and schema

* use rocm/pytorch image for faster deployment
* let comfy-cli handle requirements

* WIP: overrides for Qwen2.5-3B-Instruct model for inference.
* Removed -debug from image names.

* judge download: pull in helpers.tpl and main template changes
* judge download: working. Ugly dual bucket_storage_host fix to have different protocols in different containers
* judge download: pre-commit test
* judge download: pre-commit
* judge download: add protocol in helpers instead of two storage hosts
* judge download: move dataset protocol trimming from k8s to package code
* judge download: move dataset protocol trimming from k8s to package code2

* added mutate_manifest.py script that can be used to wrap some resources with Kaiwo equivalents
* added RayService and fixed RayJob spec_key in mutate_manifest.py

* Add Ray based Megatron-LM workload chart
Signed-off-by: Robert Talling <rtalling@amd.com>
Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>

Signed-off-by: Robert Talling <rtalling@amd.com>

* Add Helm chart for MLflow tracking server deployment (user/project)
* mlflow readme update
* Enhance db configuration with secret management
* update readme
* readme update regarding url prefix
* update to remove url prefix setting
* fix s3 store issue and set default to local artifact store
* improve minio settings and set default artifacts to S3 again
* refactor: update README and scripts for artifact storage configuration and usage instructions
* add MinIO S3-compatible storage configuration for MLflow artifacts
* final touch before merge

* add readme for infinity embedding workload
* remove whitespace
* add overrides

* Add model and data loading from minio
* Add deepspeed config example
* Validate starting of sync process, escape and quote path argument
* Wait 1s before checking if sync process started
* Fix for quotes in checkpointsRemote
* Update readme and other edits for clarity
* Update workloads/llm-finetune-llama-factory/helm/README.md
Co-authored-by: Aku Rouhe <akurouhe@amd.com>
---------
Co-authored-by: Aku Rouhe <akurouhe@amd.com>

* veRL GRPO finetuning ROCm example workload
* Refactor and complete VeRL workload
* Fix comments and typos
---------
Co-authored-by: Emil Eirola <emil.eirola@amd.com>

* Add basic MLFlow export
* Upgrade ROCm image.
* Fix nested folders for artifacts on MLFlow EVEN BETTER!
* Fix extra f-string
---------
Co-authored-by: Sander Bijl de Vroe <Sander.BijldeVroe@amd.com>

…rmonise model names. Harmonise arg parser. (#354)
* Fix erroneously removed LLM client URL prefix.

* Quote paths and escape chars in mc mirror
* Fix handling of minio paths
* Fix handling of quotes in echo statements

* Add on-boarding documentation for pre-commit
* clarify cd in docs and fix <br />
* small edit

* WandB downloader
* Make it work
* Correct override name, always mount
* No ephemeral storage, just emptyDir

Brednas
approved these changes
Jul 8, 2025
Contributor
Brednas
left a comment
Approved as discussed in daily
sarooshsh
approved these changes
Jul 8, 2025
Bring latest changes from development repository here.