Update main 241025 #7

alexander-aurell-amd · 2025-10-24T14:48:37Z

No description provided.

* GH action created for copying files to docs repository
* empty new line added to pass the pre-commit test

* Add clean gencast & pangu inference workload
* reformatted with black
* clean comments
* rm old definition of ephemeral storage
* update readme
* remove notes.txt
* kaiwo disable by default
* silogen minio setup
* Added instructions for cds api key generation
* minor changes to the readme
* rm unwanted pull secrets
* create a bucket if does not exist
* change exception to s3
* handle exception
* add header

* add initial working version
* style: apply pre-commit fixes
* fix path trailing space
* add trailing space to the patch
* update readme
* initial commit for aurora
* rename workload
* fix minor bugs
* add model options to readme
* fix dependencies
* fix minor bug
* change visualizer output path
* change visualizer output path
* remove 12h sleep from entrypoint command
* add input date param
* update readme
* remove notes.txt
* unwanted import
* change ephemeral storage & resource configuration
* move job definition around
* trailing whitespace formatted
* Add CDS API key instructions
* minor changes to readme
* Fix minor bug
* disable kaiwo by default
* minio setup for silogen
* remove unwanted param
* remove img pull secret
* fix bracket bug
* try to create a bucket
* handle exception
* add header

* examples dir
* overrides
* instructions
* WIP test run on cluster
* tested on cluster
* folder rename
* reference the intended workload in each override
* move OPI example to tutorials
* move docs/tutorials/tutorial-05-finetune-llama8b-custom-domain-data.md overrides under corresponding workloads
* remove examples dir
* rename FT override
* rename override
* update override names and locations
* rename to specify tutorial 05
* update paths, add hf token
* fix modelPath in workload override
* fix path for inference deployment

---------

Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>

* Add imagePullSecrets parameter to JupyterLab and VSCode workloads
* Add imagePullSecrets parameter to MLFlow and ComfyUI workloads

* add swinunetr training ai workload

---------

Co-authored-by: eliecer diaz <eliecerecology@gmail.com>

* Include tracking config in the configmap + example override
* Update values.schema.json and finetuning config markdown
* Example override for MLFlow

* adds xlstm inference
* removes support files
* adds corrections for deployment errors
* removes the xlstm-inference-vllm workload
* adds xlstm-inference-torchserve workload
* adds README and test query
* clean-up
* Fixes issues from PR review
* Modify model loading
* Update torchserve handler
* Update model loading and env vars
* Add README.md
* Corrections
* Add prerequisites to README.md
* Update workloads/xlstm-inference-torchserve/helm/values.yaml Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* Update workloads/xlstm-inference-torchserve/README.md Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* Update workloads/xlstm-inference-torchserve/helm/mount/entrypoint.sh Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* Add query script for xlstm workload
* Corrections to torchserve handler
* Correction based on PR comments
* ADD TEMP env var for steering torchserve tmp storage
* Corrects config to camelCase based on PR review
* Update workloads/xlstm-inference-torchserve/helm/mount/entrypoint.sh Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* Update
* Update

---------

Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>

* feat(torchserve): add files initial commit
* chore: change directory name
* feat(torchserve): add readme
* chore(torchserve): update readme with an example
* refactor(torchserve)
* refactor(torchserve): apply pre-commit
* refactor(torchserve)
* feat(torchserve): update wan handler request
* refactor(torchserve): move readme into helm directory
* remove extra file
* debug model_setup.sh
* Pin numpy version to fix model loading failures
* Trim debugging in model_setup.sh
* update /tmp/ for /workload/ paths
* update readme to include torchserve model packager
* add tutorial override
* add package and serve tutorial
* update workload paths
* remove platform-specific info

---------

Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>

* Push initial workload version
* Add instructions.
* Add dynamic arguments.
* Add model and data downloading from bucket.
* Add uploading checkpoints.
* Add entry points for workflow.
* Add training configs.
* Add templating to data paths.
* Add media finetuning tutorial

---------

Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>

* add readme draft
* add Wan2.1-VACE diffusers model overrides
* update readme
* add override
* update VACE 1.3b diffusers model override
* add xlstm override
* Fix workload to conform to standard; Change workload name to be more explicit.
* update overrides based on values standards
* update readme
* remove extra values
* update model package paths
* remove minio installation from entrypoints and add to job args
* linting
* Fix remote bucket path for storage
* update minio-host
* remove bucketStorageHost
* update workdir path
* update readme
* add tutorial override
* Update workloads/torchserve-model-packager/helm/values.yaml Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* Update workloads/torchserve-model-packager/helm/templates/job.yaml Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>

---------

Co-authored-by: johannayang-amd <johayang@amd.com>
Co-authored-by: Kristoffer Peyron <krpeyron@amd.com>
Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>
Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>

OpenSplat workflow added

* Initial commit.
* Initial commit.
* Update docker image.
* Update the docker image to the new minified version.
* Fix pre-commit issues.
* Fix pre-commit issues.
* Other small fixes.
* Bugfixes. Add OSSCI overrides.
* Misc. bugfixes.
* Add an override for running on OSSCI.
* Fix no-user errors preventing running on OSSCI.
* Cleanup for OSSCI test run.
* Cleanup repo for OSSCI test run.
* Cleanup before PR.
* Cleanup before PR submission.
* Switch to the Silogen image.
* Change the image import to the newly pushed image in the Silogen package registry.
* Remove OSSCI override. Change image.
* Removed OSSCI override file.
* Changed the workload to use the new weatherbench-preprocessor.dockerfile image that will be shared by both preprocessor workloads.
* Update template
* Remove TODO. Remove resolved TODO.
* Make variables & pressure levels selectable.
* Make surface variables, vertical variables and pressure levels selectable via values.
* Fix readme and minio ssl
* Fix precommit

---------

Co-authored-by: Robert Talling <rtalling@amd.com>

* Initial commit.
* Initial commit.
* Fix input arrays.
* Fix helm interpolation of the array of input files.
* Change README.
* Last edits to README.md.
* Pre-commit fixes.
* Pre-commit autofixes.
* Update helpers, values and readme

---------

Co-authored-by: Robert Talling <rtalling@amd.com>

* Initial commit.
* Initial commit.
* Add support for user selected metrics and variables.
* Add support for user selected metrics and variables.
* Update helpers

---------

Co-authored-by: Robert Talling <rtalling@amd.com>

REFM-447

* REFM-425: added original clipora and workload stub
* refm-425: improved vlm lora training and visualization
* vlm lora finetune workload, cleaned up code
* improved vlm lora finetune workload
* added ephemeral storage option vlm lora finetune
* fixed all changes recommended
* improved vlm lora readme, template naming changes
* lora finetuning workload cleanup, added docker image files to repo
* fixed whitespace
* example dataset parsing small change
* added imagepullsecrets support to vlm lora finetune job
* Update workloads/vlm-lora-finetune/helm/mount/prepare_custom_dataset.py Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* clipora from submodule to regular files
* vlm lora finetuning camel case, removed pvc option
* fixed styling, import and some code in vlm lora finetune
* pinned vlm lora docker requirements
* vlm lora clipora docker improve dataloader
* vlm lora: polished README, added workers to example config, styling fixes
* vlm lora: added comment to example train config
* vlm lora: use /workload dir, fixed paths

---------

Co-authored-by: mwessman <michael.wessman@amd.com>
Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>

* DPO Example in silogen finetuning workload
* DPO Documentation
* Update the SFT and DPO configs
* Fix typos
* Remove unused parameters
* Update mkdocs

thbergst82 and others added 19 commits September 8, 2025 09:01

GH action created for copying files to docs repository (#342)

bf73782

* GH action created for copying files to docs repository
* empty new line added to pass the pre-commit test

Add imagePullSecrets parameter to JupyterLab and VSCode workloads (#449)

3e671ae

* Add imagePullSecrets parameter to JupyterLab and VSCode workloads
* Add imagePullSecrets parameter to MLFlow and ComfyUI workloads

Refm life science swinunetr training (#407)

522c2cb

* add swinunetr training ai workload

---------

Co-authored-by: eliecer diaz <eliecerecology@gmail.com>

Enable MLflow in Silogen Finetuning Engine (#454)

e391695

* Include tracking config in the configmap + example override
* Update values.schema.json and finetuning config markdown
* Example override for MLFlow

Refm robotics job for OpenSplat (#311)

361a25e

OpenSplat workflow added

Weatherbench runner (#446)

2f1a687

* Initial commit.
* Initial commit.
* Add support for user selected metrics and variables.
* Add support for user selected metrics and variables.
* Update helpers

---------

Co-authored-by: Robert Talling <rtalling@amd.com>

Add toctree to the workloads overview page (#457)

122e34f

Added two new life science workloads for reinvent and semlaflow

79eefde

DPO Example in silogen finetuning workload (#458)

a6bfcef

* DPO Example in silogen finetuning workload
* DPO Documentation
* Update the SFT and DPO configs
* Fix typos
* Remove unused parameters
* Update mkdocs

alexander-aurell-amd requested review from Gastron and markvanheeswijk October 24, 2025 14:48

Gastron approved these changes Oct 24, 2025

markvanheeswijk approved these changes Oct 24, 2025

alexander-aurell-amd merged commit 06a921d into main Oct 24, 2025
1 check passed

15 participants