Update main 241025 #7
Merged
* GH action created for copying files to docs repository * empty new line added to pass the pre-commit test
* Add clean gencast & pangu inference workload * reformatted with black * clean comments * rm old definition of ephemeral storage * update readme * remove notes.txt * kaiwo disable by default * silogen minio setup * Added instructions for cds api key generation * minor changes to the readme * rm unwanted pull secrets * create a bucket if does not exist * change exception to s3 * handle exception * add header
* add initial working version * style: apply pre-commit fixes * fix path trailing space * add trailing space to the patch * update readme * initial commit for aurora * rename workload * fix minor bugs * add model options to readme * fix dependencies * fix minor bug * change visualizer output path * change visualizer output path * remove 12h sleep from entrypoint command * add imput date param * update readme * remove notes.txt * unwanted import * change ephemeral storage & resource configuration * move job definition around * trailing whitespace formatted * Add CDS API key instructions * minor changes to readme * Fix minor bug * disable kaiwo by default * minio setup for silogen * remove unwanted param * remove img pull secret * fix bracket bug * try to create a bucket * handle exception * add header
* examples dir * overrides * instructions * WIP test run on cluster * tested on cluster * folder rename * reference the intended workload in each override * move OPI example to tutorials * move docs/tutorials/tutorial-05-finetune-llama8b-custom-domain-data.md overrides under corresponding workloads * remove examples dir * rename FT override * rename override * update override names and locations * rename to specify tutorial 05 * update paths, add hf token * fix modelPath in workload override * fix path for inference deployment --------- Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>
* Add imagePullSecrets parameter to JupyterLab and VSCode workloads * Add imagePullSecrets parameter to MLFlow and ComfyUI workloads
* add swinunetr training ai workload --------- Co-authored-by: eliecer diaz <eliecerecology@gmail.com>
* Include tracking config in the configmap + example override * Update values.schema.json and finetuning config markdown * Example override for MLFlow
* adds xlstm inference * removes support files * adds corrections for deployment errors * removes the xlstm-inference-vllm workload * adds xlstm-inference-torchserve workload * adds README and test query * clean-up * Fixes issues from PR review * Modify model loading * Update torchserve handler * Update model loading and env vars * Add README.md * Corrections * Add prerequisites to README.md * Update workloads/xlstm-inference-torchserve/helm/values.yaml Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> * Update workloads/xlstm-inference-torchserve/README.md Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> * Update workloads/xlstm-inference-torchserve/helm/mount/entrypoint.sh Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> * Add query script for xlstm workload * Corrections to torchserve handler * Correction based on PR comments * ADD TEMP env var for steering torchserve tmp storage * Corrects config to camelCase based on PR review * Update workloads/xlstm-inference-torchserve/helm/mount/entrypoint.sh Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> * Update * Update --------- Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* feat(torchserve): add files inital commit * chore: change directory name * feat(torchserve): add readme * chore(torchserve): update readme with an example * refactor(torchserve) * refactor(torchserve): apply pre-commit * refactor(torchserve) * feat(torchserve): update wan handler request * refactor(torchserve): move readme into helm directory * remove extra file * debug model_setup.sh * Pin numpy version to fix model loading failures * Trim debugging in model_setup.sh * update /tmp/ for /workload/ paths * update readme to include torchserve model packager * add tutorial override * add package and serve tutorial * update workload paths * remove platform-specific info --------- Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>
* Push initial workload version * Add instructions. * Add dynamic arguments. * Add model and data downloading from bucket. * Add uploading checkpoints. * Add entry points for workflow. * Add training configs. * Add templating to data paths. * Add media finetuning tutorial --------- Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com>
* add readme draft * add Wan2.1-VACE diffusers model overrides * update readme * add override * update VACE 1.3b diffusers model override * add xlstm override * Fix workload to conform to standard; Change workload name to be more explicit. * update overrides based on values standards * update readme * remove extra values * update model package paths * remove minio installation from entrypoints and add to job args * linting * Fix remote bucket path for storage * update minio-host * remove bucketStorageHost * update workdir path * update readme * add tutorial override * Update workloads/torchserve-model-packager/helm/values.yaml Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> * Update workloads/torchserve-model-packager/helm/templates/job.yaml Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> --------- Co-authored-by: johannayang-amd <johayang@amd.com> Co-authored-by: Kristoffer Peyron <krpeyron@amd.com> Co-authored-by: Saroosh Shabbir <saroosh.shabbir@amd.com> Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
OpenSplat workflow added
* Initial commit. * Initial commit. * Update docker image. * Update the docker image to the new minified version. * Fix pre-commit issues. * Fix pre-commit issues. * Other small fixes. * Bugfixes. Add OSSCI overrides. * Misc. bugfixes. * Add an override for running on OSSCI. * Fix no-user errors preventing running on OSSCI. * Cleanup for OSSCI test run. * Cleanup repo for OSSCI test run. * Cleanup before PR. * Cleanup before PR submission. * Switch to the Silogen image. * Change the image import to the newly pushed image in the Silogen package registry. * Remove OSSCI override. Change image. * Removed OSSCI override file. * Changed the workload to use the new weatherbench-preprocessor.dockerfile image that will be shared by both preprocessor workloads. * Update template * Remove TODO. Remove resolved TODO. * Make variables & pressure levels selectable. * Make surface variables, vertical variables and pressure levels selectable via values. * Fix readme and minio ssl * Fix precommit --------- Co-authored-by: Robert Talling <rtalling@amd.com>
* Initial commit. * Initial commit. * Fix input arrays. * Fix helm interpolation of the array of input files. * Change README. * Last edits to README.md. * Pre-commit fixes. * Pre-commit autofixes. * Update helpers, values and readme --------- Co-authored-by: Robert Talling <rtalling@amd.com>
* Initial commit. * Initial commit. * Add support for user selected metrics and variables. * Add support for user selected metrics and variables. * Update helpers --------- Co-authored-by: Robert Talling <rtalling@amd.com>
REFM-447 * REFM-425: added original clipora and workload stub * refm-425: improved vlm lora training and visualization * vlm lora finetune workload, cleaned up code * improved vlm lora finetune workload * added ephemeral storage option vlm lora finetune * fixed all changes recommended * improved vlm lora readme, template naming changes * lora finetuning workload cleanup, added docker image files to repo * fixed whitespace * example dataset parsing small change * added imagepullsecrets support to vlm lora finetune job * Update workloads/vlm-lora-finetune/helm/mount/prepare_custom_dataset.py Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com> * clipora from submodule to regular files * vlm lora finetunin camel case, removed pvc option * fixed styling, import and some code in vlm lora finetune * pinned vlm lora docker requirements * vlm lora clipora docker improve dataloader * vlm lora: polished README, added workers to example config, styling fixes * vlm lora: added comment to exmaple train config * vlm lora: use /workload dir, fixed paths --------- Co-authored-by: mwessman <michael.wessman@amd.com> Co-authored-by: aivanni <4340981+aivanni@users.noreply.github.com>
* DPO Example in silogen finetuning workload * DPO Documentation * Update the SFT and DPO configs * Fix typos * Remove unused parameters * Update mkdocs
Gastron
approved these changes
Oct 24, 2025
markvanheeswijk
approved these changes
Oct 24, 2025
No description provided.