[CI] Add CK (independent of miopen) full build and test to TheRock. #3379
brockhargreaves-amd wants to merge 17 commits into main from
Conversation
geomin12
left a comment
i'm also adding a label so this PR only runs composablekernel tests (no need to use other resources)
we generally like to keep submodule bumps as separate PRs (so we can pinpoint and revert them if anything breaks)
let's remove this :) not sure if this was an accidental inclusion
I definitely didn't do this on purpose, and I haven't touched rocm_kpack at all, so I'm not sure how this happened. The pull request told me my branch was out of date, so I hit the button to update it. Could that be what pulled this in?
| "composablekernel": { | ||
| "job_name": "composablekernel", | ||
| "fetch_artifact_args": "--composablekernel --tests", | ||
| "timeout_minutes": 60, |
i see that the timeout is 60 mins and sharding is 4; however, i don't see any sharding added in test_composablekernel.py. can we correct this to 1 shard and a smaller timeout value?
Sure, is it okay if we correct to 1 shard and keep the timeout as is? It took 48 minutes to compile locally, unless the machines we're compiling on in CI do it faster.
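For reference, a minimal sketch of what the corrected matrix entry could look like under the agreed change (1 shard, 60-minute timeout kept). The shard-count key name is hypothetical; only `job_name`, `fetch_artifact_args`, and `timeout_minutes` appear in the diff above.

```python
# Hypothetical sketch of the corrected test-matrix entry; the shard key name
# ("shards") is assumed, not taken from the actual config schema.
TEST_MATRIX = {
    "composablekernel": {
        "job_name": "composablekernel",
        "fetch_artifact_args": "--composablekernel --tests",
        "timeout_minutes": 60,  # kept as-is, since a local build took ~48 minutes
        "shards": 1,            # assumed key: reduced from 4, no sharding in test_composablekernel.py yet
    },
}
```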
```python
logging.basicConfig(level=logging.INFO)


cmd = [
    "test_ck_tile_pooling",
```
just want to confirm, is this the only test CK provides? :)
It's definitely not! But I don't have enough input from stakeholders yet on which tests we should be adding. I'm just hoping to get an end-to-end solution here, and then we can start evolving the test filter.
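As a rough illustration of where that filter would grow, here is a minimal sketch of a `test_composablekernel.py`-style runner. Only `test_ck_tile_pooling` comes from the PR; the script layout, argument handling, and the commented-out second entry are hypothetical.

```python
# Sketch of a CK test runner; paths, arguments, and the second list entry are assumptions.
import logging
import subprocess
import sys
from pathlib import Path

logging.basicConfig(level=logging.INFO)

# Directory containing the CK test executables (path handling is hypothetical).
therock_bin_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")

# Executables to run; grow this list as stakeholder input arrives.
CK_TESTS = [
    "test_ck_tile_pooling",
    # "test_ck_tile_gemm",  # hypothetical future addition
]

for test in CK_TESTS:
    cmd = [str(therock_bin_dir / test)]
    logging.info("Running: %s", " ".join(cmd))
    subprocess.run(cmd, check=True)
```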
geomin12
left a comment
There was a problem hiding this comment.
also if the build is done / working, no need to wait for builds!
you can follow this: https://github.com/ROCm/TheRock/blob/main/docs/development/ci_behavior_manipulation.md#workflow-dispatch-behavior
to trigger tests only using the build's run id
Fix typo. Co-authored-by: Geo Min <geomin12@amd.com>
Co-authored-by: Geo Min <geomin12@amd.com>
## Motivation

MIOpen tests on windows gfx1151 take a long time that exceeds the expected time per shard. Some shards took around 50 mins and usually well over 30 mins. Disabling these tests will free up some resources for other projects to run on this limited architecture.

## Technical Details

From other runs on PR, I gathered this info on the longest running tests:

Test case | Time shard 1 | Time shard 2 | Time shard 3 | Time shard 4 | Total time
-- | -- | -- | -- | -- | --
Full/GPU_Softmax_FP32 | 7:50 | 8:12 | 6:53 | 3:12 | ~26 mins
Full/GPU_Softmax_BFP16 | 4:00 | 3:59 | 3:27 | 1:32 | ~13 mins
Full/GPU_Softmax_FP16 | 3:39 | 3:42 | 2:53 | 1:13 | ~11.5 mins
Smoke/GPU_Reduce_FP32 | 2:18 | 0:52 | 0:00 | 3:14 | ~6:24 mins

With team feedback, these can be excluded from TheRock CI and moved into nightly without increasing the risk.

## Test Plan

Monitor the test run time of the shards on windows gfx1151 and compare to previous results. An example run on 4 shards takes 27m - 36m - 34m - 36m; we should see a big decrease in run time.

## Test Result

Old run times: 27m - 36m - 34m - 36m
New run times: 11m, 17m, 12m, 18m

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
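As an illustration of the exclusion itself, a minimal sketch of building a gtest exclusion filter from the slow tests listed above; whether TheRock's MIOpen test runner actually applies `--gtest_filter` this way is an assumption.

```python
# Sketch only: derive a gtest exclusion filter for the slow gfx1151 Windows tests.
SLOW_GFX1151_WINDOWS_TESTS = [
    "Full/GPU_Softmax_FP32.*",
    "Full/GPU_Softmax_BFP16.*",
    "Full/GPU_Softmax_FP16.*",
    "Smoke/GPU_Reduce_FP32.*",
]

# gtest takes exclusions as a ':'-separated list after a leading '-'.
gtest_filter = "--gtest_filter=-" + ":".join(SLOW_GFX1151_WINDOWS_TESTS)
print(gtest_filter)
```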
## Motivation

This PR fixes a missing backslash (\\) in the workflow configuration that was causing file upload failures to the S3 bucket.

Bug introduced here: https://github.com/ROCm/TheRock/pull/2280/changes#r2793460592

## File changed

- `.github/workflows/build_portable_linux_pytorch_wheels.yml`

## Technical Details

- Added the missing backslash (\\) in the affected workflow file.

## Test Plan

- https://github.com/ROCm/TheRock/actions/runs/21912800899

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
…3303)

## Motivation

Progress on #3291. This adds a new `build_portable_linux_pytorch_wheels_ci.yml` workflow forked from [`build_portable_linux_pytorch_wheels.yml`](https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_portable_linux_pytorch_wheels.yml). This new workflow is run as part of our CI pipeline and will help catch when changes to ROCm break PyTorch source builds. Future work will expand this to also build other packages, upload the built packages to S3, and run tests.

This workflow code would have caught the build break reported at #3042.

## Technical Details

> [!NOTE]
> See #3291 and https://github.com/ScottTodd/claude-rocm-workspace/blob/main/tasks/active/pytorch-ci.md for other design considerations.

I'm starting with a narrow scope here to provide _some_ value without blowing our budget or delaying while we refactor related workflows and infrastructure code (e.g. moving index page generation server-side, generating commit manifests at the _start_ of workflows instead of computing them after the fact and plumbing them through partway through the jobs).

Specifics:

* Linux only (as a start)
* Non-configurable, always runs (as a start)
* Included for all GPU architectures where `expect_pytorch_failure` is not set
* Python 3.12 (not full matrix)
* PyTorch release/2.10 branch (not full matrix)
* Only builds 'torch', not 'torchaudio', 'torchvision', 'triton', or other packages
* Does not upload packages yet
* Does not run tests yet (beyond package sanity checks that `import torch` works on the build machine)

The build jobs add about 30 minutes of CI time per GPU architecture, and we are not currently using ccache or sccache (#3171 will change that).

## Test Plan

* Tested on a known-broken commit (4497f66)
  * https://github.com/ROCm/TheRock/actions/runs/21768200125/job/62810358116 (failed as expected)
* Tested on a known-working commit (a001047)
  * https://github.com/ROCm/TheRock/actions/runs/21768071862/job/62813030260 (passed as expected)
* CI jobs on this PR itself, e.g. https://github.com/ROCm/TheRock/actions/runs/21846117572/job/63050058601?pr=3303

```
[41](https://github.com/ROCm/TheRock/actions/runs/21846117572/job/63049474316?pr=3303#step:11:78642) Found built wheel: /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl
++ Copy /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl -> /home/runner/_work/TheRock/TheRock/output/packages/dist
+++ Installing built torch:
++ Exec [/tmp]$ /opt/python/cp312-cp312/bin/python -m pip install /__w/TheRock/TheRock/external-builds/pytorch/pytorch/dist/torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl
+++ Sanity checking installed torch (unavailable is okay on CPU machines):
++ Capture [/tmp]$ /opt/python/cp312-cp312/bin/python -c 'import torch; print(torch.cuda.is_available())'
Sanity check output: False
--- Not build pytorch-audio (no --pytorch-audio-dir)
--- Not build pytorch-vision (no --pytorch-vision-dir)
--- Not build apex (no --apex-dir)
--- Builds all completed
```

```
Valid wheel: torch-2.10.0+devrocm7.12.0.dev0.09ac57fcd4e7258046fff2824dc0614384cb1c85-cp312-cp312-linux_x86_64.whl (222812153 bytes)
```

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
--------- Co-authored-by: Claude <noreply@anthropic.com>
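For context, the sanity check quoted in the log above boils down to a standalone snippet like the following (a sketch, not the workflow's actual code): import the freshly installed torch in a subprocess and print whether a GPU is visible, where `False` is acceptable on CPU-only build machines.

```python
# Minimal sketch of the post-install sanity check shown in the log above.
# Only an import failure is treated as an error; "False" is fine on CPU-only machines.
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "import torch; print(torch.cuda.is_available())"],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print("torch failed to import:", result.stderr, file=sys.stderr)
    sys.exit(1)
print("Sanity check output:", result.stdout.strip())
```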
## Motivation

Following up on #3303 (comment). The pip cache is useful for local rebuilds but is not providing much value on CI builds using ephemeral VMs running inside docker containers.

## Technical Details

The `pip cache remove rocm` command fails under the manylinux docker container if an alternate cache directory is not provided:

```
Building PyTorch wheels for gfx94X-dcgpu
WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: pip cache commands can not function since cache is disabled.
Traceback (most recent call last):
  File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 1078, in <module>
++ Exec [/__w/TheRock/TheRock]$ /opt/python/cp312-cp312/bin/python -m pip cache remove rocm_sdk
    main(sys.argv[1:])
  File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 1074, in main
    args.func(args)
  File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 348, in do_build
    do_install_rocm(args)
  File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 302, in do_install_rocm
    run_command(
  File "/__w/TheRock/TheRock/./external-builds/pytorch/build_prod_wheels.py", line 175, in run_command
    subprocess.check_call(args, cwd=str(cwd), env=full_env)
  File "/opt/python/cp312-cp312/lib/python3.12/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/python/cp312-cp312/bin/python', '-m', 'pip', 'cache', 'remove', 'rocm_sdk']' returned non-zero exit status 1.
Error: Process completed with exit code 1.
```

Also fixed a small bug from #735 where `--cache-dir` was added twice to the `pip install` command.

## Test Plan

Test workflow run: https://github.com/ROCm/TheRock/actions/runs/21916360386/job/63284560496

## Test Result

```
Building PyTorch wheels for gfx94X-dcgpu
WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
ERROR: pip cache commands can not function since cache is disabled.
WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Looking in indexes: https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/
Collecting rocm==7.12.0a20260211 (from rocm[devel,libraries]==7.12.0a20260211)
  Downloading https://rocm.devreleases.amd.com/v2/gfx94X-dcgpu/rocm-7.12.0a20260211.tar.gz (16 kB)
```

(I think stdout/stderr output are not being interleaved fully, so the new warning won't appear until later in the logs?)

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
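A minimal sketch of one way to make the cache pruning tolerant of the disabled-cache case, assuming a small helper inside a build script; the actual change to `build_prod_wheels.py` may instead pass an explicit `--cache-dir` or skip the cache handling entirely on CI.

```python
# Sketch only: run "pip cache remove" but treat a disabled/unwritable cache as a warning.
import subprocess
import sys


def remove_from_pip_cache(pattern: str, cache_dir: str | None = None) -> None:
    cmd = [sys.executable, "-m", "pip", "cache", "remove", pattern]
    if cache_dir:
        # --cache-dir is a general pip option; useful when HOME is not writable
        # inside the manylinux container.
        cmd += ["--cache-dir", cache_dir]
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"warning: 'pip cache remove {pattern}' failed; continuing", file=sys.stderr)


remove_from_pip_cache("rocm_sdk")
```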
## Motivation

Following up on #3303 (comment). Progress on #3291. Now that we can build pytorch as part of CI workflows, changes to at least a few files under [`external-builds/pytorch/`](https://github.com/ROCm/TheRock/tree/main/external-builds/pytorch) should no longer be excluded.

## Technical Details

* This broadens the scope to also include [`external-builds/jax/`](https://github.com/ROCm/TheRock/tree/main/external-builds/jax) and [`external-builds/uccl/`](https://github.com/ROCm/TheRock/tree/main/external-builds/uccl). Changes to those directories are infrequent and they should aim to get included in CI workflows too.
* The initial CI integration only builds torch, not torchaudio, torchvision, apex, etc. It also does not run tests. We could set finer-grained filters until that's all integrated, but we can at least work around extra job triggers by using the `skip-ci` label (https://github.com/ROCm/TheRock/blob/main/docs/development/ci_behavior_manipulation.md).
* Along the lines of the deleted comment, changes to _just_ the pytorch scripts will still build all of ROCm first. For that, I think we could do either:
  * Optimize our null build (zero source files changed) to be faster
  * Detect when _only_ pytorch sources are changed and set the `*_use_prebuilt_artifacts` options using some automatic choice of artifacts (https://github.com/ROCm/TheRock/blob/main/build_tools/find_latest_artifacts.py or another baseline)

## Test Plan

Existing unit tests.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Claude <noreply@anthropic.com>
…h packages (#3355)

## Motivation

I'm looking at building off of this `expect_pytorch_failure` code in `build_tools/github_actions/amdgpu_family_matrix.py` and I wanted to check if we could remove some of the xfails now.

Net changes:

* Disable aotriton for gfx101X
* Enable Windows gfx101X pytorch releases
* Keep Linux gfx101X pytorch releases disabled until CK works there (might disable on Windows later, once CK is enabled on Windows)
* Enable aotriton for gfx1152 and gfx1153 for pytorch versions >= 2.11
* Share `AOTRITON_UNSUPPORTED_ARCHS` between Windows and Linux to make further changes here easier to make uniformly

## Technical Details

### gfx101X history

* #1925
* #1926
* #2106
* #3164

### gfx1152 and gfx1153 history

* #2310
* #2709
* #2810

"Add gfx1152 and gfx1153 iGPU support" landed in aotriton as ROCm/aotriton@5cc0b2d. PyTorch 'nightly' now includes pytorch/pytorch@b356c81, bumping aotriton to 0.11.2b. I don't see equivalent commits [yet] on any of the release branches in https://github.com/ROCm/pytorch.

## Test Plan and Results

Platform | amdgpu family | PyTorch ref | Workflow logs | Result
-- | -- | -- | -- | --
Linux | gfx1152 | nightly | https://github.com/ROCm/TheRock/actions/runs/21877171000 | Passed
Linux | gfx1153 | nightly | https://github.com/ROCm/TheRock/actions/runs/21877172940 | Passed
Linux | gfx101X-dgpu | release/2.10 | https://github.com/ROCm/TheRock/actions/runs/21876725870 | Failed (#1926), so keeping it disabled
Windows | gfx101X-dgpu | release/2.10 | https://github.com/ROCm/TheRock/actions/runs/21876893200 | Failed (#3311), unrelated issue so enabling it

Dev release of Linux gfx101X: https://github.com/ROCm/TheRock/actions/runs/21881959462 (passed, did not trigger pytorch)

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
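To illustrate the shared-list idea, a minimal sketch of how a single `AOTRITON_UNSUPPORTED_ARCHS` set could drive the same decision on both platforms. The set contents, function name, and version handling here are illustrative, not the actual `amdgpu_family_matrix.py` code.

```python
# Sketch only: one shared exclusion set used by both the Linux and Windows matrices.
AOTRITON_UNSUPPORTED_ARCHS = {"gfx101X-dgpu"}  # illustrative contents only


def aotriton_enabled(amdgpu_family: str, pytorch_version: tuple[int, int]) -> bool:
    """Same rule applied on both Linux and Windows."""
    if amdgpu_family in AOTRITON_UNSUPPORTED_ARCHS:
        return False
    # gfx1152/gfx1153 support landed in aotriton 0.11.2b, picked up by pytorch >= 2.11.
    if amdgpu_family in {"gfx1152", "gfx1153"}:
        return pytorch_version >= (2, 11)
    return True


print(aotriton_enabled("gfx1153", (2, 11)))  # True under these assumptions
```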
) and (#2877) (#3305)

## Motivation

Closes #2954

This pull request re-introduces the relocation of the amdsmi subproject from the base/ directory into the core/ directory of the TheRock repository. This move was originally proposed in PRs #2188 and #2877 but was reverted due to test failures exposed in CI and local runs:

* #2188 - failure on `rocm-sdk test`
* #2877 - failure on the `Multi-Arch` workflow

## Technical Details

1. `amdsmi` sources and build targets were relocated from `base/amdsmi` → `core/amdsmi`.
2. Packaging updates were applied so that `amd-smi` console script entry points and runtime artifacts are properly included in the Python wheel produced by TheRock's packaging logic.
3. `core-amdsmi` artifacts are included in the `rocm-sdk-core` package and validation tests exercise `amd-smi` as expected.
4. Fixed the multi-arch build failures by making `amd-smi` an explicit core artifact (`core-amdsmi`) in the build topology and aligning CMake dependencies to treat it as a required/conditional dependency where appropriate. This removed ordering ambiguities and "missing target" errors across math-libs, profiler, and comm-libs, allowing multi-arch builds to resolve dependencies deterministically.

## Testing

`rocm-sdk test` now passes the `testConsoleScripts` check for `amd-smi`, which was previously failing with a non-zero exit status. See the details: #2954

## CI

### A: Release Workflows Tests

1) Triggered `Release portable Linux packages (gfx94X,gfx110X,gfx950,gfx1151,gfx120X, dev)` from [this](#3305) PR's branch, https://github.com/ROCm/TheRock/actions/runs/21836558832, and it is successful.
2) This automatically triggered the workflows below:

* [Release portable Linux PyTorch Wheels (gfx120X-all, dev, 7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834) #2363](https://github.com/ROCm/TheRock/actions/runs/21846577066)
* [Release portable Linux PyTorch Wheels (gfx950-dcgpu, dev, 7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834) #2364](https://github.com/ROCm/TheRock/actions/runs/21852030796)
* [Release portable Linux PyTorch Wheels (gfx1151, dev, 7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834) #2365](https://github.com/ROCm/TheRock/actions/runs/21852717253)
* [Release portable Linux PyTorch Wheels (gfx110X-all, dev, 7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834) #2367](https://github.com/ROCm/TheRock/actions/runs/21853829698)
* [Release portable Linux PyTorch Wheels (gfx94X-dcgpu, dev, 7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834) #2368](https://github.com/ROCm/TheRock/actions/runs/21854299495)

See example `testConsoleScripts (rocm_sdk.tests.core_test.ROCmCoreTest.testConsoleScripts) ... ok`: https://github.com/ROCm/TheRock/actions/runs/21854299495/job/63070085745#step:11:1

### B: Multi-Arch Tests

Ran `Multi-Arch CI` from this PR's [branch](https://github.com/ROCm/TheRock/tree/users/erman-gurses/move-amd-smi): https://github.com/ROCm/TheRock/actions/runs/21834134481. The issue seen in #3292 is gone.
## Local: ``` ((.venv) ) TheRock$ .venv/bin/python -m pip install --index-url=https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834 Looking in indexes: https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu Collecting torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834 Downloading https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/torch-2.7.1%2Bdevrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834-cp312-cp312-linux_x86_64.whl (721.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 721.0/721.0 MB 48.1 MB/s eta 0:00:00 Collecting filelock (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/filelock-3.20.3-py3-none-any.whl (16 kB) Collecting typing-extensions>=4.10.0 (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/typing_extensions-4.15.0-py3-none-any.whl (44 kB) Collecting setuptools (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/setuptools-80.9.0-py3-none-any.whl (1.2 MB) Collecting sympy>=1.13.3 (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/sympy-1.14.0-py3-none-any.whl (6.3 MB) Collecting networkx (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/networkx-3.6.1-py3-none-any.whl (2.1 MB) Collecting jinja2 (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/jinja2-3.1.6-py3-none-any.whl (134 kB) Collecting fsspec (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/fsspec-2026.1.0-py3-none-any.whl (201 kB) Collecting rocm==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834 (from rocm[libraries]==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834->torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Downloading https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/rocm-7.12.0.dev0%2Bab23b96387a5c79111438ea936764dc353773834.tar.gz (16 kB) Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... 
done Collecting triton==3.3.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834 (from torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Downloading https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/triton-3.3.1%2Bdevrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834-cp312-cp312-linux_x86_64.whl (265.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 265.3/265.3 MB 54.0 MB/s eta 0:00:00 Collecting rocm-sdk-core==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834 (from rocm==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834->rocm[libraries]==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834->torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Downloading https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/rocm_sdk_core-7.12.0.dev0%2Bab23b96387a5c79111438ea936764dc353773834-py3-none-linux_x86_64.whl (284.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 284.1/284.1 MB 61.3 MB/s eta 0:00:00 Collecting rocm-sdk-libraries-gfx94X-dcgpu==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834 (from rocm[libraries]==7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834->torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Downloading https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/rocm_sdk_libraries_gfx94x_dcgpu-7.12.0.dev0%2Bab23b96387a5c79111438ea936764dc353773834-py3-none-linux_x86_64.whl (1588.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 GB 42.7 MB/s eta 0:00:00 Collecting mpmath<1.4,>=1.1.0 (from sympy>=1.13.3->torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/mpmath-1.3.0-py3-none-any.whl (536 kB) Collecting MarkupSafe>=2.0 (from jinja2->torch==2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834) Using cached https://rocm.devreleases.amd.com/v2-staging/gfx94X-dcgpu/markupsafe-3.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl (22 kB) Building wheels for collected packages: rocm Building wheel for rocm (pyproject.toml) ... done Created wheel for rocm: filename=rocm-7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834-py3-none-any.whl size=20754 sha256=f02d33b63c925fc71cf8b561aaba4478700e388f40c7964a55d7545d5b8a4770 Stored in directory: /home/nod/.cache/pip/wheels/7b/98/96/86d1dd8c6a61cf40f084c7efb9a9267e5d61bfc0fea9f52e81 Successfully built rocm Installing collected packages: rocm-sdk-libraries-gfx94X-dcgpu, rocm-sdk-core, mpmath, typing-extensions, sympy, setuptools, networkx, MarkupSafe, fsspec, filelock, triton, jinja2, rocm, torch Successfully installed MarkupSafe-3.0.3 filelock-3.20.3 fsspec-2026.1.0 jinja2-3.1.6 mpmath-1.3.0 networkx-3.6.1 rocm-7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834 rocm-sdk-core-7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834 rocm-sdk-libraries-gfx94X-dcgpu-7.12.0.dev0+ab23b96387a5c79111438ea936764dc353773834 setuptools-80.9.0 sympy-1.14.0 torch-2.7.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834 triton-3.3.1+devrocm7.12.0.dev0.ab23b96387a5c79111438ea936764dc353773834 typing-extensions-4.15.0 ``` ``` ((.venv) ) TheRock$ rocm-sdk test `testCLI (rocm_sdk.tests.base_test.ROCmBaseTest.testCLI) ... ++ Exec [/home/nod/ergurses/TheRock]$ /home/nod/ergurses/TheRock/.venv/bin/python -P -m rocm_sdk --help ok testTargets (rocm_sdk.tests.base_test.ROCmBaseTest.testTargets) ... 
++ Exec [/home/nod/ergurses/TheRock]$ /home/nod/ergurses/TheRock/.venv/bin/python -P -m rocm_sdk targets ok testVersion (rocm_sdk.tests.base_test.ROCmBaseTest.testVersion) ... ++ Exec [/home/nod/ergurses/TheRock]$ /home/nod/ergurses/TheRock/.venv/bin/python -P -m rocm_sdk version ok test_initialize_process_check_version (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_check_version) ... ok test_initialize_process_check_version_asterisk (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_check_version_asterisk) ... ok test_initialize_process_check_version_mismatch (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_check_version_mismatch) ... ok test_initialize_process_check_version_mismatch_warning (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_check_version_mismatch_warning) ... ok test_initialize_process_check_version_pattern (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_check_version_pattern) ... ok test_initialize_process_env_preload_1 (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_env_preload_1) ... ok test_initialize_process_env_preload_2_comma (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_env_preload_2_comma) ... ok test_initialize_process_env_preload_2_semi (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_env_preload_2_semi) ... ok test_initialize_process_preload_libraries (rocm_sdk.tests.base_test.ROCmBaseTest.test_initialize_process_preload_libraries) ... ok testConsoleScripts (rocm_sdk.tests.core_test.ROCmCoreTest.testConsoleScripts) ... ok testInstallationLayout (rocm_sdk.tests.core_test.ROCmCoreTest.testInstallationLayout) The `rocm_sdk` and core module must be siblings on disk. ... ok testPreloadLibraries (rocm_sdk.tests.core_test.ROCmCoreTest.testPreloadLibraries) ... ok testSharedLibrariesLoad (rocm_sdk.tests.core_test.ROCmCoreTest.testSharedLibrariesLoad) ... ok testConsoleScripts (rocm_sdk.tests.libraries_test.ROCmLibrariesTest.testConsoleScripts) ... ok testInstallationLayout (rocm_sdk.tests.libraries_test.ROCmLibrariesTest.testInstallationLayout) The `rocm_sdk` and libraries module must be siblings on disk. ... ok testSharedLibrariesLoad (rocm_sdk.tests.libraries_test.ROCmLibrariesTest.testSharedLibrariesLoad) ... ok ---------------------------------------------------------------------- Ran 19 tests in 6.455s OK ``` ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
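For reference, the `testConsoleScripts` check for `amd-smi` exercised above amounts to something like the following sketch (not the actual `rocm_sdk` test code): resolve the console script on PATH and confirm it runs.

```python
# Sketch only: check that the amd-smi console script is installed and responds.
import shutil
import subprocess

amd_smi = shutil.which("amd-smi")
assert amd_smi is not None, "amd-smi console script not found on PATH"
# The entry point should at least respond to --help with a zero exit status
# (assumption about amd-smi's CLI behavior).
subprocess.run([amd_smi, "--help"], check=True, capture_output=True)
print(f"amd-smi resolved to {amd_smi} and responded to --help")
```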
## Motivation

Build ocltst for OpenCL.

## Technical Details

Add the BUILD_TEST flag to ocl-clr to build ocltst.

## Test Plan

The new files for ocltst are:

* liboclperf.so
* liboclruntime.so
* oclperf.exclude
* oclruntime.exclude
* ocltst

Dev build: https://github.com/ROCm/TheRock/actions/runs/21651603039

## Test Result

The expected new files are included in the artifacts. Dev build passed.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

---------

Co-authored-by: Joseph Macaranas <145489236+jayhawk-commits@users.noreply.github.com>
## Motivation

Enable rocrtst to use the bundled hwloc library added in #3020. Necessary for #2498; also resolves #3316.

## Technical Details

Changes:

- Add `sysdeps-hwloc` to rocrtst's artifact dependencies in BUILD_TOPOLOGY.toml
- Add `${THEROCK_BUNDLED_HWLOC}` to rocrtst's RUNTIME_DEPS
- Add INTERFACE_LINK_DIRS and INTERFACE_INSTALL_RPATH_DIRS to ensure proper linking

Depends on PR #3020, which adds the hwloc infrastructure.

## Test Plan

Verify that rocrtst builds and links correctly with the bundled hwloc library.

## Test Result

Build artifacts from this build, on develop from TheRock:

https://therock-ci-artifacts.s3.amazonaws.com/21842152546-linux/rocrtst_test_generic.tar.xz

core\rocrtst\stage\lib\rocrtst\lib:
> libhwloc.so.5
> LICENSE

https://therock-ci-artifacts.s3.amazonaws.com/21842152546-linux/rocrtst_lib_generic.tar.xz

core\rocrtst\stage\lib\rocrtst\lib:
> libhwloc.so.5

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
This PR is totally messed up since something strange happened on the main branch while I was working, and the synchronize definitely didn't work. Closing it and just copying the appropriate files over to a new PR: #3382
Motivation
We currently only have a narrow build of CK (the small collection necessary for building miopen) running on TheRock. We want a full build and test coverage here.
Technical Details
Changes were made to various CMake files and the build topology to enable a full, miopen-independent CK build, along with a `composablekernel` CI test job and a `test_composablekernel.py` test runner.
Test Plan
Testing done locally and waiting on CI.
Test Result
Waiting on CI.
Submission Checklist