Thread individual GPU targets through test fetch pipeline #3452

Open
stellaraccident wants to merge 1 commit into main from users/stellaraccident/fix-test-fetch-split-artifacts
@stellaraccident
Collaborator

Summary

Fixes #3444 (sub-issue of #3336)

When THEROCK_KPACK_SPLIT_ARTIFACTS=ON, the build produces per-target archives (e.g., blas_lib_gfx942.tar.zst) but the test fetch logic only looked for family-named archives (e.g., blas_lib_gfx94X-dcgpu.tar.zst), causing test jobs to get only generic (host-only) binaries with no device code.

  • Add fetch-gfx-targets list field to amdgpu_family_matrix.py mapping each test runner to the GPU architecture(s) on it
  • Thread amdgpu_targets through the workflow chain: configure_ci.py → multi_arch_ci_linux.yml → test_artifacts.yml → test_{sanity_check,component}.yml → setup_test_environment → install_rocm_from_artifacts.py → fetch_artifacts.py
  • Rewrite list_artifacts_for_group() to use ArtifactName.from_filename() for structured parsing with inclusive matching (accepts both old family-named and new target-named archives)
  • Add --amdgpu-targets to artifact_manager.py fetch subcommand
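As a rough illustration of the first two bullets, the sketch below shows how a per-runner `fetch-gfx-targets` list could be flattened into an `amdgpu_targets` value in the CI matrix JSON. Only the `fetch-gfx-targets` and `amdgpu_targets` names come from this PR; the entry layout, the `family` and `amdgpu_family` keys, the `build_matrix_row` helper, and the comma-separated encoding are assumptions made for illustration, not the actual contents of amdgpu_family_matrix.py or configure_ci.py.

```python
import json

# Hypothetical matrix entries. Only the "fetch-gfx-targets" key name is taken
# from this PR; every other key and value here is illustrative.
MATRIX = {
    "gfx94x": {
        "family": "gfx94X-dcgpu",
        "fetch-gfx-targets": ["gfx942"],
    },
    "no-targets-example": {
        "family": "example-family",
        # No "fetch-gfx-targets": the row falls back to an empty target list.
    },
}


def build_matrix_row(key):
    """Build one CI matrix row, flattening the target list to a string."""
    entry = MATRIX[key]
    targets = entry.get("fetch-gfx-targets", [])
    return {
        "amdgpu_family": entry["family"],
        # Assumed encoding: comma-separated, empty string when no targets.
        "amdgpu_targets": ",".join(targets),
    }


rows = [build_matrix_row(key) for key in MATRIX]
matrix_json = json.dumps(rows)
print(matrix_json)
```

An empty `amdgpu_targets` string keeps the row valid for runners that have no per-target list, which is one way the older family-only behaviour could be preserved.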

Test plan

  • 51 unit tests pass (configure_ci, fetch_artifacts, artifact_manager)
  • Dry-run validated against multi-arch run 21854651990 (finds split archives like blas_lib_gfx942.tar.zst)
  • Dry-run validated against mono-arch run 22080712092 (backwards compatible with family-named archives)
  • CI validation on multi_arch/integration-kpack branch (pushed, running)

🤖 Generated with Claude Code

When THEROCK_KPACK_SPLIT_ARTIFACTS=ON, the build produces per-target
archives (e.g. blas_lib_gfx942.tar.zst) but the test fetch logic only
looked for family-named archives (e.g. blas_lib_gfx94X-dcgpu.tar.zst),
causing test jobs to get only generic (host-only) binaries with no
device code.

Add `fetch-gfx-targets` list field to amdgpu_family_matrix.py mapping
each test runner to the GPU architecture(s) available on it. Thread
this as `amdgpu_targets` through the workflow chain:

  configure_ci.py → multi_arch_ci_linux.yml → test_artifacts.yml →
  test_{sanity_check,component}.yml → setup_test_environment action →
  install_rocm_from_artifacts.py → fetch_artifacts.py

The fetch logic now uses inclusive ArtifactName-based matching: it
accepts both old family-named archives (mono-arch pipeline) and new
individual-target archives (split/kpack pipeline), so the same code
works against either bucket layout.
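The inclusive matching described above can be sketched roughly as follows. `ArtifactNameSketch` is a simplified, hypothetical stand-in for the repository's `ArtifactName` class, and `select_artifacts` is a hypothetical stand-in for `list_artifacts_for_group()`; the `name_component_target.tar.<ext>` filename pattern and the `generic` label for host-only archives are assumptions inferred from the example filenames in this description.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed filename layout: <name>_<component>_<target>.tar.<zst|xz>
_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<component>[^_]+)_(?P<target>[^_]+)\.tar\.(?:zst|xz)$"
)


@dataclass(frozen=True)
class ArtifactNameSketch:
    """Simplified stand-in for ArtifactName (not the real class)."""

    name: str
    component: str
    target: str

    @classmethod
    def from_filename(cls, filename: str) -> Optional["ArtifactNameSketch"]:
        match = _PATTERN.match(filename)
        if match is None:
            return None  # Not an artifact archive (e.g. a checksum file).
        return cls(match["name"], match["component"], match["target"])


def select_artifacts(filenames, family, targets=()):
    """Keep archives that are generic, family-named, or target-named."""
    accepted = {"generic", family, *targets}
    selected = []
    for filename in filenames:
        parsed = ArtifactNameSketch.from_filename(filename)
        if parsed is not None and parsed.target in accepted:
            selected.append(filename)
    return selected


bucket = [
    "blas_lib_generic.tar.zst",
    "blas_lib_gfx942.tar.zst",
    "blas_lib_gfx1100.tar.zst",
    "blas_lib_gfx94X-dcgpu.tar.zst",
    "blas_lib_gfx942.tar.zst.sha256sum",
]
print(select_artifacts(bucket, "gfx94X-dcgpu", ["gfx942"]))
```

Comparing the parsed target field against an explicit set of accepted values, rather than searching the filename for a substring, is what lets one code path accept both layouts while still rejecting archives for unrelated targets.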

Changes:
- Add `fetch-gfx-targets` to all matrix entries in amdgpu_family_matrix.py
- Thread `amdgpu_targets` through configure_ci.py into CI matrix JSON
- Add `amdgpu_targets` input to test workflow YAML chain
- Accept `--amdgpu-targets` in fetch_artifacts.py,
  install_rocm_from_artifacts.py, and the artifact_manager.py fetch
  subcommand
- Rewrite list_artifacts_for_group() to use ArtifactName.from_filename()
  for structured parsing instead of substring matching
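For readers unfamiliar with the flag, here is a minimal sketch of how a `--amdgpu-targets` option might be parsed with `argparse`. The flag name comes from this change; the comma-separated value format, the empty-list default, and the `parse_targets` and `build_parser` helpers are assumptions for illustration and may not match the real scripts.

```python
import argparse


def parse_targets(value):
    """Split a comma-separated target string, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    # Hypothetical parser; the real scripts define many more options.
    parser = argparse.ArgumentParser(prog="fetch_sketch")
    parser.add_argument(
        "--amdgpu-targets",
        type=parse_targets,
        default=[],
        help="Comma-separated GPU targets, e.g. gfx942 (assumed format)",
    )
    return parser


args = build_parser().parse_args(["--amdgpu-targets=gfx942"])
print(args.amdgpu_targets)
```

Defaulting to an empty list means that omitting the flag leaves the target set empty, so only family-named and generic archives would be considered under this sketch.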

Testing:
- 51 unit tests pass (configure_ci, fetch_artifacts, artifact_manager)
- Dry-run validated against real CI runs:
  - Multi-arch run 21854651990: correctly finds split archives like
    blas_lib_gfx942.tar.zst when --amdgpu-targets=gfx942
  - Mono-arch run 22080712092: correctly finds family-named archives
    like rocblas_lib_gfx94X-dcgpu.tar.xz (backwards compatible)
  - Both with and without --amdgpu-targets flag verified

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Labels

None yet

Projects

Status: TODO

Development

Successfully merging this pull request may close these issues.

[Multi-arch] Test artifact fetch doesn't find kpack split archives