ensure 'torch' CUDA wheels are installed in CI, remove unused dependencies#5453

Merged

rapids-bot[bot] merged 18 commits into rapidsai:release/26.04 from jameslamb:remove-torch-reqs on Mar 13, 2026
Conversation

@jameslamb (Member) commented Mar 5, 2026

Contributes to rapidsai/build-planning#257

  • ensures that when torch is installed in wheels CI, it's always a CUDA variant
  • removes unused test dependencies ogb, pydantic, torchdata, and torchmetrics

Notes for Reviewers

How does this help with testing against a mix of CTKs?

torch CUDA wheels tightly pin to nvidia-{thing} wheels and soon, cuda-toolkit (see rapidsai/build-planning#255).

Forcing the use of CUDA torch wheels ensures we'll catch dependency conflicts in CI. Without this, pip could silently fall back to CPU-only torch from pypi.org.
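As a minimal sketch of how such a guard could look (not taken from this PR — the version value and check are illustrative assumptions): torch CUDA wheels carry a `+cuXYZ` local version tag, while CPU-only wheels are tagged `+cpu`, so a post-install check can fail the job if a CPU-only wheel slipped in.

```shell
# Hypothetical post-install guard (not from this PR): torch CUDA wheels
# carry a "+cuXYZ" local version tag, e.g. "2.5.1+cu128", while CPU-only
# wheels are tagged "+cpu". In a real job the version would come from:
#   python -c 'import torch; print(torch.__version__)'
TORCH_VERSION="2.5.1+cu128"
case "${TORCH_VERSION}" in
  *+cu*) echo "CUDA torch: ${TORCH_VERSION}" ;;
  *)     echo "ERROR: non-CUDA torch wheel (${TORCH_VERSION})" >&2; exit 1 ;;
esac
```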

How I tested this

This PR uses some patterns I've tested elsewhere as well.

I also tested a commit here with both the PR and nightly test matrices, to be sure we covered everything: https://github.com/rapidsai/cugraph/actions/runs/23062364778/job/66996357196?pr=5453

@jameslamb added labels: improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change) — Mar 5, 2026
@copy-pr-bot bot commented Mar 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@jameslamb jameslamb changed the base branch from main to release/26.04 March 12, 2026 19:10
script: ci/test_python.sh
# Skip failing tests on RTX PRO 6000 (Blackwell). xref: https://github.com/rapidsai/cugraph/issues/5421
matrix_filter: map(select(.GPU != "rtxpro6000"))
conda-python-tests-nightly:
@jameslamb (Member Author) commented Mar 13, 2026

TODO: remove all of these *-nightly jobs before merging.

Just doing this temporarily to ensure these changes won't break nightly tests.

@jameslamb (Member Author) commented:

/ok to test

pip install \
--upgrade \
"nvidia-nvjitlink>=${CUDA_MAJOR}.${CUDA_MINOR}"
fi
@jameslamb (Member Author) commented:

On the last CI run here, the CUDA 13.0.2 wheel tests (and ONLY those jobs) failed to load libcugraph.so.

ImportError while loading conftest '/__w/cugraph/cugraph/python/cugraph/cugraph/tests/conftest.py'.
python/cugraph/cugraph/tests/conftest.py:16: in <module>
    from cugraph.testing.mg_utils import (
...
/pyenv/versions/3.12.13/lib/python3.12/site-packages/cugraph/structure/graph_implementation/simpleGraph.py:4: in <module>
    from cugraph.structure import graph_primtypes_wrapper
E   ImportError: libcugraph.so: cannot open shared object file: No such file or directory

https://github.com/rapidsai/cugraph/actions/runs/23058535556/job/66982942162?pr=5453#step:13:517

The errors were swallowed (to be fixed in https://github.com/rapidsai/build-planning/issues/119 eventually), but I strongly suspect it's the same issue we've been working on ... cuVS's JIT-LTO stuff compiled with libnvJitLink 13.1 requires libnvJitLink 13.1 at runtime, but in those jobs we're getting nvidia-nvjitlink 13.0.

I'm proposing we hack nvidia-nvjitlink into the environment like this, just temporarily, to get test coverage of torch in CI to where we want it. I'm planning to revert this in a future PR (soon, within 26.04: #5457). I'm proposing something similar in cugraph-gnn CI too: rapidsai/cugraph-gnn#425 (comment)
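For context, here's a sketch of how the temporary pin could be derived in the CI script (the `RAPIDS_CUDA_VERSION` variable name and the exact wiring are assumptions; only the `nvidia-nvjitlink>=MAJOR.MINOR` spec appears in the diff above):

```shell
# Sketch of the temporary workaround (assumed variable names, not the
# exact CI script): derive major/minor from the CTK version under test
# and require nvidia-nvjitlink at least that new, so JIT-LTO kernels
# built against a newer libnvJitLink can still load at runtime.
RAPIDS_CUDA_VERSION="13.1.0"   # assumed to come from the CI environment
CUDA_MAJOR="$(echo "${RAPIDS_CUDA_VERSION}" | cut -d '.' -f 1)"
CUDA_MINOR="$(echo "${RAPIDS_CUDA_VERSION}" | cut -d '.' -f 2)"
REQUIREMENT="nvidia-nvjitlink>=${CUDA_MAJOR}.${CUDA_MINOR}"
echo "${REQUIREMENT}"   # the 'pip install --upgrade' above uses this spec
```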

@jameslamb jameslamb changed the title WIP: ensure 'torch' CUDA wheels are installed in CI, remove unused dependencies ensure 'torch' CUDA wheels are installed in CI, remove unused dependencies Mar 13, 2026
@jameslamb jameslamb marked this pull request as ready for review March 13, 2026 17:39
@jameslamb jameslamb requested a review from a team as a code owner March 13, 2026 17:39
@jameslamb jameslamb requested a review from a team as a code owner March 13, 2026 17:39
@jameslamb jameslamb requested a review from AyodeAwe March 13, 2026 17:39
@gforsyth (Contributor) reviewed:
This looks like a nice solution to this problem, @jameslamb! Thanks for all the helpful context comments. I only flagged one thing, and it's relatively minor.

torch_downloaded=false
if \
{ [ "${CUDA_MAJOR}" -eq 12 ] && [ "${CUDA_MINOR}" -ge 9 ]; } \
|| { [ "${CUDA_MAJOR}" -eq 13 ] && [ "${CUDA_MINOR}" -le 0 ]; }; \
@gforsyth (Contributor) commented:

Should this be -ge? Or if not, then perhaps -eq, since "< 0" with a minor version feels strange.

@jameslamb (Member Author) commented:

Thank you for the thorough review! You're totally right, and @bdice just commented on the same thing in the rmm PR: rapidsai/rmm#2279 (comment)

I'll get that fixed up here.

@jameslamb (Member Author) commented:

Switched them both to -eq in 9a5a35e

Could you take another look?
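For illustration, a run of the corrected condition with both minor-version comparisons switched to -eq (the sample values are mine, not from the PR; only CUDA 12.9 and CUDA 13.0 take the branch):

```shell
# Illustrative run of the corrected check: with both comparisons as -eq,
# only CUDA 12.9 and CUDA 13.0 enter the download branch.
# Sample values below are assumptions for demonstration.
CUDA_MAJOR=13
CUDA_MINOR=0
torch_downloaded=false
if { [ "${CUDA_MAJOR}" -eq 12 ] && [ "${CUDA_MINOR}" -eq 9 ]; } \
  || { [ "${CUDA_MAJOR}" -eq 13 ] && [ "${CUDA_MINOR}" -eq 0 ]; }
then
  torch_downloaded=true
fi
echo "torch_downloaded=${torch_downloaded}"
```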

@alexbarghi-nv (Member) reviewed:

Looks good, thanks for removing the unnecessary dependencies as well. It's unfortunate that the nvjitlink issues are still there, but hopefully we can get those resolved in #5457.

@jameslamb jameslamb requested a review from gforsyth March 13, 2026 20:00
@gforsyth (Contributor) approved:

:shipit:

@jameslamb jameslamb removed the request for review from AyodeAwe March 13, 2026 21:14
@jameslamb (Member Author) commented:

Thank you both! I'll queue this to merge.

@jameslamb (Member Author) commented:

/merge

@rapids-bot rapids-bot bot merged commit 61c2b73 into rapidsai:release/26.04 Mar 13, 2026
76 checks passed
@jameslamb jameslamb deleted the remove-torch-reqs branch March 13, 2026 21:42