ensure 'torch' CUDA wheels are installed in CI, remove unused dependencies#5453
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Force-pushed from 12373cc to 830af14
.github/workflows/pr.yaml
Outdated
script: ci/test_python.sh
# Skip failing tests on RTX PRO 6000 (Blackwell). xref: https://github.com/rapidsai/cugraph/issues/5421
matrix_filter: map(select(.GPU != "rtxpro6000"))
conda-python-tests-nightly:
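In effect, that `matrix_filter` jq expression drops every matrix entry whose GPU field is `rtxpro6000`. Sketched here in Python against a made-up matrix (the entries are hypothetical, not the real CI matrix):

```python
# Illustrative only: a made-up CI test matrix, filtered the way the jq
# expression map(select(.GPU != "rtxpro6000")) would filter it.
matrix = [
    {"GPU": "a100", "CUDA_VER": "12.9"},
    {"GPU": "rtxpro6000", "CUDA_VER": "13.0"},
    {"GPU": "h100", "CUDA_VER": "13.0"},
]

# Keep every entry except the Blackwell runner being skipped.
filtered = [entry for entry in matrix if entry["GPU"] != "rtxpro6000"]
print([entry["GPU"] for entry in filtered])  # ['a100', 'h100']
```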
TODO: remove all of these *-nightly jobs before merging.
Just doing this temporarily to ensure these changes won't break nightly tests.
/ok to test
pip install \
  --upgrade \
  "nvidia-nvjitlink>=${CUDA_MAJOR}.${CUDA_MINOR}"
fi
On the last CI run here, the CUDA 13.0.2 wheels tests (and ONLY those jobs) failed to load libcugraph.so.
ImportError while loading conftest '/__w/cugraph/cugraph/python/cugraph/cugraph/tests/conftest.py'.
python/cugraph/cugraph/tests/conftest.py:16: in <module>
from cugraph.testing.mg_utils import (
...
/pyenv/versions/3.12.13/lib/python3.12/site-packages/cugraph/structure/graph_implementation/simpleGraph.py:4: in <module>
from cugraph.structure import graph_primtypes_wrapper
E ImportError: libcugraph.so: cannot open shared object file: No such file or directory
https://github.com/rapidsai/cugraph/actions/runs/23058535556/job/66982942162?pr=5453#step:13:517
The errors were swallowed (to be fixed in https://github.com/rapidsai/build-planning/issues/119 eventually), but I strongly suspect it's the same issue we've been working on ... cuVS's JIT-LTO stuff compiled with libnvJitLink 13.1 requires libnvJitLink 13.1 at runtime, but in those jobs we're getting nvidia-nvjitlink 13.0.
I'm proposing hacking `nvidia-nvjitlink` into the environment like this just temporarily, to get test coverage of `torch` in CI to where we want it. I'm planning to revert in a future PR (soon, within 26.04: #5457). Proposing something similar in cugraph-gnn CI too: rapidsai/cugraph-gnn#425 (comment)
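The workaround itself is small. A runnable sketch of it follows, with illustrative `CUDA_MAJOR`/`CUDA_MINOR` values (in real CI these come from the environment) and the `pip` command echoed rather than executed, so the sketch needs no network access:

```shell
# Hedged sketch of the temporary nvidia-nvjitlink workaround described above.
# Illustrative values; CI derives these from the CUDA toolkit under test.
CUDA_MAJOR=13
CUDA_MINOR=1

# Require an nvjitlink wheel at least as new as the toolkit's minor version,
# so JIT-LTO artifacts compiled against 13.1 can load at runtime.
constraint="nvidia-nvjitlink>=${CUDA_MAJOR}.${CUDA_MINOR}"

# In CI this would run: pip install --upgrade "${constraint}"
echo "pip install --upgrade ${constraint}"
```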
gforsyth
left a comment
This looks like a nice solution to this problem, @jameslamb! Thanks for all the helpful context comments. I only flagged one thing, and it's relatively minor.
ci/test_wheel_cugraph.sh
Outdated
torch_downloaded=false
if \
  { [ "${CUDA_MAJOR}" -eq 12 ] && [ "${CUDA_MINOR}" -ge 9 ]; } \
  || { [ "${CUDA_MAJOR}" -eq 13 ] && [ "${CUDA_MINOR}" -le 0 ]; }; \
Should this be `-ge`? Or if not, then perhaps `-eq`, since `<= 0` with a minor version feels strange.
Thank you for the thorough review! You're totally right, and @bdice just commented on the same thing in the rmm PR: rapidsai/rmm#2279 (comment)
I'll get that fixed up here.
Switched them both to -eq in 9a5a35e
Could you take another look?
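For reference, a runnable sketch of the gate after the `-le` to `-eq` fix. The sample `CUDA_MAJOR`/`CUDA_MINOR` values are illustrative; in CI they come from the environment:

```shell
# Illustrative values; CI derives these from the CUDA toolkit under test.
CUDA_MAJOR=13
CUDA_MINOR=0

torch_downloaded=false
# Presumably: CUDA torch wheels are only available for CTK 12.9+ and
# exactly 13.0 at the time of this PR, hence -ge for 12.x and -eq for 13.x.
if \
  { [ "${CUDA_MAJOR}" -eq 12 ] && [ "${CUDA_MINOR}" -ge 9 ]; } \
  || { [ "${CUDA_MAJOR}" -eq 13 ] && [ "${CUDA_MINOR}" -eq 0 ]; }
then
    torch_downloaded=true
fi
echo "${torch_downloaded}"  # prints "true" for 13.0
```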
alexbarghi-nv
left a comment
Looks good, thanks for removing the unnecessary dependencies as well. It's unfortunate that the nvjitlink issues are still there, but hopefully we can get those resolved in #5457.
Thank you both! I'll queue this to merge.
/merge
Contributes to rapidsai/build-planning#257
* ensures that when `torch` is installed in wheels CI, it's always a CUDA variant
* removes the unused dependencies `ogb`, `pydantic`, `torchdata`, and `torchmetrics`

Notes for Reviewers
How does this help with testing against a mix of CTKs?
`torch` CUDA wheels tightly pin to `nvidia-{thing}` wheels and, soon, `cuda-toolkit` (see rapidsai/build-planning#255). Forcing the use of CUDA `torch` wheels ensures we'll catch dependency conflicts in CI. Without this, `pip` could silently fall back to CPU-only `torch` from pypi.org.

How I tested this
This PR uses some patterns I've tested elsewhere, too.
I also tested a commit here with both the PR and nightly test matrices, to be sure we covered everything: https://github.com/rapidsai/cugraph/actions/runs/23062364778/job/66996357196?pr=5453
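As a related sanity check (hypothetical, not part of this PR), the "always a CUDA variant" guarantee could be asserted by inspecting the installed `torch` version string, since CUDA builds carry a `+cuXYZ` local version label:

```python
def is_cuda_torch(version: str) -> bool:
    """Return True if a torch version string looks like a CUDA build.

    CUDA-variant torch wheels carry a local version label such as
    '2.5.1+cu121'; CPU-only wheels from pypi.org either have no local
    label or use '+cpu'. (Version strings below are examples only.)
    """
    _, sep, local = version.partition("+")
    return sep == "+" and local.startswith("cu") and local != "cpu"

print(is_cuda_torch("2.5.1+cu121"))  # True: CUDA 12.1 build
print(is_cuda_torch("2.5.1+cpu"))    # False: CPU-only build
print(is_cuda_torch("2.5.1"))        # False: no local label
```

In a live environment, something like `python -c "import torch; assert torch.version.cuda is not None"` would be a more direct check, assuming `torch` is importable there.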