Skip to content

add cuda 13 patch for cupy memadvise to host memory as used with SAM #1001

Merged
eordentlich merged 4 commits intoNVIDIA:mainfrom
eordentlich:eo_cuda13_fixes
Dec 24, 2025
Merged

add cuda 13 patch for cupy memadvise to host memory as used with SAM #1001
eordentlich merged 4 commits intoNVIDIA:mainfrom
eordentlich:eo_cuda13_fixes

Conversation

@eordentlich
Copy link
Collaborator

No description provided.

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Dec 23, 2025

Greptile Summary

This PR implements a workaround for a CuPy 13.6 bug in the memAdvise API when using CUDA 13. The changes introduce a new helper function _memadvise_cpu in utils.py that detects the CUDA version and uses the appropriate API:

  • For CUDA < 13.0.0: Uses the original CuPy API (cp.cuda.runtime.memAdvise)
  • For CUDA >= 13.0.0: Uses low-level cuda-python bindings to work around the CuPy bug

The helper function is then used consistently across clustering.py, core.py, and umap.py to replace all direct calls to cp.cuda.runtime.memAdvise. This ensures that SAM (System Allocated Memory) continues to work correctly by pinning numpy arrays to host memory and preventing unwanted page migration to device during GPU concatenation operations.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk - it's a focused bugfix with appropriate version detection
  • The implementation is solid with proper version checking and fallback logic. The author has already addressed syntax issues through multiple iterations. The only minor concern is the lack of unit tests for the new _memadvise_cpu function, but this is a workaround for a specific CUDA 13 environment issue that may be difficult to test in CI
  • No files require special attention - the changes are straightforward refactoring to use a centralized helper function

Important Files Changed

Filename Overview
python/src/spark_rapids_ml/utils.py Adds _memadvise_cpu helper function to work around CuPy 13.x memAdvise API bug using cuda-python bindings
python/src/spark_rapids_ml/clustering.py Replaces direct cp.cuda.runtime.memAdvise call with _memadvise_cpu helper function for SAM memory management
python/src/spark_rapids_ml/core.py Replaces direct cp.cuda.runtime.memAdvise calls with _memadvise_cpu helper for dense and sparse arrays
python/src/spark_rapids_ml/umap.py Replaces direct cp.cuda.runtime.memAdvise calls with _memadvise_cpu helper for dense and sparse arrays

Sequence Diagram

sequenceDiagram
    participant Caller as Algorithm Code<br/>(clustering/core/umap)
    participant Helper as _memadvise_cpu
    participant Version as cuda.bindings.__version__
    participant CuPy as cp.cuda.runtime.memAdvise<br/>(CUDA < 13)
    participant CudaPython as cuda.bindings.runtime<br/>(CUDA >= 13)
    
    Note over Caller: SAM enabled with numpy array
    Caller->>Helper: _memadvise_cpu(data, nbytes)
    Helper->>Version: Check cuda.bindings.__version__
    
    alt CUDA < 13.0.0
        Version-->>Helper: version < 13.0.0
        Helper->>CuPy: memAdvise(data, nbytes, 3, -1)
        Note over CuPy: Uses CuPy API<br/>(original method)
        CuPy-->>Helper: Success
    else CUDA >= 13.0.0
        Version-->>Helper: version >= 13.0.0
        Note over Helper: Workaround for CuPy 13.6 bug
        Helper->>CudaPython: Import cudaMemLocation,<br/>cudaMemLocationType,<br/>cudaMemoryAdvise
        Helper->>Helper: Create cudaMemLocation<br/>with Host type
        Helper->>CudaPython: cudaMemAdvise(data, nbytes,<br/>SetPreferredLocation, mem_location)
        Note over CudaPython: Uses low-level<br/>cuda-python bindings
        CudaPython-->>Helper: Success
    end
    
    Helper-->>Caller: Memory pinned to CPU
    Note over Caller: Prevents page migration<br/>during GPU concatenation
Loading

@eordentlich
Copy link
Collaborator Author

build

rishic3
rishic3 previously approved these changes Dec 24, 2025
Copy link
Collaborator

@rishic3 rishic3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor rewording suggestion

def _memadvise_cpu(data: Any, nbytes: int) -> None:
"""
Advise data referenced by pointer to stay in cpu memory.
For use with SAM to prevent migration of host memory staged partial matrices to remain on cpu during
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For use with SAM to prevent migration of host memory staged partial matrices to remain on cpu during
For use with SAM to prevent partial matrices staged on host from migrating to device during

Signed-off-by: Erik Ordentlich <eordentlich@gmail.com>
@eordentlich
Copy link
Collaborator Author

build

@eordentlich eordentlich merged commit aac8af5 into NVIDIA:main Dec 24, 2025
5 checks passed
@eordentlich eordentlich deleted the eo_cuda13_fixes branch December 24, 2025 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants