
v25.11.00



@manopapad released this 27 Nov 06:25
de7cf3f

This is a beta release of cuPyNumeric.

Pip wheels are available on PyPI at https://pypi.org/project/nvidia-cupynumeric/, for Linux (x86-64 and ARM64, with CUDA 12 and multi-node support) and macOS (for ARM64). Conda packages are available at https://anaconda.org/legate/cupynumeric, for Linux (x86-64 and ARM64, with CUDA 12/13 and multi-node support). GASNet-based (rather than UCX-based) conda packages are under the gex label. Windows is currently supported through WSL.

Documentation for this release can be found at https://docs.nvidia.com/cupynumeric/25.11/.

Highlights

Support matrix changes

  • Start distributing conda packages for CUDA 13.
  • Port to cuSolverMp 0.7 (now the required minimum).
  • Validate cuPyNumeric on DGX Spark.

Note that the pip wheels currently include neither CUDA 13 support nor cuSolverMp support; when using the wheels, the linear solve and matrix decomposition APIs are limited to single-GPU execution.

Added functionality

  • cupynumeric.histogram2d and cupynumeric.histogramdd
  • cupynumeric.lexsort
  • cupynumeric.isin
  • Multi-GPU & multi-node implementation of QR factorization, based on cuSolverMp
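
The new functions follow the NumPy signatures that cuPyNumeric aims to mirror, so a quick way to try them is the sketch below. It assumes the calls accept the same arguments as their NumPy counterparts, and it falls back to NumPy when cupynumeric is not installed so the snippet runs anywhere; the sample data is made up for illustration.

```python
# Assumption: cuPyNumeric mirrors NumPy's API for these calls.
# Fall back to NumPy if cupynumeric is not installed, so the sketch still runs.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

# Four made-up 2-D points, one per row (columns are x and y).
sample = np.array([[0.1, 0.2],
                   [0.4, 0.3],
                   [0.6, 0.7],
                   [0.9, 0.8]])
x, y = sample[:, 0], sample[:, 1]

# histogram2d: count (x, y) pairs on a 2x2 grid over the unit square.
hist2d, xedges, yedges = np.histogram2d(x, y, bins=2, range=[[0, 1], [0, 1]])

# histogramdd: the same binning, taking an (N, D) array of samples.
histdd, edges = np.histogramdd(sample, bins=2, range=[(0, 1), (0, 1)])

# lexsort: the LAST key is the primary sort key, earlier keys break ties.
primary = np.array([1, 0, 1, 0])
secondary = np.array([3, 2, 1, 0])
order = np.lexsort((secondary, primary))

# isin: elementwise membership test against a set of values.
mask = np.isin(np.array([1, 2, 3, 4]), np.array([2, 4]))

# QR factorization: Q @ R should reconstruct A (up to rounding).
A = np.array([[2.0, 1.0],
              [1.0, 3.0],
              [0.0, 1.0]])
Q, R = np.linalg.qr(A)
```

Here `hist2d` and `histdd` both come out as `[[2, 0], [0, 2]]`, and `order` is `[3, 1, 2, 0]` (rows sorted by `primary`, ties broken by `secondary`). Whether a given call actually runs across multiple GPUs or nodes depends on how the program is launched and on the build in use (see the note on pip wheels and cuSolverMp above).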

Performance improvements

  • Accelerate axis-wise reductions on GPUs by combining multiple kernel invocations into one.
  • Parallelize the specialized implementation of cupynumeric.take, and use it in more cases, including cupynumeric.take_along_axis.
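
For reference, the sketch below shows the kinds of calls these optimizations target: reductions along an axis, `take` with an index list, and `take_along_axis` paired with `argsort`. It only illustrates the NumPy-style semantics; the notes do not say exactly which argument patterns hit the fused reduction kernels or the specialized `take` path, and nothing here measures performance. It falls back to NumPy if cupynumeric is not installed (an assumption for portability).

```python
# Fall back to NumPy if cupynumeric is unavailable (API assumed to match).
try:
    import cupynumeric as np
except ImportError:
    import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# Axis-wise reductions: one result per row / per column.
row_sums = a.sum(axis=1)   # sum across each row
col_max = a.max(axis=0)    # maximum down each column

# take: gather columns 2 and 0 from every row.
picked = np.take(a, [2, 0], axis=1)

# take_along_axis: apply per-row indices, here from argsort, to sort each row.
idx = np.argsort(a, axis=1)
sorted_rows = np.take_along_axis(a, idx, axis=1)
```

With this input, `row_sums` is `[6, 24]`, `picked` is `[[2, 3], [8, 9]]`, and `sorted_rows` is `[[1, 2, 3], [7, 8, 9]]`.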

UX improvements

  • Legate's I/O functions (e.g. hdf5 to_file) and memory offloading functions (e.g. offload_to) now accept cuPyNumeric ndarrays directly.

Known issues

  • We are aware of hangs when using cuSolverMp-based APIs on 4 or more Perlmutter nodes. This appears to be a cluster-specific issue, which we are investigating.
  • We are aware of hangs when using UCX 1.19 with the CUDA 13 conda packages. These are typically accompanied by an error message like this:
    ib_md.c:287  UCX  ERROR ibv_reg_mr(address=(nil), length=134217728, access=0xf) failed: Bad address
    ucp_mm.c:76   UCX  ERROR failed to register address (nil) (cuda) length 134217728 on md[6]=mlx5_0: Input/output error (md supports: host|cuda)
    
    We are investigating a proper fix. In the meantime, setting UCX_MEMTYPE_CACHE=no in the environment appears to resolve the hang, possibly at the cost of reduced UCX performance.
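
The variable has to be present in the environment of the processes that initialize UCX, so the simplest route is to export it in the shell before launching (e.g. `UCX_MEMTYPE_CACHE=no legate my_script.py`, where `my_script.py` is a placeholder). If jobs are started from a Python driver script instead, a minimal sketch is to pass a modified copy of the environment to the child process. The helper name `ucx_workaround_env` is hypothetical and not part of Legate or cuPyNumeric.

```python
import os

def ucx_workaround_env(base=None):
    """Hypothetical helper: return a copy of the environment with UCX's
    memory-type cache disabled (the workaround for the CUDA 13 / UCX 1.19 hang).
    The input mapping is not modified."""
    env = dict(os.environ if base is None else base)
    env["UCX_MEMTYPE_CACHE"] = "no"
    return env

# Example launch (placeholder script name; requires the legate launcher on PATH):
#   import subprocess
#   subprocess.run(["legate", "my_script.py"], env=ucx_workaround_env(), check=True)
```

Copying rather than mutating `os.environ` keeps the workaround scoped to the launched job, which makes it easy to drop once a proper fix ships.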

Full Changelog: v25.10.00...v25.11.00