Fixes these `pre-commit` errors blocking CI:

```text
verify-hardcoded-version.................................................Failed
- hook id: verify-hardcoded-version
- exit code: 1

In file RAPIDS_BRANCH:1:9:
release/26.04
warning: do not hard-code version, read from VERSION file instead

In file cpp/examples/versions.cmake:8:21:
set(RMM_TAG release/26.04)
warning: do not hard-code version, read from VERSION file instead
```

It does this by updating the `verify-hardcoded-version` configuration and by updating the C++ examples to read `RMM_TAG` from the `RAPIDS_BRANCH` file. See rapidsai/pre-commit-hooks#121 for details.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #2293
FAILURE - Unable to forward-merge due to an error; a manual merge is necessary. IMPORTANT: When merging this PR, do not use the auto-merger.
Contributes to rapidsai/build-planning#256. Broken out from #2270.

Proposes a stricter pattern for installing `torch` wheels, to prevent bugs of the form "accidentally used a CPU-only `torch` from pypi.org". This should help us catch compatibility issues, improving release confidence.

Other small changes:

* splits torch wheel testing into "oldest" (PyTorch 2.9) and "latest" (PyTorch 2.10)
* introduces a `require_gpu_pytorch` matrix filter so conda jobs can explicitly request `pytorch-gpu` (to similarly ensure solvers don't fall back to the CPU-only variant)
* appends `rapids-generate-pip-constraint` output to the file `PIP_CONSTRAINT` points to *(to reduce duplication and the risk of failing to apply constraints)*

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #2279
…adaptor (#2304)

So that the tracking resource adaptor is thread safe, the modification of the tracked allocations should be sandwiched by the "acquire-release" pair `upstream.allocate` / `upstream.deallocate`. Previously this was not the case: the upstream allocation correctly occurred before updating the tracked allocations, but the upstream deallocation did not occur after removing the tracking entry. In multi-threaded use this could lead to a logged error claiming a deallocated pointer was not tracked. To solve this, actually use the correct pattern. Moreover, ensure that we don't observe ABA issues by using `try_emplace` when tracking an allocation.

- Closes #2303

Authors:
- Lawrence Mitchell (https://github.com/wence-)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #2304
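The ordering described above can be illustrated with a minimal host-only sketch (not the actual RMM implementation): `std::malloc`/`std::free` stand in for the upstream resource, and all names here (`tracking_allocator`, `tracked_count`) are illustrative. The key points are that the tracking-map update sits between the upstream allocate and the upstream deallocate, and that `try_emplace` refuses to clobber an existing entry if the allocator hands back a recycled pointer value (the ABA case).

```cpp
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <unordered_map>

// Hypothetical sketch of a tracking adaptor; not RMM's tracking_resource_adaptor.
class tracking_allocator {
 public:
  void* allocate(std::size_t bytes) {
    void* p = std::malloc(bytes);  // stand-in for upstream.allocate
    {
      std::lock_guard<std::mutex> lock(mtx_);
      // try_emplace guards against ABA: if a stale entry for this pointer
      // value somehow still exists, it is not silently overwritten.
      allocations_.try_emplace(p, bytes);
    }
    return p;
  }

  void deallocate(void* p) {
    {
      std::lock_guard<std::mutex> lock(mtx_);
      allocations_.erase(p);  // untrack BEFORE the upstream free,
    }                          // so another thread can't reuse and
    std::free(p);              // re-track the pointer in between
  }                            // (stand-in for upstream.deallocate)

  std::size_t tracked_count() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return allocations_.size();
  }

 private:
  mutable std::mutex mtx_;
  std::unordered_map<void*, std::size_t> allocations_;
};
```

With the untrack-before-free ordering, a concurrent `allocate` on another thread cannot observe a pointer that is simultaneously "freed upstream" and "still tracked", which is exactly the scenario that produced the spurious "pointer was not tracked" log message.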
…E 754 -0.0 (#2302)

## Description

`device_uvector::set_element_async` had a zero-value optimization that used `cudaMemsetAsync` when `value == value_type{0}`. For IEEE 754 floating-point types, `-0.0 == 0.0` is `true` per the standard, so `-0.0` was incorrectly routed through `cudaMemsetAsync(..., 0, ...)`, which clears all bits, including the sign bit, normalizing `-0.0` to `+0.0`.

This corrupts the in-memory representation of `-0.0` for any downstream library that creates scalars through RMM (`cudf::fixed_width_scalar::set_value` → `rmm::device_scalar::set_value_async` → `device_uvector::set_element_async`), causing observable behavioral divergence in spark-rapids (e.g., `cast(-0.0 as string)` returns `"0.0"` on GPU instead of `"-0.0"`).

### Fix

Per the discussion in #2298, remove all `constexpr` special casing in `set_element_async` (both the `bool` `cudaMemsetAsync` path and the `is_fundamental_v` zero-detection path) and always use `cudaMemcpyAsync`. This preserves exact bit-level representations for all types, which is the correct contract for a memory management library that sits below cuDF, cuML, and cuGraph.

`set_element_to_zero_async` is unchanged; its explicit "set to zero" semantics make `cudaMemsetAsync` the correct implementation.

### Testing

Added `NegativeZeroTest.PreservesFloatNegativeZero` and `NegativeZeroTest.PreservesDoubleNegativeZero` regression tests that verify the sign bit of `-0.0f` / `-0.0` survives a round-trip through `set_element_async` → `element`. All 122 tests pass locally (CUDA 13.0, RTX 5880).

Closes #2298

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Made with [Cursor](https://cursor.com)

---------

Signed-off-by: Allen Xu <allxu@nvidia.com>
## Description

I found that the `ulimit` settings for CUDA 13.1 devcontainers were missing. This fixes it.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
This PR sets an upper bound on the `numba-cuda` dependency to `<0.29.0`.

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #2306
Closed by #2310. |
Forward-merge triggered by push to release/26.04 that creates a PR to keep main up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.