
Dry Run Protocol#4

Open
achirkin wants to merge 38 commits into fea-unify-memory-resources from fea-dry-run-protocol

@achirkin achirkin commented Mar 5, 2026

This PR shows an isolated set of changes for rapidsai#2961.

achirkin and others added 28 commits February 18, 2026 10:46
…mory

Introduce a dry-run execution framework that replaces device and host
memory resources with lightweight fake allocators to measure peak memory
usage without holding real memory.

New files:
- dry_run_memory_resource.hpp: dry_run_allocator (lock-free bump
  allocator), dry_run_device_memory_resource, dry_run_host_memory_resource,
  dry_run_resource_manager (RAII), and dry_run_execute() helper.
- dry_run_flag.hpp: boolean dry-run flag as a raft resource, allowing
  algorithms to skip kernel execution during profiling.
- tests/util/dry_run_memory_resource.cpp: unit tests.

The dry_run_allocator probes the upstream once to obtain a base address,
then atomically bumps a pointer for each allocation — no mutex, no map,
no real memory held after the initial probe.
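
The bump-pointer idea described above can be sketched in standalone C++. This is a minimal illustration, not the `dry_run_allocator` from the PR: the class name `bump_allocator_sketch`, the 256-byte default alignment, and the explicit `base` argument (standing in for the address obtained from the upstream probe) are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a lock-free bump allocator that only hands out
// addresses and tracks peak usage. It never touches the memory it "allocates".
class bump_allocator_sketch {
 public:
  // `base` stands in for the address obtained by probing the upstream once
  // (assumed to be suitably aligned).
  explicit bump_allocator_sketch(std::uintptr_t base) : base_(base) {}

  // Reserve `bytes`, rounded up to `alignment`, by atomically advancing the offset.
  std::uintptr_t allocate(std::size_t bytes, std::size_t alignment = 256)
  {
    const std::size_t padded = round_up(bytes, alignment);
    const std::size_t start  = offset_.fetch_add(padded, std::memory_order_relaxed);
    const std::size_t now    = in_use_.fetch_add(padded, std::memory_order_relaxed) + padded;
    update_peak(now);
    return base_ + start;
  }

  // Only the usage counter shrinks; the bump offset never moves backwards.
  void deallocate(std::size_t bytes, std::size_t alignment = 256)
  {
    in_use_.fetch_sub(round_up(bytes, alignment), std::memory_order_relaxed);
  }

  std::size_t peak() const { return peak_.load(std::memory_order_relaxed); }

 private:
  static std::size_t round_up(std::size_t bytes, std::size_t alignment)
  {
    return (bytes + alignment - 1) / alignment * alignment;
  }

  // Raise the recorded peak with a compare-exchange loop instead of a mutex.
  void update_peak(std::size_t candidate)
  {
    std::size_t seen = peak_.load(std::memory_order_relaxed);
    while (seen < candidate &&
           !peak_.compare_exchange_weak(seen, candidate, std::memory_order_relaxed)) {}
  }

  std::uintptr_t base_;
  std::atomic<std::size_t> offset_{0};
  std::atomic<std::size_t> in_use_{0};
  std::atomic<std::size_t> peak_{0};
};
```

In this sketch, two allocations of 100 and 300 bytes are padded to 256 and 512 bytes, so the recorded peak is 768 bytes even though no memory is held.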
…pinned_memory_resource

Add pinned and managed resources to the raft::resources handle so that they can be customized or temporarily replaced.
…aking change due to transitive includes in downstream libraries
Merges Remove deprecated headers (rapidsai#2939). Conflict resolutions:
- rsvd.cuh: Use new mdspan-based raft::matrix::sqrt and reciprocal APIs
  (they have internal dry-run guards); kept cudaMemsetAsync guard
- svd.cuh: Use raft::matrix::weighted_sqrt (has internal dry-run guard)
- matrix.cuh: Accept deletion (deprecated, removed in main)

Co-authored-by: Cursor <cursoragent@cursor.com>
Adapt the dry-run protocol to use the unified cuda::mr resource
infrastructure from fea-unify-memory-resources.

Key changes:
- Replace dry_run_device_memory_resource (rmm subclass) and
  dry_run_host_memory_resource (std::pmr subclass) with a single
  dry_run_resource<Upstream> template using cuda::forward_property,
  modeled after raft::mr::statistics_adaptor.
- Replace dry_run_resource_manager (which modified the passed-in
  resources handle) with dry_run_resources, a standalone class that
  copies the resources object and provides implicit conversion to
  const resources&, enabling composability with other resource wrappers.
- dry_run_allocator uses probe-once semantics: a single real allocation
  from the upstream is kept alive for the allocator's lifetime, and all
  subsequent allocations return the same valid pointer.
- Remove obsolete pmr/pinned_memory_resource.hpp (superseded by
  cuda::mr::legacy_pinned_memory_resource in the unified branch).
- Adapt tests to use unified resource APIs (host_resource_ref,
  host_device_resource_ref, get_default_host_resource, etc.).
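
The probe-once semantics listed above can be illustrated with a small standalone wrapper. This is a sketch under stated assumptions, not the PR's `dry_run_resource<Upstream>`: the names `probe_once_resource_sketch` and `malloc_upstream`, the simplified `allocate`/`deallocate` interface, and the 256-byte probe size are hypothetical, and the `cuda::forward_property` plumbing is omitted. The counters are not thread-safe here, to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical upstream with a minimal allocate/deallocate interface.
struct malloc_upstream {
  void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t) noexcept { std::free(p); }
};

// Keeps one real allocation alive and returns it for every request,
// while recording how much memory a real run would have needed.
template <typename Upstream>
class probe_once_resource_sketch {
 public:
  explicit probe_once_resource_sketch(Upstream upstream, std::size_t probe_bytes = 256)
    : upstream_(upstream), probe_bytes_(probe_bytes), probe_(upstream_.allocate(probe_bytes))
  {
  }
  ~probe_once_resource_sketch() { upstream_.deallocate(probe_, probe_bytes_); }

  probe_once_resource_sketch(const probe_once_resource_sketch&)            = delete;
  probe_once_resource_sketch& operator=(const probe_once_resource_sketch&) = delete;

  // Every request gets the same valid pointer; only the counters change.
  void* allocate(std::size_t bytes)
  {
    current_ += bytes;
    peak_ = std::max(peak_, current_);
    return probe_;
  }

  void deallocate(void*, std::size_t bytes) noexcept { current_ -= bytes; }

  std::size_t current_bytes() const { return current_; }
  std::size_t peak_bytes() const { return peak_; }

 private:
  Upstream upstream_;
  std::size_t probe_bytes_;
  void* probe_;
  std::size_t current_{0};
  std::size_t peak_{0};
};
```

Because every caller receives the same pointer, code running against such a resource must not read or write through it, which is presumably why the protocol also carries a dry-run flag that lets algorithms skip kernel execution.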

Made-with: Cursor
@achirkin achirkin changed the base branch from branch-0.20 to fea-unify-memory-resources March 5, 2026 15:02
trxcllnt and others added 10 commits March 6, 2026 14:57
This option allows generating dependencies without `libucx` in the dependencies list, which is something we have to do for NVAIE/DLFW builds.

Authors:
  - Paul Taylor (https://github.com/trxcllnt)

Approvers:
  - James Lamb (https://github.com/jameslamb)

URL: rapidsai#2975
…apidsai#2974)

The per-row offset `l_offset = (offset + batch_id) * len` was stored as `IdxT`, which silently overflows when the total matrix size exceeds the range of a 32-bit index type. Both `offset` and `batch_id` are already `size_t`, so the multiplication naturally produces a `size_t`, but truncating it back to `IdxT` caused incorrect pointer arithmetic in the kernels.
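
The truncation can be reproduced in a few lines of standalone C++. The function names below are hypothetical and the arithmetic is lifted out of the kernels for illustration; the example assumes a 64-bit `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::size_t) >= 8, "the example assumes a 64-bit size_t");

// Mirrors the problematic pattern: the product is computed in size_t and then
// narrowed to IdxT, which wraps around for a 32-bit index type.
template <typename IdxT>
IdxT row_offset_truncated(std::size_t offset, std::size_t batch_id, std::size_t len)
{
  return static_cast<IdxT>((offset + batch_id) * len);
}

// Keeps the full-width result so that pointer arithmetic stays correct.
inline std::size_t row_offset_wide(std::size_t offset, std::size_t batch_id, std::size_t len)
{
  return (offset + batch_id) * len;
}
```

For row 70000 of a matrix with 70000 columns, the true offset is 4,900,000,000 elements. Narrowed to `std::uint32_t`, it becomes 4,900,000,000 − 2^32 = 605,032,704, which points at an unrelated row.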

Introduced a layout-policy abstraction (`dense_layout` / `csr_layout`) in a new header `select_k_layout.cuh`. This replaced the `len_or_indptr` boolean template parameter, to improve the API and push the related computations to compile-time for all select-k kernels.
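
One way such a layout policy could look is sketched below. Only the names `dense_layout` and `csr_layout` come from the description above; the `_sketch` types, their members (`row_start`, `row_length`), and the `row_max` helper are assumptions made for illustration and may differ from what `select_k_layout.cuh` actually defines.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Dense rows: every row has the same length, so row i starts at i * len.
struct dense_layout_sketch {
  std::size_t len;
  std::size_t row_start(std::size_t row) const { return row * len; }
  std::size_t row_length(std::size_t) const { return len; }
};

// CSR rows: row i spans [indptr[i], indptr[i + 1]).
template <typename IdxT>
struct csr_layout_sketch {
  const IdxT* indptr;
  std::size_t row_start(std::size_t row) const { return static_cast<std::size_t>(indptr[row]); }
  std::size_t row_length(std::size_t row) const
  {
    return static_cast<std::size_t>(indptr[row + 1] - indptr[row]);
  }
};

// The algorithm is written once; the layout type is resolved at compile time,
// so there is no runtime branch on "length or indptr".
template <typename Layout>
float row_max(const float* values, Layout layout, std::size_t row)
{
  const float* begin  = values + layout.row_start(row);
  const std::size_t n = layout.row_length(row);
  float best          = -std::numeric_limits<float>::infinity();
  for (std::size_t i = 0; i < n; ++i) {
    if (begin[i] > best) { best = begin[i]; }
  }
  return best;
}
```

Both layouts return offsets as `size_t` in this sketch, which keeps the row start out of the narrower index type regardless of which policy is selected.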

Authors:
  - Yan Zaretskiy (https://github.com/yan-zaretskiy)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#2974
This PR updates the repository to version 26.06.

This is part of the 26.04 release burndown process.
Fixes these `pre-commit` errors:

```text
In file RAPIDS_BRANCH:1:9:
 release/26.04
warning: do not hard-code version, read from VERSION file instead

In file RAPIDS_BRANCH:1:9:
 release/26.04

verify-hardcoded-version-ucxx............................................Failed
- hook id: verify-hardcoded-version
- exit code: 1

In file UCXX_BRANCH:1:9:
 release/0.49
warning: do not hard-code version, read from UCXX_VERSION file instead

In file UCXX_BRANCH:1:9:
 release/0.49
```

See rapidsai/pre-commit-hooks#121 for details

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: rapidsai#2980
Forward-merge release/26.04 into main
Use `cuda::mr::any_synchronous_resource` for host, pinned, and managed resource types and give the user explicit control for host, pinned, and managed resources.

#### New
  - `raft::resource::managed_memory_resource` and `raft::resource::pinned_memory_resource` are passed to managed and pinned mdarrays during construction via corresponding container policies. This allows the user to replace/modify these resources, for example, to add logging or memory pooling.
  - `raft::mr::get_default_host_resource` and `raft::mr::set_default_host_resource` let the user alter the default host resource in the same way. Unlike the other two, it is not stored in the `raft::resources` handle, for two reasons:
    1. To mirror rmm default device resource getter/setter
    2. To avoid breaking the `raft::make_host_mdarray` overloads that do not take `raft::resources` as an argument (many instances across raft and cuvs).
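
The getter/setter pattern for a process-wide default resource, and how a user might wrap it to add logging or counting, can be sketched as follows. Everything here is hypothetical and standalone: the `_sketch` names, the virtual interface, and the choice to return the previous resource from the setter are assumptions for the example and do not reproduce the signatures of `raft::mr::get_default_host_resource` or `raft::mr::set_default_host_resource`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical minimal host resource interface.
struct host_resource_sketch {
  virtual ~host_resource_sketch()                                = default;
  virtual void* allocate(std::size_t bytes)                      = 0;
  virtual void deallocate(void* p, std::size_t bytes) noexcept   = 0;
};

struct malloc_resource_sketch : host_resource_sketch {
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* p, std::size_t) noexcept override { std::free(p); }
};

// Process-wide slot holding the current default (not thread-safe in this sketch).
inline host_resource_sketch*& default_slot()
{
  static malloc_resource_sketch fallback;
  static host_resource_sketch* current = &fallback;
  return current;
}

inline host_resource_sketch* get_default_host_resource_sketch() { return default_slot(); }

// Returns the previous default so the caller can restore it later.
inline host_resource_sketch* set_default_host_resource_sketch(host_resource_sketch* r)
{
  host_resource_sketch* previous = default_slot();
  default_slot()                 = r;
  return previous;
}

// A user-supplied wrapper that counts allocations and forwards to an upstream.
struct counting_resource_sketch : host_resource_sketch {
  explicit counting_resource_sketch(host_resource_sketch* upstream) : upstream_(upstream) {}
  void* allocate(std::size_t bytes) override
  {
    ++allocations;
    return upstream_->allocate(bytes);
  }
  void deallocate(void* p, std::size_t bytes) noexcept override
  {
    upstream_->deallocate(p, bytes);
  }
  std::size_t allocations = 0;

 private:
  host_resource_sketch* upstream_;
};
```

Code that obtains memory through the getter picks up the wrapper without any change to its own signature, which is the property that matters for overloads that do not take a `raft::resources` argument.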

#### Changed

 - Use `raft::mr::host_resource_ref` and `raft::mr::host_device_resource_ref` for the non-owning semantics (defined as `cuda::mr::synchronous_resource_ref` with appropriate access attributes)
 - Use `raft::host_resource` and `raft::host_device_resource` for owning semantics (defined as `cuda::mr::any_synchronous_resource` with appropriate access attributes)

With these changes, raft fully switches to `cuda::mr` types for host and host-device resources, while still using `rmm` types for device async resources. Changing the latter would break a lot of cuVS and is not needed - `rmm` will eventually fully converge to `cuda::mr` anyway.

#### Breaking changes
  - Rename container policies
  - Reuse a single `host_container` for the three types of resources.
  - Switch to using `cuda::mr::any_synchronous_resource` from `std::pmr::memory_resource`

The effect of these changes should be limited, because the policies are hidden behind the mdarray templates and synonyms, and `std::pmr::memory_resource` was introduced recently and hasn't been used much.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#2968

6 participants