Dry Run Protocol by achirkin · Pull Request #4 · achirkin/raft

achirkin · 2026-03-05T15:02:21Z

This PR shows an isolated set of changes for rapidsai#2961.

…mory

Introduce a dry-run execution framework that replaces device and host memory resources with lightweight fake allocators to measure peak memory usage without holding real memory.

New files:
- dry_run_memory_resource.hpp: dry_run_allocator (lock-free bump allocator), dry_run_device_memory_resource, dry_run_host_memory_resource, dry_run_resource_manager (RAII), and dry_run_execute() helper.
- dry_run_flag.hpp: boolean dry-run flag as a raft resource, allowing algorithms to skip kernel execution during profiling.
- tests/util/dry_run_memory_resource.cpp: unit tests.

The dry_run_allocator probes the upstream once to obtain a base address, then atomically bumps a pointer for each allocation: no mutex, no map, no real memory held after the initial probe.
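The lock-free bump idea described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the code from dry_run_memory_resource.hpp: the class name `bump_allocator_sketch`, its methods, and the default 256-byte alignment are all hypothetical, and the base address is assumed to be already aligned. The sketch does not model deallocation, so it only reports the total number of bytes bumped; how the PR derives the peak from this is not shown in the source.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a lock-free bump allocator that hands out fake addresses.
// The returned addresses must never be dereferenced.
class bump_allocator_sketch {
 public:
  // `base` stands in for the address obtained from the single upstream probe.
  explicit bump_allocator_sketch(std::uintptr_t base) : base_(base) {}

  std::uintptr_t allocate(std::size_t bytes, std::size_t alignment = 256)
  {
    // The alignment must be a non-zero power of two for the mask trick below.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t old_off = offset_.load(std::memory_order_relaxed);
    std::size_t start   = 0;
    std::size_t next    = 0;
    do {
      // Round the current offset up to the requested alignment, then bump.
      start = (old_off + alignment - 1) & ~(alignment - 1);
      next  = start + bytes;
      // On failure, compare_exchange_weak reloads old_off and the loop retries.
    } while (!offset_.compare_exchange_weak(old_off, next, std::memory_order_relaxed));
    return base_ + start;
  }

  // Total bytes consumed so far, including alignment padding.
  std::size_t bytes_bumped() const { return offset_.load(std::memory_order_relaxed); }

 private:
  std::uintptr_t base_;
  std::atomic<std::size_t> offset_{0};
};
```

A single atomic offset is the only shared state, which is why no mutex or address map is needed; the trade-off is that individual allocations cannot be reclaimed.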

Merges Remove deprecated headers (rapidsai#2939).

Conflict resolutions:
- rsvd.cuh: Use new mdspan-based raft::matrix::sqrt and reciprocal APIs (they have internal dry-run guards); kept cudaMemsetAsync guard
- svd.cuh: Use raft::matrix::weighted_sqrt (has internal dry-run guard)
- matrix.cuh: Accept deletion (deprecated, removed in main)

Co-authored-by: Cursor <cursoragent@cursor.com>

Adapt the dry-run protocol to use the unified cuda::mr resource infrastructure from fea-unify-memory-resources.

Key changes:
- Replace dry_run_device_memory_resource (rmm subclass) and dry_run_host_memory_resource (std::pmr subclass) with a single dry_run_resource<Upstream> template using cuda::forward_property, modeled after raft::mr::statistics_adaptor.
- Replace dry_run_resource_manager (which modified the passed-in resources handle) with dry_run_resources, a standalone class that copies the resources object and provides implicit conversion to const resources&, enabling composability with other resource wrappers.
- dry_run_allocator uses probe-once semantics: a single real allocation from the upstream is kept alive for the allocator's lifetime, and all subsequent allocations return the same valid pointer.
- Remove obsolete pmr/pinned_memory_resource.hpp (superseded by cuda::mr::legacy_pinned_memory_resource in the unified branch).
- Adapt tests to use unified resource APIs (host_resource_ref, host_device_resource_ref, get_default_host_resource, etc.).

Made-with: Cursor
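The probe-once semantics mentioned in this commit message might look roughly like the sketch below. Everything here is hypothetical: the class `probe_once_tracker`, its method names, and the way the running and peak byte counts are kept are assumptions for illustration, since the source only states that one real allocation is kept alive and that every later allocation returns the same valid pointer.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch: every allocation returns the same probe pointer,
// while atomic counters record the bytes currently "held" and the peak.
class probe_once_tracker {
 public:
  // `probe` stands in for the single real allocation obtained from the upstream.
  explicit probe_once_tracker(void* probe) : probe_(probe) {}

  void* allocate(std::size_t bytes)
  {
    std::size_t now  = current_.fetch_add(bytes) + bytes;
    std::size_t prev = peak_.load();
    // Raise the peak if this allocation pushed usage above it; retry on races.
    while (prev < now && !peak_.compare_exchange_weak(prev, now)) {}
    return probe_;
  }

  // The pointer is ignored because all allocations share the probe address.
  void deallocate(void* /*ptr*/, std::size_t bytes) { current_.fetch_sub(bytes); }

  std::size_t current() const { return current_.load(); }
  std::size_t peak() const { return peak_.load(); }

 private:
  void* probe_;
  std::atomic<std::size_t> current_{0};
  std::atomic<std::size_t> peak_{0};
};
```

Returning one valid pointer means code that inspects or passes the address around keeps working during a dry run, while the counters alone carry the memory accounting.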

This option allows generating dependencies without `libucx` in the dependencies list, which is something we have to do for NVAIE/DLFW builds. Authors: - Paul Taylor (https://github.com/trxcllnt) Approvers: - James Lamb (https://github.com/jameslamb) URL: rapidsai#2975

…apidsai#2974)

The per-row offset `l_offset = (offset + batch_id) * len` was stored as `IdxT`, which silently overflows when the total matrix size exceeds the range of a 32-bit index type. Both `offset` and `batch_id` are already `size_t`, so the multiplication naturally produces a `size_t`, but truncating it back to `IdxT` caused incorrect pointer arithmetic in the kernels.

Introduced a layout-policy abstraction (`dense_layout` / `csr_layout`) in a new header `select_k_layout.cuh`. This replaced the `len_or_indptr` boolean template parameter, to improve the API and push the related computations to compile-time for all select-k kernels.

Authors:
- Yan Zaretskiy (https://github.com/yan-zaretskiy)

Approvers:
- Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#2974
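The truncation described in this commit message can be reproduced on the host with a few lines. This is only an illustration of the arithmetic, not the kernel code: the function names are hypothetical, `std::uint32_t` stands in for a 32-bit `IdxT`, and a 64-bit `size_t` is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// The illustration assumes a 64-bit size_t, as on typical CUDA host platforms.
static_assert(sizeof(std::size_t) >= 8, "sketch assumes 64-bit size_t");

// Hypothetical helper: the offset is computed in size_t, then narrowed to IdxT.
// For an unsigned 32-bit IdxT the result wraps modulo 2^32.
template <typename IdxT>
IdxT row_offset_truncated(std::size_t offset, std::size_t batch_id, std::size_t len)
{
  return static_cast<IdxT>((offset + batch_id) * len);
}

// Hypothetical helper: the offset stays in size_t, so large matrices are addressed correctly.
inline std::size_t row_offset_wide(std::size_t offset, std::size_t batch_id, std::size_t len)
{
  return (offset + batch_id) * len;
}
```

For example, with `offset = 0`, `batch_id = 70000` and `len = 70000`, the wide offset is 4,900,000,000, which exceeds 2^32 = 4,294,967,296, so the 32-bit result wraps around to 605,032,704 and would point at the wrong row.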

Fixes these `pre-commit` errors

```text
In file RAPIDS_BRANCH:1:9: release/26.04
warning: do not hard-code version, read from VERSION file instead
In file RAPIDS_BRANCH:1:9: release/26.04
verify-hardcoded-version-ucxx............................................Failed
- hook id: verify-hardcoded-version
- exit code: 1
In file UCXX_BRANCH:1:9: release/0.49
warning: do not hard-code version, read from UCXX_VERSION file instead
In file UCXX_BRANCH:1:9: release/0.49
```

See rapidsai/pre-commit-hooks#121 for details

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: rapidsai#2980

Use `cuda::mr::any_synchronous_resource` for host, pinned, and managed resource types and give the user explicit control over these resources.

#### New

- `raft::resource::managed_memory_resource` and `raft::resource::pinned_memory_resource` are passed to managed and pinned mdarrays during construction via corresponding container policies. This allows the user to replace/modify these resources, for example, to add logging or memory pooling.
- `raft::mr::get_default_host_resource` and `raft::mr::set_default_host_resource` can be used by the user to alter the default host resource the same way. It is not stored in the `raft::resources` handle like the other two for two reasons:
  1. To mirror the rmm default device resource getter/setter
  2. To avoid breaking the `raft::make_host_mdarray` overloads that do not take `raft::resources` as an argument (many instances across raft and cuvs).

#### Changed

- Use `raft::mr::host_resource_ref` and `raft::mr::host_device_resource_ref` for the non-owning semantics (defined as `cuda::mr::synchronous_resource_ref` with appropriate access attributes)
- Use `raft::host_resource` and `raft::host_device_resource` for owning semantics (defined as `cuda::mr::any_synchronous_resource` with appropriate access attributes)

With these changes, raft fully switches to `cuda::mr` types for host and host-device resources, while still using `rmm` types for device async resources. Changing the latter would break a lot of cuVS and is not needed: `rmm` will eventually fully converge to `cuda::mr` anyway.

#### Breaking changes

- Rename container policies
- Reuse a single `host_container` for the three types of resources.
- Switch to using `cuda::mr::any_synchronous_resource` from `std::pmr::memory_resource`

The effect of these changes should be limited, because the policies are hidden behind the mdarray templates and synonyms, and `std::pmr::memory_resource` was introduced recently and hasn't been used much.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#2968

achirkin and others added 28 commits February 18, 2026 10:46

First batch of dry-run guards

695a8a3

Dry run compliance for raft::linalg namespace

42d8ad4

Update developer guide with the dry run protocol

6db7ec8

BREAKING CHANGE: replaced pinned_container with host_container using …

d91a1c6

…pinned_memory_resource Add pinned and managed resources to the raft::resources handle to make it possible to customize / temporarily replace these resources

Dry run compliance for raft::matrix namespace

1a114f6

Dry run compliance for raft::random namespace

dec5e95

Dry run compliance for raft::solver namespace

f84d9a9

Dry run compliance for raft::sparse namespace

44793cd

Dry run compliance for raft::spectral namespace

d566fe9

Dry run compliance for raft::stats namespace

fc3bde6

Add a little bit more tests

b0ddbc8

Add the Dry Run Protocol Overview

15c07a1

Fix C++ example in the docs

1c57abb

Merge branch 'main' into fea-dry-run-protocol

d916b45

Add a few more tests and fix a missed CUDA call in QR algorithm

9d24480

Fix excess subsample doing work in dry run

7577e56

Add dry run compliance to the raft::copy on mdspans

99faf68

Merge branch 'main' into fea-dry-run-protocol

b859894

Revert changing includes from public to detail namespace to avoid bre…

57d4c19

…aking change due to transitive includes in downstream libraries

Merge branch 'main' into fea-dry-run-protocol

694ec63

Merge branch 'main' into fea-dry-run-protocol

8922b8f

Merge branch 'main' into fea-dry-run-protocol

3c17e3e

Merge branch 'main' into fea-dry-run-protocol

fb56025

Adapt to fea-unify-memory-resources

e76bf7c

Refactor dry_run_resources as a child of raft::resources to better ke…

2d3f8fc

…ep/restore the state of resources

achirkin changed the base branch from branch-0.20 to fea-unify-memory-resources March 5, 2026 15:02

achirkin mentioned this pull request Mar 5, 2026

Dry Run Protocol rapidsai/raft#2961

Open

trxcllnt and others added 10 commits March 6, 2026 14:57

Merge branch 'main' into fea-dry-run-protocol

d2cf85e

Prepare release/26.04

e9901c6

Revert "Prepare release/26.04"

2318df3

This reverts commit e9901c6.

Update to 26.06 (rapidsai#2979)

8b02c7f

This PR updates the repository to version 26.06. This is part of the 26.04 release burndown process.

Merge pull request rapidsai#2981 from rapidsai/release/26.04

bd71750

Forward-merge release/26.04 into main

Merge branch 'main' into fea-dry-run-protocol

e86b56d

achirkin wants to merge 38 commits into fea-unify-memory-resources from fea-dry-run-protocol

6 participants