Dry Run Protocol by achirkin · Pull Request #2961 · rapidsai/raft

achirkin · 2026-02-20T12:30:42Z

The dry run protocol defines a mechanism to simulate the execution of algorithms to get a precise estimate of the memory requirements for a real execution with the same parameters.

#include <raft/util/dry_run_memory_resource.hpp>

raft::resources res;
// auto my_function(const raft::resources& res, my_args...);
auto stats = raft::util::dry_run_execute(res, my_function, my_args...);
// stats.device_global_peak  – peak device memory (bytes)

This PR:

Introduces new infrastructure (raft::util::dry_run_execute, a tracking memory resource, resource::get_dry_run_flag) that lets callers estimate the peak memory usage of any RAFT algorithm without executing GPU work.
Makes all public functions across all raft namespaces dry-run compliant: allocations are always visible to the tracker; CUDA work is skipped.
Adds a small user guide (docs/source/dry_run_protocol.md)
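The compliance rule above (allocations always reach the tracker, CUDA work is skipped) can be illustrated with a minimal host-only sketch. It does not use the RAFT API: `tracking_resource_sketch`, `guard_demo_resources`, `scoped_buffer`, and `my_algorithm` are hypothetical stand-ins, and the "kernel launches" are only counted, not executed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tracking memory resource: it only does the
// bookkeeping (current and peak bytes) and never hands out real memory.
struct tracking_resource_sketch {
  std::size_t current = 0;
  std::size_t peak    = 0;
  void allocate(std::size_t bytes)
  {
    current += bytes;
    peak = std::max(peak, current);
  }
  void deallocate(std::size_t bytes) { current -= bytes; }
};

// Hypothetical stand-in for raft::resources carrying a dry-run flag.
struct guard_demo_resources {
  tracking_resource_sketch* mr;
  bool dry_run;
};

// RAII buffer: reports its size to the tracker on construction and
// releases it on destruction, the way a device buffer would.
class scoped_buffer {
 public:
  scoped_buffer(tracking_resource_sketch& mr, std::size_t bytes) : mr_(mr), bytes_(bytes)
  {
    mr_.allocate(bytes_);
  }
  ~scoped_buffer() { mr_.deallocate(bytes_); }
  scoped_buffer(const scoped_buffer&)            = delete;
  scoped_buffer& operator=(const scoped_buffer&) = delete;

 private:
  tracking_resource_sketch& mr_;
  std::size_t bytes_;
};

// A dry-run compliant "algorithm": every allocation happens in both modes,
// while the (simulated) kernel launches are guarded by the flag.
// Returns the number of launches that would have been issued.
int my_algorithm(const guard_demo_resources& res, std::size_t n)
{
  int launches = 0;
  scoped_buffer output(*res.mr, n * sizeof(float));
  {
    scoped_buffer scratch(*res.mr, 2 * n * sizeof(float));
    if (!res.dry_run) { ++launches; }  // first kernel, skipped in a dry run
  }
  if (!res.dry_run) { ++launches; }  // second kernel, skipped in a dry run
  return launches;
}
```

Because the scratch buffer is live at the same time as the output buffer, the recorded peak is the sum of both sizes, even though no work was done.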

Depends on (and includes all changes of) #2968

…mory

Introduce a dry-run execution framework that replaces device and host memory resources with lightweight fake allocators to measure peak memory usage without holding real memory.

New files:
- dry_run_memory_resource.hpp: dry_run_allocator (lock-free bump allocator), dry_run_device_memory_resource, dry_run_host_memory_resource, dry_run_resource_manager (RAII), and dry_run_execute() helper.
- dry_run_flag.hpp: boolean dry-run flag as a raft resource, allowing algorithms to skip kernel execution during profiling.
- tests/util/dry_run_memory_resource.cpp: unit tests.

The dry_run_allocator probes the upstream once to obtain a base address, then atomically bumps a pointer for each allocation: no mutex, no map, no real memory held after the initial probe.
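A lock-free bump allocator of the kind this commit message describes might look roughly like the sketch below. This is not the code from dry_run_memory_resource.hpp: the class name `bump_tracker_sketch`, the separate current/peak counters, and passing the probed base address into the constructor are assumptions made for illustration. The returned pointers are never meant to be dereferenced.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a lock-free bump allocator that tracks peak usage.
// The base address is assumed to come from a single probe of the upstream.
class bump_tracker_sketch {
 public:
  explicit bump_tracker_sketch(void* base) : base_(reinterpret_cast<std::uintptr_t>(base)) {}

  void* allocate(std::size_t bytes)
  {
    // Bump the offset atomically: each caller gets a distinct fake address.
    std::size_t offset = offset_.fetch_add(bytes, std::memory_order_relaxed);
    // Update the live byte count and raise the peak if it was exceeded.
    std::size_t now  = current_.fetch_add(bytes, std::memory_order_relaxed) + bytes;
    std::size_t prev = peak_.load(std::memory_order_relaxed);
    while (prev < now && !peak_.compare_exchange_weak(prev, now, std::memory_order_relaxed)) {
      // On failure, prev is reloaded with the latest peak; retry if still lower.
    }
    return reinterpret_cast<void*>(base_ + offset);
  }

  // Only the live byte count shrinks; the bump offset never moves back.
  void deallocate(std::size_t bytes) { current_.fetch_sub(bytes, std::memory_order_relaxed); }

  std::size_t peak() const { return peak_.load(std::memory_order_relaxed); }
  std::size_t current() const { return current_.load(std::memory_order_relaxed); }

 private:
  std::uintptr_t base_;
  std::atomic<std::size_t> offset_{0};
  std::atomic<std::size_t> current_{0};
  std::atomic<std::size_t> peak_{0};
};
```

The sketch ignores alignment and address-space exhaustion, which a real resource would have to handle.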

Merges Remove deprecated headers (rapidsai#2939).

Conflict resolutions:
- rsvd.cuh: Use new mdspan-based raft::matrix::sqrt and reciprocal APIs (they have internal dry-run guards); kept cudaMemsetAsync guard
- svd.cuh: Use raft::matrix::weighted_sqrt (has internal dry-run guard)
- matrix.cuh: Accept deletion (deprecated, removed in main)

Co-authored-by: Cursor <cursoragent@cursor.com>

Non-breaking, source-only changes that modernize the use of raft primitives across the cuVS source code. The general rule applied here is to prefer raft helpers that take `raft::resources` as an argument over other raft helpers, and raft helpers over third-party libraries.

- thrust::fill / thrust::fill_n → raft::matrix::fill
- thrust::transform → raft::linalg::map
- thrust::sequence / thrust::tabulate → raft::linalg::map_offset
- raft::linalg::unaryOp / raft::linalg::binaryOp → raft::linalg::map
- raft::linalg::add (pointer-based) → raft::linalg::add (mdspan-based)
- raft::copy (pointer-based) → raft::copy (mdspan-based)
- raft::update_device / raft::update_host → raft::copy (mdspan-based)
- raft::linalg::rowNorm → raft::linalg::norm
- raft::linalg::reduce (pointer-based) → raft::linalg::reduce (mdspan-based)
- cudaMemsetAsync → raft::matrix::fill

The purpose of this PR is to make the use of library code more consistent (sometimes at the cost of a bit more auxiliary code). It is also a prerequisite for dry-run compliance in cuVS, should we choose to merge rapidsai/raft#2961.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#1837

Adapt the dry-run protocol to use the unified cuda::mr resource infrastructure from fea-unify-memory-resources.

Key changes:
- Replace dry_run_device_memory_resource (rmm subclass) and dry_run_host_memory_resource (std::pmr subclass) with a single dry_run_resource<Upstream> template using cuda::forward_property, modeled after raft::mr::statistics_adaptor.
- Replace dry_run_resource_manager (which modified the passed-in resources handle) with dry_run_resources, a standalone class that copies the resources object and provides implicit conversion to const resources&, enabling composability with other resource wrappers.
- dry_run_allocator uses probe-once semantics: a single real allocation from the upstream is kept alive for the allocator's lifetime, and all subsequent allocations return the same valid pointer.
- Remove obsolete pmr/pinned_memory_resource.hpp (superseded by cuda::mr::legacy_pinned_memory_resource in the unified branch).
- Adapt tests to use unified resource APIs (host_resource_ref, host_device_resource_ref, get_default_host_resource, etc.).

Made-with: Cursor
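The "copy the handle and convert implicitly to const resources&" design mentioned in this commit message can be sketched in plain C++ as follows. The types `handle_sketch` and `dry_run_handle_sketch` and the function `some_algorithm` are hypothetical and only mirror the shape of the idea; they are not the RAFT classes, and a later commit in this PR refactors dry_run_resources as a child of raft::resources instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for raft::resources.
struct handle_sketch {
  bool dry_run           = false;
  std::size_t* peak_sink = nullptr;  // where an algorithm reports its peak bytes
};

// Hypothetical wrapper: copies the caller's handle, switches the copy into
// dry-run mode, and leaves the original handle untouched.
class dry_run_handle_sketch {
 public:
  explicit dry_run_handle_sketch(const handle_sketch& original) : copy_(original)
  {
    copy_.dry_run   = true;
    copy_.peak_sink = &peak_;
  }
  // The copy points at this object's counter, so copying the wrapper is disabled.
  dry_run_handle_sketch(const dry_run_handle_sketch&)            = delete;
  dry_run_handle_sketch& operator=(const dry_run_handle_sketch&) = delete;

  // Implicit conversion lets the wrapper be passed wherever a handle is expected.
  operator const handle_sketch&() const { return copy_; }

  std::size_t peak() const { return peak_; }

 private:
  std::size_t peak_ = 0;
  handle_sketch copy_;
};

// An algorithm written against the plain handle type; it needs no changes
// to accept the wrapper.
void some_algorithm(const handle_sketch& res, std::size_t bytes_needed)
{
  if (res.peak_sink != nullptr) { *res.peak_sink = std::max(*res.peak_sink, bytes_needed); }
  if (res.dry_run) { return; }  // skip the real work
  // ... real work would go here ...
}
```

Since the wrapper never mutates the handle it was built from, the caller can keep using the original handle for a real run afterwards.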

achirkin · 2026-03-05T15:08:02Z

The smaller diff without #2968 is viewable here: achirkin#4

achirkin added 13 commits February 18, 2026 10:46

First batch of dry-run guards

695a8a3

Dry run compliance for raft::linalg namespace

42d8ad4

Update developer guide with the dry run protocol

6db7ec8

BREAKING CHANGE: replaced pinned_container with host_container using …

d91a1c6

…pinned_memory_resource Add pinned and managed resources to the raft::resources handle to make it possible to customize / temporarily replace these resources

Dry run compliance for raft::matrix namespace

1a114f6

Dry run compliance for raft::random namespace

dec5e95

Dry run compliance for raft::solver namespace

f84d9a9

Dry run compliance for raft::sparse namespace

44793cd

Dry run compliance for raft::spectral namespace

d566fe9

Dry run compliance for raft::stats namespace

fc3bde6

Add a little bit more tests

b0ddbc8

Add the Dry Run Protocol Overview

15c07a1

achirkin self-assigned this Feb 20, 2026

achirkin requested review from a team as code owners February 20, 2026 12:30

achirkin added the feature request (New feature or request) and breaking (Breaking change) labels Feb 20, 2026

github-project-automation bot added this to Vector Search, ML, & Data Mining Release Board Feb 20, 2026

achirkin and others added 3 commits February 20, 2026 13:31

Fix C++ example in the docs

1c57abb

Merge branch 'main' into fea-dry-run-protocol

d916b45

Add a few more tests and fix a missed CUDA call in QR algorithm

9d24480

achirkin moved this to In Progress in Vector Search, ML, & Data Mining Release Board Feb 20, 2026

achirkin and others added 5 commits February 20, 2026 15:44

Fix excess subsample doing work in dry run

7577e56

Add dry run compliance to the raft::copy on mdspans

99faf68

Merge branch 'main' into fea-dry-run-protocol

b859894

Revert changing includes from public to detail namespace to avoid bre…

57d4c19

…aking change due to transitive includes in downstream libraries

Merge branch 'main' into fea-dry-run-protocol

694ec63

achirkin mentioned this pull request Feb 23, 2026

Modernize the uses of raft in cuVS rapidsai/cuvs#1837

Merged

achirkin and others added 9 commits February 27, 2026 17:44

Merge branch 'main' into fea-dry-run-protocol

8922b8f

C++17 backwards-compatibility

866211e

Merge branch 'main' into fea-unify-memory-resources

c171d84

newline

268eb1b

Add raft::mr::device_resource wrapper for cuda::mr::any_resource

5c718d6

Copy semantics and return resource refs

c5ab9c4

Rework workspace resources to avoid nesting bridge layers

6af142e

Fix the argument order in tests

ece1990

Merge branch 'main' into fea-dry-run-protocol

3c17e3e

achirkin added the DO NOT MERGE (Hold off on merging; see PR for details) label Mar 3, 2026

achirkin added 2 commits March 3, 2026 10:36

Merge branch 'main' into fea-unify-memory-resources

4dd256b

Add explicit conversion through cuda::mr refs to rmm ref

a26357d

achirkin and others added 8 commits March 4, 2026 06:14

Switch from rmm host and host_device resource reference wrappers to r…

2a90680

…aw CCCL references

Merge branch 'main' into fea-unify-memory-resources

59c3793

Prefer rmm::mr::get_current_device_resource_ref() over rmm::mr::get_c…

3a40d22

…urrent_device_resource()

Remove raft pinned and managed memory resources in favor of cuda::mr …

cce4f45

…implementations

Merge branch 'main' into fea-dry-run-protocol

fb56025

Adapt to fea-unify-memory-resources

e76bf7c

Refactor dry_run_resources as a child of raft::resources to better ke…

2d3f8fc

…ep/restore the state of resources

achirkin added DO NOT MERGE Hold off on merging; see PR for details and removed DO NOT MERGE Hold off on merging; see PR for details labels Mar 5, 2026

achirkin mentioned this pull request Mar 5, 2026

Dry Run Protocol achirkin/raft#4

Open

achirkin added the non-breaking (Non-breaking change) label and removed the breaking (Breaking change) label Mar 5, 2026

achirkin removed the DO NOT MERGE (Hold off on merging; see PR for details) label Mar 9, 2026

achirkin added 2 commits March 9, 2026 12:47

Merge branch 'main' into fea-dry-run-protocol

d2cf85e

Merge branch 'main' into fea-dry-run-protocol

e86b56d

Dry Run Protocol #2961
achirkin wants to merge 54 commits into rapidsai:main from achirkin:fea-dry-run-protocol
