Dry Run Protocol #2961

Open
achirkin wants to merge 54 commits into rapidsai:main from achirkin:fea-dry-run-protocol

@achirkin achirkin (Contributor) commented Feb 20, 2026

The dry run protocol defines a mechanism to simulate the execution of algorithms to get a precise estimate of the memory requirements for a real execution with the same parameters.

#include <raft/util/dry_run_memory_resource.hpp>

raft::resources res;
// auto my_function(const raft::resources& res, my_args...);
auto stats = raft::util::dry_run_execute(res, my_function, my_args...);
// stats.device_global_peak  – peak device memory (bytes)

This PR:

  • Introduces new infrastructure (raft::util::dry_run_execute, a tracking memory resource, resource::get_dry_run_flag) that lets callers estimate the peak memory usage of any RAFT algorithm without executing GPU work.
  • Makes all public functions across all raft namespaces dry-run compliant: allocations are always visible to the tracker; CUDA work is skipped.
  • Adds a small user guide (docs/source/dry_run_protocol.md).
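
As a rough illustration of what "dry-run compliant" means in the second bullet, here is a minimal host-only sketch. Everything in it (toy_tracker, toy_resources, toy_buffer, toy_sum_of_squares, toy_dry_run_peak) is a hypothetical stand-in invented for this example, not a RAFT type or function; a plain std::vector plays the role of device memory and a loop plays the role of a kernel launch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are RAFT types.
struct toy_tracker {
  std::size_t current = 0;  // bytes currently allocated
  std::size_t peak    = 0;  // largest value `current` has reached

  void allocate(std::size_t bytes)
  {
    current += bytes;
    peak = std::max(peak, current);
  }
  void deallocate(std::size_t bytes)
  {
    assert(bytes <= current);
    current -= bytes;
  }
};

struct toy_resources {
  toy_tracker* tracker;
  bool dry_run;
};

// RAII buffer: always reports its size to the tracker, but only backs it
// with real storage outside of dry-run mode.
class toy_buffer {
 public:
  toy_buffer(toy_resources& res, std::size_t n) : res_(res), bytes_(n * sizeof(float))
  {
    res_.tracker->allocate(bytes_);
    if (!res_.dry_run) { storage_.resize(n); }
  }
  ~toy_buffer() { res_.tracker->deallocate(bytes_); }
  toy_buffer(const toy_buffer&)            = delete;
  toy_buffer& operator=(const toy_buffer&) = delete;

  std::vector<float>& storage() { return storage_; }

 private:
  toy_resources& res_;
  std::size_t bytes_;
  std::vector<float> storage_;
};

// A "dry-run compliant" function: every allocation happens in both modes,
// while the compute step (standing in for a kernel launch) is guarded.
inline float toy_sum_of_squares(toy_resources& res, std::size_t n)
{
  toy_buffer input(res, n);
  float result = 0.0f;
  {
    toy_buffer scratch(res, 2 * n);  // temporary workspace, freed at scope exit
    if (!res.dry_run) {
      std::fill(input.storage().begin(), input.storage().end(), 2.0f);
      for (float v : input.storage()) { result += v * v; }
    }
  }
  return result;
}

// Runs the function in dry-run mode and returns the recorded peak in bytes.
inline std::size_t toy_dry_run_peak(std::size_t n)
{
  toy_tracker tracker;
  toy_resources res{&tracker, true};
  toy_sum_of_squares(res, n);
  return tracker.peak;
}
```

In this toy, toy_dry_run_peak(n) reports the bytes of the input buffer plus the temporary workspace, which is the same peak a real run records, because the allocation sequence is identical in both modes and only the guarded compute differs.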

Depends on (and includes all changes of) #2968

…mory

Introduce a dry-run execution framework that replaces device and host
memory resources with lightweight fake allocators to measure peak memory
usage without holding real memory.

New files:
- dry_run_memory_resource.hpp: dry_run_allocator (lock-free bump
  allocator), dry_run_device_memory_resource, dry_run_host_memory_resource,
  dry_run_resource_manager (RAII), and dry_run_execute() helper.
- dry_run_flag.hpp: boolean dry-run flag as a raft resource, allowing
  algorithms to skip kernel execution during profiling.
- tests/util/dry_run_memory_resource.cpp: unit tests.

The dry_run_allocator probes the upstream once to obtain a base address,
then atomically bumps a pointer for each allocation — no mutex, no map,
no real memory held after the initial probe.
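
The bump-allocator idea described in this commit message can be sketched roughly as follows. This is not the PR's dry_run_allocator: the class name toy_bump_allocator, the use of std::malloc as the "upstream", the 256-byte default alignment, and the live/peak counters are all assumptions made for the illustration. The returned pointers are fake addresses and must never be dereferenced.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch of a lock-free bump allocator for dry runs.
class toy_bump_allocator {
 public:
  toy_bump_allocator()
  {
    // Probe the upstream once to obtain a base address, then release it:
    // no real memory is held afterwards. (If the probe fails, base_ is 0.)
    void* probe = std::malloc(1);
    base_       = reinterpret_cast<std::uintptr_t>(probe);
    std::free(probe);
  }

  // Returns a distinct fake address; sizes are padded to `alignment`,
  // but the address itself is only as aligned as the probed base.
  void* allocate(std::size_t bytes, std::size_t alignment = 256)
  {
    const std::size_t padded = round_up(bytes, alignment);
    // Atomically bump the offset: no mutex, no map of live allocations.
    const std::size_t offset = offset_.fetch_add(padded, std::memory_order_relaxed);
    // Assumption: peak usage is tracked with a CAS loop on a separate counter.
    const std::size_t now = live_.fetch_add(padded, std::memory_order_relaxed) + padded;
    std::size_t seen      = peak_.load(std::memory_order_relaxed);
    while (seen < now && !peak_.compare_exchange_weak(seen, now, std::memory_order_relaxed)) {}
    return reinterpret_cast<void*>(base_ + offset);
  }

  // The bump pointer never moves back; only the live-byte counter shrinks.
  void deallocate(void* /*ptr*/, std::size_t bytes, std::size_t alignment = 256)
  {
    live_.fetch_sub(round_up(bytes, alignment), std::memory_order_relaxed);
  }

  std::size_t peak() const { return peak_.load(std::memory_order_relaxed); }

 private:
  static std::size_t round_up(std::size_t bytes, std::size_t alignment)
  {
    assert(alignment != 0);
    return (bytes + alignment - 1) / alignment * alignment;
  }

  std::uintptr_t base_ = 0;
  std::atomic<std::size_t> offset_{0};
  std::atomic<std::size_t> live_{0};
  std::atomic<std::size_t> peak_{0};
};
```

With the default alignment, allocating 100 and then 300 bytes yields a peak of 768 bytes (256 + 512). Freeing the first block and allocating 10 more bytes keeps the peak at 768, because the live total only returns to that value.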
…pinned_memory_resource

Add pinned and managed resources to the raft::resources handle to make it possible to customize / temporarily replace these resources
@achirkin achirkin self-assigned this Feb 20, 2026
@achirkin achirkin requested review from a team as code owners February 20, 2026 12:30
@achirkin achirkin added the feature request and breaking labels Feb 20, 2026
Merges "Remove deprecated headers" (rapidsai#2939). Conflict resolutions:
- rsvd.cuh: Use new mdspan-based raft::matrix::sqrt and reciprocal APIs
  (they have internal dry-run guards); kept cudaMemsetAsync guard
- svd.cuh: Use raft::matrix::weighted_sqrt (has internal dry-run guard)
- matrix.cuh: Accept deletion (deprecated, removed in main)

Co-authored-by: Cursor <cursoragent@cursor.com>
@achirkin achirkin added the DO NOT MERGE label Mar 3, 2026
tfeher pushed a commit to Stardust-SJF/cuvs_rabitq that referenced this pull request Mar 3, 2026
Non-breaking, source-only changes that modernize the use of raft primitives across the cuVS source code. The general rule applied here is to prefer raft helpers that take `raft::resources` as an argument over other raft helpers, and raft helpers over third-party libraries.

- thrust::fill / thrust::fill_n → raft::matrix::fill
- thrust::transform → raft::linalg::map
- thrust::sequence / thrust::tabulate → raft::linalg::map_offset
- raft::linalg::unaryOp / raft::linalg::binaryOp → raft::linalg::map
- raft::linalg::add (pointer-based) → raft::linalg::add (mdspan-based)
- raft::copy (pointer-based) → raft::copy (mdspan-based)
- raft::update_device / raft::update_host → raft::copy (mdspan-based)
- raft::linalg::rowNorm → raft::linalg::norm
- raft::linalg::reduce (pointer-based) → raft::linalg::reduce (mdspan-based)
- cudaMemsetAsync → raft::matrix::fill

The purpose of this PR is to make the use of library code more consistent, even though this sometimes costs a bit more auxiliary code.
It is also a prerequisite for dry-run compliance in cuVS, should we choose to merge rapidsai/raft#2961.

Authors:
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#1837
achirkin and others added 8 commits March 4, 2026 06:14
Adapt the dry-run protocol to use the unified cuda::mr resource
infrastructure from fea-unify-memory-resources.

Key changes:
- Replace dry_run_device_memory_resource (rmm subclass) and
  dry_run_host_memory_resource (std::pmr subclass) with a single
  dry_run_resource<Upstream> template using cuda::forward_property,
  modeled after raft::mr::statistics_adaptor.
- Replace dry_run_resource_manager (which modified the passed-in
  resources handle) with dry_run_resources, a standalone class that
  copies the resources object and provides implicit conversion to
  const resources&, enabling composability with other resource wrappers.
- dry_run_allocator uses probe-once semantics: a single real allocation
  from the upstream is kept alive for the allocator's lifetime, and all
  subsequent allocations return the same valid pointer.
- Remove obsolete pmr/pinned_memory_resource.hpp (superseded by
  cuda::mr::legacy_pinned_memory_resource in the unified branch).
- Adapt tests to use unified resource APIs (host_resource_ref,
  host_device_resource_ref, get_default_host_resource, etc.).
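
The dry_run_resources design described above (copy the handle, leave the caller's object untouched, convert implicitly to a const reference) might look roughly like the following host-only sketch. The names toy_resources, toy_dry_run_resources, and toy_scaled_sum are hypothetical and exist only for this example; the actual class in the PR works with raft::resources and memory-resource adaptors rather than a raw counter pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a resources handle; not the RAFT type.
struct toy_resources {
  bool dry_run            = false;
  std::size_t* peak_bytes = nullptr;  // where allocations get recorded, if anywhere
};

// Wrapper in the spirit of the description above: copies the handle, switches
// the copy to dry-run mode, and converts implicitly to const toy_resources&.
class toy_dry_run_resources {
 public:
  explicit toy_dry_run_resources(const toy_resources& original) : copy_(original)
  {
    copy_.dry_run    = true;
    copy_.peak_bytes = &peak_;
  }
  // Non-copyable: copy_ holds a pointer to this object's own counter.
  toy_dry_run_resources(const toy_dry_run_resources&)            = delete;
  toy_dry_run_resources& operator=(const toy_dry_run_resources&) = delete;

  operator const toy_resources&() const { return copy_; }
  std::size_t peak() const { return peak_; }

 private:
  toy_resources copy_;
  std::size_t peak_ = 0;
};

// An algorithm that only ever sees `const toy_resources&`.
inline float toy_scaled_sum(const toy_resources& res, std::size_t n)
{
  // A dry run must have somewhere to record its allocations.
  assert(!res.dry_run || res.peak_bytes != nullptr);
  const std::size_t bytes = n * sizeof(float);
  if (res.peak_bytes != nullptr) { *res.peak_bytes = std::max(*res.peak_bytes, bytes); }
  if (res.dry_run) { return 0.0f; }  // skip the actual work
  std::vector<float> data(n, 0.5f);
  float sum = 0.0f;
  for (float v : data) { sum += v; }
  return sum;
}
```

Because the wrapper converts implicitly, a call such as toy_scaled_sum(dry, 10) compiles unchanged when dry is a toy_dry_run_resources, and the original handle stays in its normal mode. That is one way to read the "composability with other resource wrappers" point.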

Made-with: Cursor
@achirkin achirkin added and removed the DO NOT MERGE label Mar 5, 2026
@achirkin achirkin added the non-breaking label and removed the breaking label Mar 5, 2026
@achirkin achirkin (Contributor, Author) commented Mar 5, 2026

The smaller diff without #2968 is viewable here: achirkin#4

@achirkin achirkin removed the DO NOT MERGE label Mar 9, 2026

Labels: feature request, non-breaking