
Unify memory resources#2968

Merged
rapids-bot[bot] merged 28 commits into rapidsai:main from achirkin:fea-unify-memory-resources
Mar 14, 2026

Conversation


@achirkin achirkin commented Feb 27, 2026

Use cuda::mr::any_synchronous_resource for the host, pinned, and managed resource types, and give the user explicit control over each of these resources.

New

  • raft::resource::managed_memory_resource and raft::resource::pinned_memory_resource are passed to managed and pinned mdarrays during construction via corresponding container policies. This allows the user to replace/modify these resources, for example, to add logging or memory pooling.
  • raft::mr::get_default_host_resource and raft::mr::set_default_host_resource can be used to alter the default host resource in the same way. It is not stored in the raft::resources handle like the other two, for two reasons:
    1. To mirror rmm default device resource getter/setter
    2. To avoid breaking the raft::make_host_mdarray overloads that do not take raft::resources as an argument (many instances across raft and cuvs).

Changed

  • Use raft::mr::host_resource_ref and raft::mr::host_device_resource_ref for the non-owning semantics (defined as cuda::mr::synchronous_resource_ref with appropriate access attributes)
  • Use raft::host_resource and raft::host_device_resource for owning semantics (defined as cuda::mr::any_synchronous_resource with appropriate access attributes)

With these changes, raft fully switches to cuda::mr types for host and host-device resources, while still using rmm types for device async resources. Changing the latter would break a lot of cuVS and is not needed - rmm will eventually fully converge to cuda::mr anyway.

Breaking changes

  • Rename container policies
  • Reuse a single host_container for the three types of resources.
  • Switch to using cuda::mr::any_synchronous_resource from std::pmr::memory_resource

The effect of these changes should be limited, because the policies are hidden behind the mdarray templates and synonyms, and std::pmr::memory_resource was introduced only recently and hasn't been used much.

@achirkin achirkin self-assigned this Feb 27, 2026
@achirkin achirkin requested review from a team as code owners February 27, 2026 16:43
@achirkin achirkin added the DO NOT MERGE Hold off on merging; see PR for details label Feb 27, 2026
@achirkin achirkin added feature request New feature or request breaking Breaking change labels Feb 27, 2026
@achirkin achirkin removed the DO NOT MERGE Hold off on merging; see PR for details label Mar 4, 2026
template <typename T>
struct host_container {
template <typename T, typename MR>
#ifdef __cpp_concepts
Contributor:

I think RAFT is using C++20 now so it should be safe to use requires without the #ifdef guard?

Contributor Author:

Unfortunately some components of cuVS still use C++17, and it breaks if I remove the #ifdef in this header. I figured I'd keep it here to keep cuVS passing CI without changes.

Contributor:

We should get cuVS updated to C++20, RMM will be requiring C++20 soon.

* Provides CUDA unified (managed) memory accessible from both host and device.
* Uses synchronous allocation (no stream). Binds to raft::mr::host_device_resource_ref.
*/
class managed_memory_resource {
Contributor:

This is implemented in CCCL already. Please do not introduce a new implementation of this since one already exists.

https://nvidia.github.io/cccl/unstable/libcudacxx/runtime/memory_pools.html#cuda-managed-memory-pool
https://nvidia.github.io/cccl/unstable/libcudacxx/runtime/legacy_resources.html#cuda-mr-legacy-managed-memory-resource

Use cuda::mr::legacy_managed_memory_resource on CUDA 12 and cuda::managed_memory_pool on CUDA 13 (it's considerably faster). Maybe write a factory that returns the correct resource type for your CUDA version.

Contributor Author:

Thanks for the pointer! Really nice: I replaced it with cuda::mr::legacy_managed_memory_resource and it just worked with no other modifications. I'd prefer to keep the legacy resource for now, so that cuVS keeps exactly the same behavior as before this PR.
The user is already able to replace it with the CUDA 13 pool-based resource via raft::resource::managed_memory_resource, but we can also make it the default later.


achirkin commented Mar 4, 2026

The follow-up and motivation: tracking all memory allocations #2973
Here's the changeset of that PR without the content of the current PR: achirkin#3

achirkin added a commit to rapidsai/cuvs that referenced this pull request Mar 5, 2026
achirkin added a commit to achirkin/cuml that referenced this pull request Mar 5, 2026

achirkin commented Mar 5, 2026

Testing the breaking changes:

@achirkin achirkin mentioned this pull request Mar 5, 2026

class managed_memory_resource_factory : public resource_factory {
public:
managed_memory_resource_factory() : mr_(cuda::mr::legacy_managed_memory_resource{}) {}
Contributor:

I know you said it's out of scope for now, but I recommend a follow-up PR that uses the new managed pool on CUDA 13+. It's a worthy performance boost.

Contributor Author:

Sure, I've opened an issue here #2976


struct managed_container_policy {
using element_type = ElementType;
using container_type = host_container<element_type, raft::mr::host_device_resource_ref>;
bdice (Contributor) commented Mar 5, 2026:

Something to be aware of: It is possible for memory resources to be host-accessible and device-accessible but not have that known statically. For example, systems with HMM or ATS have device-accessibility for memory allocated with malloc. However, that can't be known by the type alone. You have to query the accessibility at runtime.

Some systems like DGX Spark with integrated memory may perform better with a host-device accessible resource that isn't a managed memory resource (but that would require some system knowledge at runtime).

All this to say, someday we might want to refactor this to use cuda::mr::synchronous_resource_ref<> and check the accessibility at runtime rather than using cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible> which requires that accessibility to be statically known.

achirkin (Contributor Author) replied Mar 6, 2026:

Thanks, that's a very important point for cuVS - we've been experimenting using various memory types on Grace Hopper and DGX Spark. I actually hoped that I could use the new resources (defined in this PR as they are right now) to do more experiments by switching the memory resources.

I think the naming goes against that intention a little bit, since we decouple the memory resources, the raft resource handles, and the containers (mdarrays).
On the algorithm implementation side:

  • When I'm using raft::managed_mdarray and raft::get_managed_memory_resource_ref in algorithm code, I mean "some (probably paged, smart) memory resource with guaranteed host and device access" rather than specifically cudaMallocManaged.
  • Same for the pinned - "some (probably low-level, not-paged) memory resource with guaranteed host and device access and limited support for host-device atomics".

These two allow me to implement atomic synchronization between the device and host, reduce copy overheads, or just simplify the code a little bit. I don't need/want to query the resource properties at runtime for this.

On the user side (e.g. in cuvs benchmarks), I want to be able to configure the program for the target device: query the device properties, check whether ATS is available, and select the most appropriate resource that fits the bill. Only then do I wrap it into cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>, pass it using raft::set_managed_memory_resource, and benefit from the improved performance.


tfeher (Contributor) left a review:

Thanks Artem for this PR, it looks good to me!

achirkin (Contributor Author) commented:

/merge

@rapids-bot rapids-bot bot merged commit 8d8e1ef into rapidsai:main Mar 14, 2026
79 checks passed

Labels

breaking Breaking change feature request New feature or request

Development


3 participants