Unify memory resources #2968
Conversation
```cpp
template <typename T>
struct host_container {
template <typename T, typename MR>
#ifdef __cpp_concepts
```
I think RAFT is using C++20 now, so it should be safe to use `requires` without the `#ifdef` guard?
Unfortunately, some components of cuVS still use C++17, and it breaks if I remove the `#ifdef` in this header. I figured I'd keep it here so cuVS keeps passing CI without changes.
We should get cuVS updated to C++20, RMM will be requiring C++20 soon.
```cpp
 * Provides CUDA unified (managed) memory accessible from both host and device.
 * Uses synchronous allocation (no stream). Binds to raft::mr::host_device_resource_ref.
 */
class managed_memory_resource {
```
This is implemented in CCCL already. Please do not introduce a new implementation of this since one already exists.
https://nvidia.github.io/cccl/unstable/libcudacxx/runtime/memory_pools.html#cuda-managed-memory-pool
https://nvidia.github.io/cccl/unstable/libcudacxx/runtime/legacy_resources.html#cuda-mr-legacy-managed-memory-resource
Use `cuda::mr::legacy_managed_memory_resource` on CUDA 12 and `cuda::managed_memory_pool` on CUDA 13 (it's considerably faster). Maybe write a factory that returns the correct resource type for your CUDA version.
Thanks for the pointer! Really nice, I replaced it with `cuda::mr::legacy_managed_memory_resource` and it just worked with no other modifications. I'd prefer to keep the legacy resource for now to keep exactly the same behavior in cuVS as before this PR.
The user is able to replace it with the CUDA 13 pool-based resource even now via `raft::resource::managed_memory_resource`, but we can also make it the default later.
The follow-up and motivation: tracking all memory allocations #2973
Testing cuvs CI against rapidsai/raft#2968
Testing the breaking changes:
```cpp
class managed_memory_resource_factory : public resource_factory {
 public:
  managed_memory_resource_factory() : mr_(cuda::mr::legacy_managed_memory_resource{}) {}
```
I know you said it's out of scope for now, but I recommend a follow-up PR that uses the new managed pool on CUDA 13+. It's a worthwhile performance boost.
```cpp
struct managed_container_policy {
  using element_type = ElementType;
  using container_type = host_container<element_type, raft::mr::host_device_resource_ref>;
```
Something to be aware of: It is possible for memory resources to be host-accessible and device-accessible but not have that known statically. For example, systems with HMM or ATS have device-accessibility for memory allocated with malloc. However, that can't be known by the type alone. You have to query the accessibility at runtime.
Some systems like DGX Spark with integrated memory may perform better with a host-device accessible resource that isn't a managed memory resource (but that would require some system knowledge at runtime).
All this to say, someday we might want to refactor this to use `cuda::mr::synchronous_resource_ref<>` and check the accessibility at runtime, rather than using `cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>`, which requires that accessibility to be statically known.
Thanks, that's a very important point for cuVS - we've been experimenting with various memory types on Grace Hopper and DGX Spark. I actually hoped that I could use the new resources (defined in this PR as they are right now) to do more experiments by switching the memory resources.
I think the naming goes against the intention a little bit, since we decouple the memory resources, raft resource handles, and the containers (mdarrays).
On the algorithm implementation side:
- When I'm using `raft::managed_mdarray` and `raft::get_managed_memory_resource_ref` in an algorithm code, I mean more of "some (probably paged, smart) memory resource with guaranteed host and device access" rather than specifically `cudaMallocManaged`.
- Same for the pinned - "some (probably low-level, not-paged) memory resource with guaranteed host and device access and limited support for host-device atomics".
These two allow me to implement atomic synchronization between the device and host, reduce copy overheads, or just simplify the code a little bit. I don't need/want to query the resource properties at runtime for this.
On the user side (e.g. in cuvs benchmarks), I want to be able to configure the program for the target device: query the device properties, check whether ATS is available, select the most appropriate resource that fits the bill. Only then wrap it into `cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible, cuda::mr::device_accessible>`, pass it using `raft::set_managed_memory_resource`, and benefit from the improved performance.
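That user-side flow might be sketched as the following pseudocode. This is not compilable cuVS benchmark code: the `set_managed_memory_resource` signature and the resource variables are assumptions made for illustration; only `cudaGetDeviceProperties` and the `pageableMemoryAccessUsesHostPageTables` property are real CUDA runtime API.

```cpp
// Pseudocode sketch, not compilable as-is.
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, device_id);

// Pick the best host+device-accessible resource for this system, e.g. plain
// malloc-backed memory when ATS makes pageable memory device-accessible.
auto& mr = prop.pageableMemoryAccessUsesHostPageTables
             ? system_memory_resource    // hypothetical
             : managed_memory_resource;  // hypothetical

// Type-erase it with the statically-known access properties and install it.
raft::set_managed_memory_resource(
  res,
  cuda::mr::synchronous_resource_ref<cuda::mr::host_accessible,
                                     cuda::mr::device_accessible>{mr});
```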
tfeher left a comment
Thanks Artem for this PR, it looks good to me!
/merge
Use `cuda::mr::any_synchronous_resource` for host, pinned, and managed resource types and give the user explicit control for host, pinned, and managed resources.

New
- `raft::resource::managed_memory_resource` and `raft::resource::pinned_memory_resource` are passed to managed and pinned mdarrays during construction via corresponding container policies. This allows the user to replace/modify these resources, for example, to add logging or memory pooling.
- `raft::mr::get_default_host_resource` and `raft::mr::set_default_host_resource` can be used by the user to alter the default host resource the same way. It is not stored in the `raft::resources` handle like the other two for two reasons:
  - `raft::make_host_mdarray` overloads that do not take `raft::resources` as an argument (many instances across raft and cuvs).

Changed
- `raft::mr::host_resource_ref` and `raft::mr::host_device_resource_ref` for the non-owning semantics (defined as `cuda::mr::synchronous_resource_ref` with appropriate access attributes)
- `raft::host_resource` and `raft::host_device_resource` for owning semantics (defined as `cuda::mr::any_synchronous_resource` with appropriate access attributes)

With these changes, raft fully switches to `cuda::mr` types for host and host-device resources, while still using `rmm` types for device async resources. Changing the latter would break a lot of cuVS and is not needed - `rmm` will eventually fully converge to `cuda::mr` anyway.

Breaking changes
- `host_container` for the three types of resources.
- `cuda::mr::any_synchronous_resource` from `std::pmr::memory_resource`

The effect of these changes should be limited, because the policies are hidden behind the mdarray templates and synonyms, and the `std::pmr::memory_resource` was introduced recently and hasn't been used much.