Description
Part of #2011
Problem
All 8 base (non-adaptor) memory resources in RMM currently inherit from device_memory_resource and implement do_allocate/do_deallocate. When constructing a device_resource_ref from any of them, the SFINAE guard in cccl_adaptors.hpp excludes device_memory_resource-derived types from the direct CCCL path, forcing all ref construction through device_memory_resource_view.
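To illustrate the kind of guard described above, here is a minimal sketch using plain C++17 stand-ins. The names `legacy_resource`, `resource_ref` and `path` are hypothetical and are not taken from `cccl_adaptors.hpp`; the actual guard in RMM may be written differently. The sketch only shows how an `is_base_of` check can push every derived type onto a fallback constructor.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for device_memory_resource.
struct legacy_resource {
  virtual ~legacy_resource() = default;
};

// A resource that still derives from the legacy base.
struct derived_resource : legacy_resource {};

// A resource with no legacy base.
struct native_resource {};

// Hypothetical stand-in for a resource ref; records which constructor ran.
struct resource_ref {
  enum class path { direct, legacy_view };
  path taken;

  // Direct path: enabled only for types NOT derived from the legacy base.
  template <typename R,
            std::enable_if_t<!std::is_base_of_v<legacy_resource, R>, int> = 0>
  resource_ref(R&) : taken{path::direct} {}

  // Fallback path: anything derived from the legacy base lands here.
  resource_ref(legacy_resource&) : taken{path::legacy_view} {}
};

inline resource_ref::path path_for_derived()
{
  derived_resource r;
  return resource_ref{r}.taken;
}

inline resource_ref::path path_for_native()
{
  native_resource r;
  return resource_ref{r}.taken;
}
```

In this sketch `path_for_derived()` takes the fallback constructor even though `derived_resource` could in principle offer its own interface, which mirrors the situation the issue describes for the base resources.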
The adaptors (arena, logging, limiting, etc.) have already been converted to a dual-inheritance pattern (device_memory_resource + cuda::mr::shared_resource<Impl>). The base resources have not been touched.
Additionally, most base resources lack their own get_property friend functions — they currently satisfy device_accessible only through the device_memory_resource base class. Once they provide their own CCCL allocate/deallocate methods, they need their own get_property friends to satisfy the CCCL concept independently.
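The difference between inheriting a property and declaring it can be sketched with a small detection trait. Everything here is a stand-in: `device_accessible` and `host_accessible` are empty tag types mimicking the CCCL property tags, and `has_property` is a hypothetical C++17 trait, not the CCCL concept itself.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Stand-ins for the CCCL property tag types (assumption: plain empty structs).
struct device_accessible {};
struct host_accessible {};

// Base class that declares device accessibility for everything derived from it.
struct legacy_base {
  friend void get_property(legacy_base const&, device_accessible) noexcept {}
};

// Relies only on the friend declared in the base class.
struct inherits_property : legacy_base {};

// Declares its own friends for both properties.
struct own_properties {
  friend void get_property(own_properties const&, device_accessible) noexcept {}
  friend void get_property(own_properties const&, host_accessible) noexcept {}
};

// Hypothetical trait: true when get_property(r, P{}) is a valid call via ADL.
template <typename R, typename P, typename = void>
struct has_property : std::false_type {};

template <typename R, typename P>
struct has_property<
  R, P,
  std::void_t<decltype(get_property(std::declval<R const&>(), std::declval<P>()))>>
  : std::true_type {};
```

With these stand-ins, `inherits_property` reports `device_accessible` only because of its base class and does not report `host_accessible`, while `own_properties` reports both without depending on any base.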
Goal
Each base resource provides its own allocate/deallocate/allocate_sync/deallocate_sync methods (hiding the device_memory_resource base class versions) so that when someone holds a concrete type (e.g. cuda_memory_resource&), the CCCL concept is satisfied natively — no virtual dispatch, no device_memory_resource_view.
device_memory_resource inheritance is kept during this phase for backward compatibility. Removal of that inheritance is a separate future step.
Stateless resources implement the CCCL interface directly on the class. They do NOT use cuda::mr::shared_resource because there is no shared state to manage.
Stateful, non-copyable resources use the cuda::mr::shared_resource<Impl> pattern, with _impl classes in detail/ headers (consistent with the adaptor convention).
Resources In Scope
Stateless (direct CCCL interface on the class)
| Resource | File | Properties | Notes |
|---|---|---|---|
| `cuda_memory_resource` | `mr/cuda_memory_resource.hpp` | `device_accessible` | `cudaMalloc`/`cudaFree`, copyable. Needs `get_property` friend. |
| `managed_memory_resource` | `mr/managed_memory_resource.hpp` | `device_accessible`, `host_accessible` | `cudaMallocManaged`/`cudaFree`, copyable. Needs both `get_property` friends. Managed memory is host-accessible, so add `host_accessible`. |
| `pinned_host_memory_resource` | `mr/pinned_host_memory_resource.hpp` | `device_accessible`, `host_accessible` | `cudaHostAlloc`/`cudaFreeHost`, copyable. Already has both `get_property` friends and `static_assert`. |
| `cuda_async_view_memory_resource` | `mr/cuda_async_view_memory_resource.hpp` | `device_accessible` | Non-owning view of `cudaMemPool_t`, copyable. Needs `get_property` friend. |
| `system_memory_resource` | `mr/system_memory_resource.hpp` | `device_accessible`, `host_accessible` | `operator new`/`delete` with SAM, copyable. Already has both `get_property` friends and `static_assert`. |
Stateful, non-copyable (need cuda::mr::shared_resource<Impl>)
| Resource | File | Properties | Notes |
|---|---|---|---|
| `cuda_async_memory_resource` | `mr/cuda_async_memory_resource.hpp` | `device_accessible` | Owns CUDA pool (`cudaMemPoolCreate`/`Destroy`), non-copyable. Delegates to internal `cuda_async_view_memory_resource`. Needs `get_property` friend. Extract `_impl` to `detail/`. |
| `cuda_async_managed_memory_resource` | `mr/cuda_async_managed_memory_resource.hpp` | `device_accessible`, `host_accessible` | Non-copyable, delegates to internal `cuda_async_view_memory_resource`. Managed memory is host-accessible, so add `host_accessible`. Extract `_impl` to `detail/`. |
| `sam_headroom_memory_resource` | `mr/sam_headroom_memory_resource.hpp` | `device_accessible`, `host_accessible` | Non-copyable, holds `system_memory_resource` + `headroom_`. Needs `get_property` friends. Extract `_impl` to `detail/`. |
Target Pattern
Stateless resource (e.g. cuda_memory_resource)
class cuda_memory_resource final : public device_memory_resource {
 public:
  // -- CCCL memory resource interface (hides device_memory_resource versions) --
  void* allocate(cuda::stream_ref stream,
                 std::size_t bytes,
                 std::size_t alignment = cuda::mr::default_cuda_malloc_alignment)
  { /* direct cudaMalloc impl */ }

  void deallocate(cuda::stream_ref stream,
                  void* ptr,
                  std::size_t bytes,
                  std::size_t alignment = cuda::mr::default_cuda_malloc_alignment) noexcept
  { /* direct cudaFree impl */ }

  void* allocate_sync(std::size_t bytes,
                      std::size_t alignment = cuda::mr::default_cuda_malloc_alignment)
  { /* direct cudaMalloc impl */ }

  void deallocate_sync(void* ptr,
                       std::size_t bytes,
                       std::size_t alignment = cuda::mr::default_cuda_malloc_alignment) noexcept
  { /* direct cudaFree impl */ }

  bool operator==(cuda_memory_resource const&) const noexcept { return true; }
  bool operator!=(cuda_memory_resource const&) const noexcept { return false; }

  friend void get_property(cuda_memory_resource const&,
                           cuda::mr::device_accessible) noexcept {}

 private:
  // -- Legacy device_memory_resource overrides (delegates to CCCL interface) --
  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  { return allocate(stream, bytes); }

  void do_deallocate(void* ptr, std::size_t bytes, cuda_stream_view stream) noexcept override
  { deallocate(stream, ptr, bytes); }

  bool do_is_equal(device_memory_resource const& other) const noexcept override
  { return dynamic_cast<cuda_memory_resource const*>(&other) != nullptr; }
};

static_assert(rmm::detail::polyfill::async_resource_with<
              cuda_memory_resource, cuda::mr::device_accessible>);

Key points:
- The CCCL `allocate`/`deallocate` methods use `cuda::stream_ref` (not `cuda_stream_view`).
- The default alignment is `cuda::mr::default_cuda_malloc_alignment` (256 bytes, defined in CCCL `properties.h`).
- `do_allocate`/`do_deallocate` delegate TO the CCCL methods (single implementation).
- Equality operators are defined on the concrete type.
- `get_property` friend functions declare accessibility properties.
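The hiding and delegation described in the key points can be tried without CUDA. The following is a rough sketch, not RMM code: `stream`, `legacy_base`, `host_resource` and `count_calls` are hypothetical names, `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree`, and the default alignment of 16 is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a stream handle.
struct stream {};

// Stand-in for device_memory_resource: public non-virtual entry points that
// forward to private virtual hooks.
class legacy_base {
 public:
  virtual ~legacy_base() = default;
  void* allocate(std::size_t bytes, stream s) { return do_allocate(bytes, s); }
  void deallocate(void* ptr, std::size_t bytes, stream s) noexcept
  {
    do_deallocate(ptr, bytes, s);
  }

 private:
  virtual void* do_allocate(std::size_t bytes, stream s) = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes, stream s) noexcept = 0;
};

// Stand-in for a stateless resource.
class host_resource final : public legacy_base {
 public:
  // New-style interface: stream first, extra alignment parameter.
  // Declaring these names hides legacy_base::allocate/deallocate.
  void* allocate(stream, std::size_t bytes, std::size_t /*alignment*/ = 16)
  {
    ++calls;  // counts every allocation, whichever entry point was used
    return std::malloc(bytes);
  }
  void deallocate(stream, void* ptr, std::size_t /*bytes*/,
                  std::size_t /*alignment*/ = 16) noexcept
  {
    std::free(ptr);
  }

  int calls{0};

 private:
  // Legacy hooks delegate to the new interface (single implementation).
  void* do_allocate(std::size_t bytes, stream s) override { return allocate(s, bytes); }
  void do_deallocate(void* ptr, std::size_t bytes, stream s) noexcept override
  {
    deallocate(s, ptr, bytes);
  }
};

// Allocates once through the concrete type and once through the base.
inline int count_calls()
{
  host_resource r;
  void* a = r.allocate(stream{}, 64);     // concrete type: resolved statically
  legacy_base& base = r;
  void* b = base.allocate(64, stream{});  // legacy path: virtual hook, then delegate
  r.deallocate(stream{}, a, 64);
  base.deallocate(b, 64, stream{});
  return r.calls;
}
```

Both calls end up in the same `host_resource::allocate`. Note that, because of the hiding, the legacy argument order `r.allocate(64, stream{})` no longer compiles on the concrete type and remains reachable only through a base reference or pointer.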
Stateful resource (e.g. cuda_async_memory_resource)
Follow the same pattern as the already-converted adaptors:
- Extract an `_impl` class in a `detail/` header with the CCCL interface.
- Outer class uses dual inheritance: `device_memory_resource` + `cuda::mr::shared_resource<Impl>`.
- `do_allocate`/`do_deallocate` delegate to `shared_base::allocate`/`shared_base::deallocate`.
- `_impl` files live in `detail/` (e.g. `detail/cuda_async_memory_resource_impl.hpp`).
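As a rough illustration of why a shared wrapper suits non-copyable state, the sketch below mimics the idea with `std::shared_ptr`. It is not the CCCL class: `shared_handle`, `pool_impl` and `pool_resource` are hypothetical names, the real `cuda::mr::shared_resource` interface may differ, and the `device_memory_resource` half of the dual inheritance is omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical stand-in for a shared wrapper: copies share one Impl.
template <typename Impl>
class shared_handle {
 public:
  explicit shared_handle(std::shared_ptr<Impl> impl) : impl_{std::move(impl)} {}

  void* allocate(std::size_t bytes) { return impl_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { impl_->deallocate(ptr, bytes); }

  // Two handles compare equal when they refer to the same Impl object.
  friend bool operator==(shared_handle const& a, shared_handle const& b) noexcept
  {
    return a.impl_ == b.impl_;
  }

 protected:
  Impl& impl() noexcept { return *impl_; }

 private:
  std::shared_ptr<Impl> impl_;
};

// Non-copyable state, analogous to an _impl class that owns a pool.
class pool_impl {
 public:
  explicit pool_impl(std::size_t limit) : limit_{limit} {}
  pool_impl(pool_impl const&)            = delete;
  pool_impl& operator=(pool_impl const&) = delete;

  void* allocate(std::size_t bytes)
  {
    if (in_use_ + bytes > limit_) { return nullptr; }  // over budget
    in_use_ += bytes;
    return std::malloc(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) noexcept
  {
    in_use_ -= bytes;
    std::free(ptr);
  }
  std::size_t in_use() const noexcept { return in_use_; }

 private:
  std::size_t limit_;
  std::size_t in_use_{0};
};

// Outer resource: a copyable handle around the non-copyable impl.
class pool_resource : public shared_handle<pool_impl> {
 public:
  explicit pool_resource(std::size_t limit)
    : shared_handle<pool_impl>{std::make_shared<pool_impl>(limit)}
  {
  }
  std::size_t in_use() noexcept { return impl().in_use(); }
};

// Copies observe each other's allocations because they share one impl.
inline bool copies_share_state()
{
  pool_resource a{1024};
  pool_resource b = a;
  void* p         = a.allocate(100);
  bool shared     = (b.in_use() == 100) && (a == b);
  b.deallocate(p, 100);
  return shared && a.in_use() == 0;
}
```

The outer class stays copyable, which type-erased wrappers generally need, while the state that must not be duplicated lives in exactly one `pool_impl`.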
Ordering
Phase 1 (stateless) must be completed before phase 2 (stateful), because:
- `cuda_async_memory_resource` delegates to `cuda_async_view_memory_resource`
- `cuda_async_managed_memory_resource` delegates to `cuda_async_view_memory_resource`
- `sam_headroom_memory_resource` delegates to `system_memory_resource`
Once the stateless resources have native CCCL interfaces, the stateful _impl classes can forward to them directly.
Per-Resource Checklist
For each resource, the conversion requires:
- Add `allocate(cuda::stream_ref, size_t, size_t)` (hides base class version)
- Add `deallocate(cuda::stream_ref, void*, size_t, size_t)` (hides base class version)
- Add `allocate_sync(size_t, size_t)` (hides base class version)
- Add `deallocate_sync(void*, size_t, size_t)` (hides base class version)
- Add `get_property` friend function(s) for the correct properties
- Add `operator==`/`operator!=` on the concrete type
- Modify `do_allocate`/`do_deallocate` to delegate to the new CCCL methods
- Add `static_assert` for `async_resource_with<R, Properties...>`
- For stateful: extract `_impl` to `detail/`, add `shared_resource<Impl>` inheritance
Resources
- `cuda_memory_resource`: stateless, `device_accessible`
- `managed_memory_resource`: stateless, `device_accessible` + `host_accessible`
- `pinned_host_memory_resource`: stateless, `device_accessible` + `host_accessible` (already has `get_property` + `static_assert`)
- `cuda_async_view_memory_resource`: stateless, `device_accessible`
- `system_memory_resource`: stateless, `device_accessible` + `host_accessible` (already has `get_property` + `static_assert`)
- `cuda_async_memory_resource`: stateful, `device_accessible`, extract `_impl`
- `cuda_async_managed_memory_resource`: stateful, `device_accessible` + `host_accessible`, extract `_impl`
- `sam_headroom_memory_resource`: stateful, `device_accessible` + `host_accessible`, extract `_impl`
Validation
- `static_assert` that each resource satisfies `async_resource_with<R, properties...>`.
- Verify that `any_resource` can store each resource directly (without `device_memory_resource_view`).
- Run full C++ and Python test suites.