
[FEA] Migrate base memory resources to native CCCL interface #2287

@bdice

Description


Part of #2011

Problem

All 8 base (non-adaptor) memory resources in RMM currently inherit from device_memory_resource and implement do_allocate/do_deallocate. When constructing a device_resource_ref from any of them, the SFINAE guard in cccl_adaptors.hpp excludes device_memory_resource-derived types from the direct CCCL path, forcing all ref construction through device_memory_resource_view.
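The effect of such a guard can be sketched in plain C++17. This is a minimal illustration, not the actual code in cccl_adaptors.hpp. legacy_base, derived_resource, native_resource, uses_direct_path, and takes_direct_path are hypothetical names, and the sketch assumes std::is_base_of as the exclusion test.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for rmm::mr::device_memory_resource.
struct legacy_base {
  virtual ~legacy_base() = default;
};

// Hypothetical resources: one still derives from the legacy base, one does not.
struct derived_resource : legacy_base {};
struct native_resource {};

// Primary template: by default a type is NOT eligible for the direct path.
template <typename R, typename = void>
struct uses_direct_path : std::false_type {};

// Specialization enabled only for types that do not derive from legacy_base.
template <typename R>
struct uses_direct_path<R, std::enable_if_t<!std::is_base_of_v<legacy_base, R>>>
    : std::true_type {};

// True when R would be wrapped directly rather than through a view adaptor.
template <typename R>
constexpr bool takes_direct_path() noexcept {
  return uses_direct_path<R>::value;
}

static_assert(!takes_direct_path<derived_resource>());
static_assert(takes_direct_path<native_resource>());
```

Every type derived from the legacy base fails the guard, regardless of what else it implements. This is why adding CCCL methods alone does not change the path until the guard or the inheritance changes.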

The adaptors (arena, logging, limiting, etc.) have already been converted to a dual-inheritance pattern (device_memory_resource + cuda::mr::shared_resource<Impl>). The base resources have not been touched.

Additionally, most base resources lack their own get_property friend functions — they currently satisfy device_accessible only through the device_memory_resource base class. Once they provide their own CCCL allocate/deallocate methods, they need their own get_property friends to satisfy the CCCL concept independently.
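The following C++17 sketch shows the two ways argument-dependent lookup can find a get_property hidden friend: through a base class, or declared on the concrete type itself. The tags device_accessible_tag and host_accessible_tag and the detector has_property are hypothetical stand-ins for the CCCL property machinery, not its real implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical property tags (stand-ins for cuda::mr::device_accessible / host_accessible).
struct device_accessible_tag {};
struct host_accessible_tag {};

// Detects whether get_property(r, P{}) is callable; the name is found via ADL.
template <typename R, typename P, typename = void>
struct has_property : std::false_type {};

template <typename R, typename P>
struct has_property<
    R, P, std::void_t<decltype(get_property(std::declval<R const&>(), std::declval<P>()))>>
    : std::true_type {};

// Before: only the base class declares the property.
struct legacy_base {
  friend void get_property(legacy_base const&, device_accessible_tag) noexcept {}
};
struct inherits_property : legacy_base {};

// After: the concrete type declares its own properties.
struct owns_properties {
  friend void get_property(owns_properties const&, device_accessible_tag) noexcept {}
  friend void get_property(owns_properties const&, host_accessible_tag) noexcept {}
};

static_assert(has_property<inherits_property, device_accessible_tag>::value);  // via base
static_assert(!has_property<inherits_property, host_accessible_tag>::value);
static_assert(has_property<owns_properties, device_accessible_tag>::value);
static_assert(has_property<owns_properties, host_accessible_tag>::value);
```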

Goal

Each base resource provides its own allocate/deallocate/allocate_sync/deallocate_sync methods (hiding the device_memory_resource base class versions) so that when someone holds a concrete type (e.g. cuda_memory_resource&), the CCCL concept is satisfied natively — no virtual dispatch, no device_memory_resource_view.
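Name hiding is what keeps the concrete-type path free of virtual dispatch. The sketch below assumes a hypothetical legacy_base whose public allocate forwards to a private virtual do_allocate, loosely modeled on device_memory_resource. std::malloc stands in for cudaMalloc, and the call counters exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical legacy interface: public non-virtual entry point, private virtual hook.
class legacy_base {
 public:
  virtual ~legacy_base() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  int virtual_calls = 0;  // counts trips through the virtual hook

 private:
  virtual void* do_allocate(std::size_t bytes) = 0;
};

class concrete_resource final : public legacy_base {
 public:
  // Same name as legacy_base::allocate, so it hides the base version for
  // callers holding a concrete_resource&: no virtual call is made.
  void* allocate(std::size_t bytes) {
    ++direct_calls;
    return std::malloc(bytes);
  }
  int direct_calls = 0;  // counts executions of the single implementation

 private:
  // Legacy hook delegates to the direct implementation.
  void* do_allocate(std::size_t bytes) override {
    ++virtual_calls;
    return allocate(bytes);
  }
};
```

Calling allocate on a concrete_resource& runs the direct implementation once. Calling it through legacy_base& goes through do_allocate first and then reaches the same code.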

device_memory_resource inheritance is kept during this phase for backward compatibility. Removal of that inheritance is a separate future step.

Stateless resources implement the CCCL interface directly on the class. They do NOT use cuda::mr::shared_resource, because there is no shared state to manage.

Stateful, non-copyable resources use the cuda::mr::shared_resource<Impl> pattern, with _impl classes in detail/ headers (consistent with the adaptor convention).

Resources In Scope

Stateless (direct CCCL interface on the class)

| Resource | File | Properties | Notes |
|---|---|---|---|
| cuda_memory_resource | mr/cuda_memory_resource.hpp | device_accessible | cudaMalloc/cudaFree, copyable. Needs get_property friend. |
| managed_memory_resource | mr/managed_memory_resource.hpp | device_accessible, host_accessible | cudaMallocManaged/cudaFree, copyable. Needs both get_property friends. Managed memory is host-accessible, so add host_accessible. |
| pinned_host_memory_resource | mr/pinned_host_memory_resource.hpp | device_accessible, host_accessible | cudaHostAlloc/cudaFreeHost, copyable. Already has both get_property friends and static_assert. |
| cuda_async_view_memory_resource | mr/cuda_async_view_memory_resource.hpp | device_accessible | Non-owning view of cudaMemPool_t, copyable. Needs get_property friend. |
| system_memory_resource | mr/system_memory_resource.hpp | device_accessible, host_accessible | operator new/delete with SAM, copyable. Already has both get_property friends and static_assert. |

Stateful, non-copyable (need cuda::mr::shared_resource<Impl>)

| Resource | File | Properties | Notes |
|---|---|---|---|
| cuda_async_memory_resource | mr/cuda_async_memory_resource.hpp | device_accessible | Owns CUDA pool (cudaMemPoolCreate/Destroy), non-copyable. Delegates to internal cuda_async_view_memory_resource. Needs get_property friend. Extract _impl to detail/. |
| cuda_async_managed_memory_resource | mr/cuda_async_managed_memory_resource.hpp | device_accessible, host_accessible | Non-copyable, delegates to internal cuda_async_view_memory_resource. Managed memory is host-accessible, so add host_accessible. Extract _impl to detail/. |
| sam_headroom_memory_resource | mr/sam_headroom_memory_resource.hpp | device_accessible, host_accessible | Non-copyable, holds system_memory_resource + headroom_. Needs get_property friends. Extract _impl to detail/. |

Target Pattern

Stateless resource (e.g. cuda_memory_resource)

class cuda_memory_resource final : public device_memory_resource {
 public:
  // -- CCCL memory resource interface (hides device_memory_resource versions) --

  void* allocate(cuda::stream_ref stream,
                 std::size_t bytes,
                 std::size_t alignment = cuda::mr::default_cuda_malloc_alignment)
  { /* direct cudaMalloc impl */ }

  void deallocate(cuda::stream_ref stream,
                  void* ptr,
                  std::size_t bytes,
                  std::size_t alignment = cuda::mr::default_cuda_malloc_alignment) noexcept
  { /* direct cudaFree impl */ }

  void* allocate_sync(std::size_t bytes,
                      std::size_t alignment = cuda::mr::default_cuda_malloc_alignment)
  { /* direct cudaMalloc impl */ }

  void deallocate_sync(void* ptr,
                       std::size_t bytes,
                       std::size_t alignment = cuda::mr::default_cuda_malloc_alignment) noexcept
  { /* direct cudaFree impl */ }

  bool operator==(cuda_memory_resource const&) const noexcept { return true; }
  bool operator!=(cuda_memory_resource const&) const noexcept { return false; }

  friend void get_property(cuda_memory_resource const&,
                           cuda::mr::device_accessible) noexcept {}

 private:
  // -- Legacy device_memory_resource overrides (delegates to CCCL interface) --
  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  { return allocate(stream, bytes); }

  void do_deallocate(void* ptr, std::size_t bytes, cuda_stream_view stream) noexcept override
  { deallocate(stream, ptr, bytes); }

  bool do_is_equal(device_memory_resource const& other) const noexcept override
  { return dynamic_cast<cuda_memory_resource const*>(&other) != nullptr; }
};

static_assert(rmm::detail::polyfill::async_resource_with<
              cuda_memory_resource, cuda::mr::device_accessible>);

Key points:

  • The CCCL allocate/deallocate methods use cuda::stream_ref (not cuda_stream_view).
  • The default alignment is cuda::mr::default_cuda_malloc_alignment (256 bytes, defined in CCCL properties.h).
  • do_allocate/do_deallocate delegate TO the CCCL methods, so the allocation logic has a single implementation.
  • Equality operators are defined on the concrete type.
  • get_property friend functions declare accessibility properties.
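For a stateless resource, the concrete operator== and the legacy do_is_equal override should agree. Every instance is interchangeable, and through the legacy interface only another instance of the same type compares equal. The sketch below uses hypothetical stand-in types (legacy_base, stateless_resource, other_resource) rather than RMM classes.

```cpp
#include <cassert>

// Hypothetical stand-in for the legacy equality interface.
class legacy_base {
 public:
  virtual ~legacy_base() = default;
  bool is_equal(legacy_base const& other) const noexcept { return do_is_equal(other); }

 private:
  virtual bool do_is_equal(legacy_base const& other) const noexcept = 0;
};

class stateless_resource final : public legacy_base {
 public:
  // Stateless: any instance can free memory allocated by any other instance.
  bool operator==(stateless_resource const&) const noexcept { return true; }
  bool operator!=(stateless_resource const&) const noexcept { return false; }

 private:
  // Legacy path: equal to any object of the same concrete type.
  bool do_is_equal(legacy_base const& other) const noexcept override {
    return dynamic_cast<stateless_resource const*>(&other) != nullptr;
  }
};

// An unrelated resource type, equal only to itself.
class other_resource final : public legacy_base {
 private:
  bool do_is_equal(legacy_base const& other) const noexcept override { return this == &other; }
};
```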

Stateful resource (e.g. cuda_async_memory_resource)

Follow the same pattern as the already-converted adaptors:

  • Extract an _impl class in a detail/ header with the CCCL interface.
  • Outer class uses dual inheritance: device_memory_resource + cuda::mr::shared_resource<Impl>.
  • do_allocate/do_deallocate delegate to shared_base::allocate/shared_base::deallocate.
  • _impl files in detail/ (e.g. detail/cuda_async_memory_resource_impl.hpp).
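The stateful pattern can be sketched with a simplified stand-in for cuda::mr::shared_resource. shared_resource_sketch, pool_impl, legacy_base, and pool_resource are hypothetical names. The sketch omits streams and alignment, so its signatures do not match the real CCCL class. It assumes reference-counted sharing through std::shared_ptr, which makes the outer class copyable even though pool_impl is not.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Simplified stand-in for cuda::mr::shared_resource<Impl>: copies share one Impl.
template <typename Impl>
class shared_resource_sketch {
 public:
  explicit shared_resource_sketch(std::shared_ptr<Impl> impl) : impl_{std::move(impl)} {}

  void* allocate(std::size_t bytes) { return impl_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { impl_->deallocate(ptr, bytes); }
  Impl const& get() const noexcept { return *impl_; }

  // Handles compare equal when they share the same Impl instance.
  bool operator==(shared_resource_sketch const& other) const noexcept {
    return impl_ == other.impl_;
  }

 private:
  std::shared_ptr<Impl> impl_;
};

// Hypothetical stateful, non-copyable implementation (the detail/ "_impl" class).
class pool_impl {
 public:
  pool_impl()                            = default;
  pool_impl(pool_impl const&)            = delete;
  pool_impl& operator=(pool_impl const&) = delete;

  void* allocate(std::size_t bytes) {
    ++live_allocations;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr, std::size_t) noexcept {
    --live_allocations;
    ::operator delete(ptr);
  }

  int live_allocations = 0;
};

// Hypothetical legacy interface: public non-virtual entry point, private virtual hook.
class legacy_base {
 public:
  virtual ~legacy_base() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes) = 0;
};

// Outer class: legacy base + shared base (dual inheritance).
class pool_resource final : public legacy_base, public shared_resource_sketch<pool_impl> {
  using shared_base = shared_resource_sketch<pool_impl>;

 public:
  pool_resource() : shared_base{std::make_shared<pool_impl>()} {}

  // Concrete-type callers get shared_base::allocate; legacy_base::allocate is hidden.
  using shared_base::allocate;

 private:
  // Legacy virtual hook delegates to the shared base.
  void* do_allocate(std::size_t bytes) override { return shared_base::allocate(bytes); }
};
```

Copies of pool_resource share one pool_impl. The concrete path and the legacy path both update the same state.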

Ordering

Phase 1 (stateless) must be completed before Phase 2 (stateful), because each stateful resource delegates to a stateless one:

  • cuda_async_memory_resource delegates to cuda_async_view_memory_resource
  • cuda_async_managed_memory_resource delegates to cuda_async_view_memory_resource
  • sam_headroom_memory_resource delegates to system_memory_resource

Once the stateless resources have native CCCL interfaces, the stateful _impl classes can forward to them directly.
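Here is a sketch of that forwarding. It assumes a hypothetical pool_state/pool_handle pair in place of cudaMemPool_t, and ::operator new in place of stream-ordered allocation. view_resource plays the role of cuda_async_view_memory_resource and owning_impl the role of an extracted _impl class; neither is RMM code.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for a raw pool handle such as cudaMemPool_t.
struct pool_state {
  std::size_t bytes_in_use = 0;
};
using pool_handle = pool_state*;

// Copyable, non-owning view over a pool with non-virtual methods.
class view_resource {
 public:
  explicit view_resource(pool_handle pool) noexcept : pool_{pool} {}
  void* allocate(std::size_t bytes) {
    pool_->bytes_in_use += bytes;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) noexcept {
    pool_->bytes_in_use -= bytes;
    ::operator delete(ptr);
  }

 private:
  pool_handle pool_;
};

// Owning, non-copyable impl: holds the pool and forwards every call to the
// view. view_resource is a concrete type, so no virtual dispatch is involved.
class owning_impl {
 public:
  owning_impl() : state_{}, view_{&state_} {}
  owning_impl(owning_impl const&)            = delete;
  owning_impl& operator=(owning_impl const&) = delete;

  void* allocate(std::size_t bytes) { return view_.allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { view_.deallocate(ptr, bytes); }
  std::size_t bytes_in_use() const noexcept { return state_.bytes_in_use; }

 private:
  pool_state state_;     // owned pool state, declared (and initialized) before view_
  view_resource view_;   // non-owning view pointing at state_
};
```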

Per-Resource Checklist

For each resource, the conversion requires:

  • Add allocate(cuda::stream_ref, size_t, size_t) (hides base class version)
  • Add deallocate(cuda::stream_ref, void*, size_t, size_t) (hides base class version)
  • Add allocate_sync(size_t, size_t) (hides base class version)
  • Add deallocate_sync(void*, size_t, size_t) (hides base class version)
  • Add get_property friend function(s) for the correct properties
  • Add operator== / operator!= on the concrete type
  • Modify do_allocate/do_deallocate to delegate to the new CCCL methods
  • Add static_assert for async_resource_with<R, Properties...>
  • For stateful resources: extract the _impl class to detail/ and add cuda::mr::shared_resource<Impl> inheritance
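The checklist can be turned into a compile-time detector. This is a simplified approximation, not CCCL's async_resource_with. stream_stub and device_accessible_tag are hypothetical stand-ins for cuda::stream_ref and cuda::mr::device_accessible. The detector only checks that each required call is well-formed; it does not check return types or noexcept.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

struct stream_stub {};            // hypothetical stand-in for cuda::stream_ref
struct device_accessible_tag {};  // hypothetical stand-in for cuda::mr::device_accessible

// True when every call from the checklist is well-formed for R.
template <typename R, typename P, typename = void>
struct checklist_met : std::false_type {};

template <typename R, typename P>
struct checklist_met<
    R, P,
    std::void_t<
        decltype(std::declval<R&>().allocate(stream_stub{}, std::size_t{}, std::size_t{})),
        decltype(std::declval<R&>().deallocate(
            stream_stub{}, static_cast<void*>(nullptr), std::size_t{}, std::size_t{})),
        decltype(std::declval<R&>().allocate_sync(std::size_t{}, std::size_t{})),
        decltype(std::declval<R&>().deallocate_sync(
            static_cast<void*>(nullptr), std::size_t{}, std::size_t{})),
        decltype(std::declval<R const&>() == std::declval<R const&>()),
        decltype(std::declval<R const&>() != std::declval<R const&>()),
        decltype(get_property(std::declval<R const&>(), P{}))>> : std::true_type {};

// Toy bodies ignore the stream and alignment arguments.
struct converted_resource {
  void* allocate(stream_stub, std::size_t bytes, std::size_t = 256) { return ::operator new(bytes); }
  void deallocate(stream_stub, void* p, std::size_t, std::size_t = 256) noexcept { ::operator delete(p); }
  void* allocate_sync(std::size_t bytes, std::size_t = 256) { return ::operator new(bytes); }
  void deallocate_sync(void* p, std::size_t, std::size_t = 256) noexcept { ::operator delete(p); }
  bool operator==(converted_resource const&) const noexcept { return true; }
  bool operator!=(converted_resource const&) const noexcept { return false; }
  friend void get_property(converted_resource const&, device_accessible_tag) noexcept {}
};

// Missing the sync methods, equality operators, and property friend.
struct unconverted_resource {
  void* allocate(stream_stub, std::size_t bytes, std::size_t = 256) { return ::operator new(bytes); }
  void deallocate(stream_stub, void* p, std::size_t, std::size_t = 256) noexcept { ::operator delete(p); }
};

static_assert(checklist_met<converted_resource, device_accessible_tag>::value);
static_assert(!checklist_met<unconverted_resource, device_accessible_tag>::value);
```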

Resources

  • cuda_memory_resource — stateless, device_accessible
  • managed_memory_resource — stateless, device_accessible + host_accessible
  • pinned_host_memory_resource — stateless, device_accessible + host_accessible (already has get_property + static_assert)
  • cuda_async_view_memory_resource — stateless, device_accessible
  • system_memory_resource — stateless, device_accessible + host_accessible (already has get_property + static_assert)
  • cuda_async_memory_resource — stateful, device_accessible, extract _impl
  • cuda_async_managed_memory_resource — stateful, device_accessible + host_accessible, extract _impl
  • sam_headroom_memory_resource — stateful, device_accessible + host_accessible, extract _impl

Validation

  • static_assert that each resource satisfies async_resource_with<R, properties...>.
  • Verify that any_resource can store each resource directly (without device_memory_resource_view).
  • Run full C++ and Python test suites.
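The any_resource check verifies that the type-erased wrapper holds the concrete resource itself rather than a view. The toy any_resource_sketch below is hypothetical and far simpler than cuda::mr::any_resource. host_resource uses aligned ::operator new in place of a real RMM resource.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>
#include <typeinfo>
#include <utility>

// Toy owning type-erased holder; NOT cuda::mr::any_resource.
class any_resource_sketch {
 public:
  template <typename R>
  explicit any_resource_sketch(R resource)
      : holder_{std::make_unique<holder<R>>(std::move(resource))} {}

  void* allocate_sync(std::size_t bytes, std::size_t alignment) {
    return holder_->allocate_sync(bytes, alignment);
  }
  void deallocate_sync(void* p, std::size_t bytes, std::size_t alignment) noexcept {
    holder_->deallocate_sync(p, bytes, alignment);
  }
  // Reports the stored type, so a test can confirm no view type sits in between.
  std::type_info const& stored_type() const noexcept { return holder_->type(); }

 private:
  struct holder_base {
    virtual ~holder_base()                                                = default;
    virtual void* allocate_sync(std::size_t, std::size_t)                 = 0;
    virtual void deallocate_sync(void*, std::size_t, std::size_t) noexcept = 0;
    virtual std::type_info const& type() const noexcept                   = 0;
  };

  template <typename R>
  struct holder final : holder_base {
    explicit holder(R r) : resource{std::move(r)} {}
    void* allocate_sync(std::size_t b, std::size_t a) override { return resource.allocate_sync(b, a); }
    void deallocate_sync(void* p, std::size_t b, std::size_t a) noexcept override {
      resource.deallocate_sync(p, b, a);
    }
    std::type_info const& type() const noexcept override { return typeid(R); }
    R resource;  // the concrete resource, stored by value
  };

  std::unique_ptr<holder_base> holder_;
};

// Hypothetical stateless resource using C++17 aligned allocation.
struct host_resource {
  void* allocate_sync(std::size_t bytes, std::size_t alignment) {
    return ::operator new(bytes, static_cast<std::align_val_t>(alignment));
  }
  void deallocate_sync(void* p, std::size_t, std::size_t alignment) noexcept {
    ::operator delete(p, static_cast<std::align_val_t>(alignment));
  }
};
```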
