[FEA] Migrate base memory resources to native CCCL interface #2287

Part of https://github.com/rapidsai/rmm/issues/2011

## Problem

All 8 base (non-adaptor) memory resources in RMM currently inherit from `device_memory_resource` and implement `do_allocate`/`do_deallocate`. When constructing a `device_resource_ref` from any of them, the SFINAE guard in `cccl_adaptors.hpp` excludes `device_memory_resource`-derived types from the direct CCCL path, forcing all ref construction through `device_memory_resource_view`.
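To make the dispatch problem concrete, here is a minimal sketch of how a guard of this kind routes construction. It is an illustration only: `legacy_base`, `derived_resource`, `native_resource`, `resource_ref_sketch`, and `chosen_path` are hypothetical stand-ins, not the actual code in `cccl_adaptors.hpp`, and the real guard may be written differently.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-ins; these are NOT the real RMM or CCCL types.
struct legacy_base {  // plays the role of device_memory_resource
  virtual ~legacy_base() = default;
};
struct derived_resource : legacy_base {};  // a resource that inherits the legacy base
struct native_resource {};                 // a resource with no legacy base

struct resource_ref_sketch {
  std::string path;

  // Direct path: SFINAE removes this overload for legacy_base-derived types.
  template <typename R,
            std::enable_if_t<!std::is_base_of_v<legacy_base, R>, int> = 0>
  resource_ref_sketch(R&) : path{"direct"}
  {
  }

  // Fallback path: anything derived from legacy_base lands here, which models
  // being wrapped in a view that dispatches virtually.
  resource_ref_sketch(legacy_base&) : path{"view"} {}
};

// Reports which constructor overload was selected for a given resource type.
template <typename R>
std::string chosen_path(R& resource)
{
  return resource_ref_sketch(resource).path;
}
```

In this sketch a type only reaches the direct path once the guard no longer excludes it, which is why the inheritance check matters for every base resource.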

The adaptors (arena, logging, limiting, etc.) have already been converted to a dual-inheritance pattern (`device_memory_resource` + `cuda::mr::shared_resource<Impl>`). The base resources have not been touched.

Additionally, most base resources lack their own `get_property` friend functions — they currently satisfy `device_accessible` only through the `device_memory_resource` base class. Once they provide their own CCCL `allocate`/`deallocate` methods, they need their own `get_property` friends to satisfy the CCCL concept independently.

## Goal

Each base resource provides its own `allocate`/`deallocate`/`allocate_sync`/`deallocate_sync` methods (hiding the `device_memory_resource` base class versions) so that when someone holds a concrete type (e.g. `cuda_memory_resource&`), the CCCL concept is satisfied natively — no virtual dispatch, no `device_memory_resource_view`.

`device_memory_resource` inheritance is kept during this phase for backward compatibility. Removal of that inheritance is a separate future step.
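The hiding behaviour described above can be sketched without CUDA. In the example below, `legacy_resource` and `host_resource` are hypothetical stand-ins (host allocation via `operator new` replaces any CUDA call), and the two counters exist only to show which entry point was used.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-ins, not the real RMM classes.
class legacy_resource {  // plays the role of device_memory_resource
 public:
  virtual ~legacy_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }  // virtual hop
  void deallocate(void* ptr, std::size_t bytes) noexcept { do_deallocate(ptr, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes)                    = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) noexcept = 0;
};

class host_resource final : public legacy_resource {
 public:
  int native_calls = 0;  // calls that reached the native implementation
  int legacy_calls = 0;  // calls that arrived through the virtual override

  // Declaring these names hides legacy_resource::allocate/deallocate, so a
  // caller holding host_resource& binds to them statically.
  void* allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t))
  {
    (void)alignment;  // unused in this host-only sketch
    ++native_calls;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr, std::size_t, std::size_t = alignof(std::max_align_t)) noexcept
  {
    ::operator delete(ptr);
  }

 private:
  // Legacy overrides forward to the native methods: a single implementation.
  void* do_allocate(std::size_t bytes) override
  {
    ++legacy_calls;
    return allocate(bytes);
  }
  void do_deallocate(void* ptr, std::size_t bytes) noexcept override { deallocate(ptr, bytes); }
};
```

Calling through `host_resource&` increments only `native_calls`, while calling through `legacy_resource&` goes through the override first and then reaches the same implementation.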

Stateless resources implement the CCCL interface directly on the class. They do NOT use `cuda::mr::shared_resource`, because there is no shared state to manage.

Stateful, non-copyable resources use the `cuda::mr::shared_resource<Impl>` pattern, with `_impl` classes in `detail/` headers (consistent with the adaptor convention).

## Resources In Scope

### Stateless (direct CCCL interface on the class)

| Resource | File | Properties | Notes |
|---|---|---|---|
| `cuda_memory_resource` | `mr/cuda_memory_resource.hpp` | `device_accessible` | `cudaMalloc`/`cudaFree`, copyable. Needs `get_property` friend. |
| `managed_memory_resource` | `mr/managed_memory_resource.hpp` | `device_accessible`, `host_accessible` | `cudaMallocManaged`/`cudaFree`, copyable. Needs both `get_property` friends. Managed memory is host-accessible, so add `host_accessible`. |
| `pinned_host_memory_resource` | `mr/pinned_host_memory_resource.hpp` | `device_accessible`, `host_accessible` | `cudaHostAlloc`/`cudaFreeHost`, copyable. Already has both `get_property` friends and `static_assert`. |
| `cuda_async_view_memory_resource` | `mr/cuda_async_view_memory_resource.hpp` | `device_accessible` | Non-owning view of `cudaMemPool_t`, copyable. Needs `get_property` friend. |
| `system_memory_resource` | `mr/system_memory_resource.hpp` | `device_accessible`, `host_accessible` | `operator new`/`delete` with system-allocated memory (SAM), copyable. Already has both `get_property` friends and `static_assert`. |

### Stateful, non-copyable (need `cuda::mr::shared_resource<Impl>`)

| Resource | File | Properties | Notes |
|---|---|---|---|
| `cuda_async_memory_resource` | `mr/cuda_async_memory_resource.hpp` | `device_accessible` | Owns CUDA pool (`cudaMemPoolCreate`/`Destroy`), non-copyable. Delegates to internal `cuda_async_view_memory_resource`. Needs `get_property` friend. Extract `_impl` to `detail/`. |
| `cuda_async_managed_memory_resource` | `mr/cuda_async_managed_memory_resource.hpp` | `device_accessible`, `host_accessible` | Non-copyable, delegates to internal `cuda_async_view_memory_resource`. Managed memory is host-accessible, so add `host_accessible`. Extract `_impl` to `detail/`. |
| `sam_headroom_memory_resource` | `mr/sam_headroom_memory_resource.hpp` | `device_accessible`, `host_accessible` | Non-copyable, holds `system_memory_resource` + `headroom_`. Needs `get_property` friends. Extract `_impl` to `detail/`. |

## Target Pattern

### Stateless resource (e.g. `cuda_memory_resource`)

```cpp
class cuda_memory_resource final : public device_memory_resource {
 public:
  // -- CCCL memory resource interface (hides device_memory_resource versions) --

  void* allocate(cuda::stream_ref stream,
                 std::size_t bytes,
                 std::size_t alignment = cuda::mr::default_cuda_malloc_alignment)
  { /* direct cudaMalloc impl */ }

  void deallocate(cuda::stream_ref stream,
                  void* ptr,
                  std::size_t bytes,
                  std::size_t alignment = cuda::mr::default_cuda_malloc_alignment) noexcept
  { /* direct cudaFree impl */ }

  void* allocate_sync(std::size_t bytes,
                      std::size_t alignment = cuda::mr::default_cuda_malloc_alignment)
  { /* direct cudaMalloc impl */ }

  void deallocate_sync(void* ptr,
                       std::size_t bytes,
                       std::size_t alignment = cuda::mr::default_cuda_malloc_alignment) noexcept
  { /* direct cudaFree impl */ }

  bool operator==(cuda_memory_resource const&) const noexcept { return true; }
  bool operator!=(cuda_memory_resource const&) const noexcept { return false; }

  friend void get_property(cuda_memory_resource const&,
                           cuda::mr::device_accessible) noexcept {}

 private:
  // -- Legacy device_memory_resource overrides (delegates to CCCL interface) --
  void* do_allocate(std::size_t bytes, cuda_stream_view stream) override
  { return allocate(stream, bytes); }

  void do_deallocate(void* ptr, std::size_t bytes, cuda_stream_view stream) noexcept override
  { deallocate(stream, ptr, bytes); }

  bool do_is_equal(device_memory_resource const& other) const noexcept override
  { return dynamic_cast<cuda_memory_resource const*>(&other) != nullptr; }
};

static_assert(rmm::detail::polyfill::async_resource_with<
              cuda_memory_resource, cuda::mr::device_accessible>);
```

Key points:
- The CCCL `allocate`/`deallocate` methods use `cuda::stream_ref` (not `cuda_stream_view`).
- The default alignment is `cuda::mr::default_cuda_malloc_alignment` (256 bytes, defined in CCCL `properties.h`).
- `do_allocate`/`do_deallocate` delegate TO the CCCL methods (single implementation).
- Equality operators are defined on the concrete type.
- `get_property` friend functions declare accessibility properties.
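As a rough illustration of how hidden `get_property` friends make a property detectable at compile time, the sketch below uses a C++17 detection trait. Everything in namespace `sketch` is a hypothetical stand-in: the tag types mimic `cuda::mr::device_accessible` and `cuda::mr::host_accessible`, and `has_property` is not the CCCL concept, only a simplified analogue of the check it performs.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical property tags standing in for the cuda::mr accessibility tags.
struct device_accessible {};
struct host_accessible {};

// Detects whether get_property(resource, Property{}) is callable via ADL.
template <typename R, typename P, typename = void>
struct has_property : std::false_type {};

template <typename R, typename P>
struct has_property<
  R,
  P,
  std::void_t<decltype(get_property(std::declval<R const&>(), std::declval<P>()))>>
  : std::true_type {};

// Declares device accessibility only.
struct device_only_resource {
  friend void get_property(device_only_resource const&, device_accessible) noexcept {}
};

// Declares both properties, as a managed-memory style resource would.
struct managed_like_resource {
  friend void get_property(managed_like_resource const&, device_accessible) noexcept {}
  friend void get_property(managed_like_resource const&, host_accessible) noexcept {}
};

static_assert(has_property<device_only_resource, device_accessible>::value,
              "friend makes the property visible");
static_assert(!has_property<device_only_resource, host_accessible>::value,
              "no friend, no property");
static_assert(has_property<managed_like_resource, host_accessible>::value,
              "both friends are found");

}  // namespace sketch
```

Because the friends are declared on the concrete type, the check succeeds without consulting any base class, which is the independence the migration needs.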

### Stateful resource (e.g. `cuda_async_memory_resource`)

Follow the same pattern as the already-converted adaptors:
- Extract an `_impl` class with the CCCL interface into a `detail/` header (e.g. `detail/cuda_async_memory_resource_impl.hpp`).
- Outer class uses dual inheritance: `device_memory_resource` + `cuda::mr::shared_resource<Impl>`.
- `do_allocate`/`do_deallocate` delegate to `shared_base::allocate`/`shared_base::deallocate`.
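The dual-inheritance shape can be sketched with standard-library pieces. This is an assumption-laden illustration, not the adaptor code: `shared_handle` is a hypothetical stand-in for `cuda::mr::shared_resource` built on `std::shared_ptr`, `legacy_resource` stands in for `device_memory_resource`, and `pool_resource_impl` with its byte limit is invented purely to give the impl some state.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

namespace sketch {

// Stand-in for a detail/ `_impl` class: owns state and is non-copyable.
class pool_resource_impl {
 public:
  explicit pool_resource_impl(std::size_t limit) : limit_{limit} {}
  pool_resource_impl(pool_resource_impl const&)            = delete;
  pool_resource_impl& operator=(pool_resource_impl const&) = delete;

  void* allocate(std::size_t bytes)
  {
    if (in_use_ + bytes > limit_) { throw std::bad_alloc{}; }
    in_use_ += bytes;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) noexcept
  {
    in_use_ -= bytes;
    ::operator delete(ptr);
  }
  std::size_t in_use() const noexcept { return in_use_; }

 private:
  std::size_t limit_;
  std::size_t in_use_{0};
};

// Stand-in for shared_resource<Impl>: a copyable handle whose copies share one Impl.
template <typename Impl>
class shared_handle {
 public:
  explicit shared_handle(std::shared_ptr<Impl> impl) : impl_{std::move(impl)} {}
  void* allocate(std::size_t bytes) { return impl_->allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { impl_->deallocate(ptr, bytes); }
  Impl const& get() const noexcept { return *impl_; }  // hypothetical accessor

 private:
  std::shared_ptr<Impl> impl_;
};

// Stand-in for device_memory_resource.
class legacy_resource {
 public:
  virtual ~legacy_resource() = default;
  void* allocate(std::size_t bytes) { return do_allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { do_deallocate(ptr, bytes); }

 private:
  virtual void* do_allocate(std::size_t bytes)                    = 0;
  virtual void do_deallocate(void* ptr, std::size_t bytes) noexcept = 0;
};

// Outer class: legacy base plus shared handle.
class pool_resource final : public legacy_resource, public shared_handle<pool_resource_impl> {
  using shared_base = shared_handle<pool_resource_impl>;

 public:
  explicit pool_resource(std::size_t limit)
    : shared_base{std::make_shared<pool_resource_impl>(limit)}
  {
  }
  // Pick the shared handle's methods over the legacy base's same-named ones.
  using shared_base::allocate;
  using shared_base::deallocate;

 private:
  void* do_allocate(std::size_t bytes) override { return shared_base::allocate(bytes); }
  void do_deallocate(void* ptr, std::size_t bytes) noexcept override
  {
    shared_base::deallocate(ptr, bytes);
  }
};

}  // namespace sketch
```

Copies of `pool_resource` share one `pool_resource_impl`, so the outer type is copyable even though the state it manages is not.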

## Ordering

Phase 1 (stateless) must be completed before phase 2 (stateful), because:
- `cuda_async_memory_resource` delegates to `cuda_async_view_memory_resource`
- `cuda_async_managed_memory_resource` delegates to `cuda_async_view_memory_resource`
- `sam_headroom_memory_resource` delegates to `system_memory_resource`

Once the stateless resources have native CCCL interfaces, the stateful `_impl` classes can forward to them directly.
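The dependency can be pictured with a small host-only sketch. All names here are hypothetical: `fake_pool` stands in for a `cudaMemPool_t`, `pool_view_resource` for a stateless view resource, and `owning_pool_impl` for a stateful `_impl` that owns the pool and forwards to the view.

```cpp
#include <cstddef>
#include <new>

namespace sketch {

// Stand-in for a pool handle; tracks live bytes so forwarding is observable.
struct fake_pool {
  std::size_t live_bytes = 0;
};

// Phase 1: copyable, non-owning view that already has the native interface.
class pool_view_resource {
 public:
  explicit pool_view_resource(fake_pool* pool) noexcept : pool_{pool} {}

  void* allocate(std::size_t bytes)
  {
    pool_->live_bytes += bytes;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr, std::size_t bytes) noexcept
  {
    pool_->live_bytes -= bytes;
    ::operator delete(ptr);
  }
  bool operator==(pool_view_resource const& other) const noexcept { return pool_ == other.pool_; }
  bool operator!=(pool_view_resource const& other) const noexcept { return pool_ != other.pool_; }

 private:
  fake_pool* pool_;
};

// Phase 2: owning, non-copyable impl that forwards straight to the view.
class owning_pool_impl {
 public:
  owning_pool_impl() : view_{&pool_} {}
  owning_pool_impl(owning_pool_impl const&)            = delete;
  owning_pool_impl& operator=(owning_pool_impl const&) = delete;

  void* allocate(std::size_t bytes) { return view_.allocate(bytes); }
  void deallocate(void* ptr, std::size_t bytes) noexcept { view_.deallocate(ptr, bytes); }
  std::size_t live_bytes() const noexcept { return pool_.live_bytes; }

 private:
  fake_pool pool_{};         // declared first so it is initialized before view_
  pool_view_resource view_;  // non-owning view into pool_
};

}  // namespace sketch
```

The impl adds ownership only; every allocation path is the view's, so the view's native interface has to exist first.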

## Per-Resource Checklist

For each resource, the conversion requires:

- [ ] Add `allocate(cuda::stream_ref, size_t, size_t)` (hides base class version)
- [ ] Add `deallocate(cuda::stream_ref, void*, size_t, size_t)` (hides base class version)
- [ ] Add `allocate_sync(size_t, size_t)` (hides base class version)
- [ ] Add `deallocate_sync(void*, size_t, size_t)` (hides base class version)
- [ ] Add `get_property` friend function(s) for the correct properties
- [ ] Add `operator==` / `operator!=` on the concrete type
- [ ] Modify `do_allocate`/`do_deallocate` to delegate to the new CCCL methods
- [ ] Add `static_assert` for `async_resource_with<R, Properties...>`
- [ ] For stateful: extract `_impl` to `detail/`, add `shared_resource<Impl>` inheritance
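A checklist like this can be approximated by a compile-time check. The sketch below is a simplified, hypothetical analogue of what `async_resource_with` verifies: `stream_ref` is a stand-in for `cuda::stream_ref`, `has_native_interface` only tests that the four methods and both equality operators are callable (it does not check return types or properties), and the two example resources are invented.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>
#include <utility>

namespace sketch {

struct stream_ref {};  // hypothetical stand-in for cuda::stream_ref

// True when R offers the four allocation methods plus == and !=.
template <typename R, typename = void>
struct has_native_interface : std::false_type {};

template <typename R>
struct has_native_interface<
  R,
  std::void_t<
    decltype(std::declval<R&>().allocate(std::declval<stream_ref>(), std::size_t{}, std::size_t{})),
    decltype(std::declval<R&>().deallocate(
      std::declval<stream_ref>(), std::declval<void*>(), std::size_t{}, std::size_t{})),
    decltype(std::declval<R&>().allocate_sync(std::size_t{}, std::size_t{})),
    decltype(std::declval<R&>().deallocate_sync(std::declval<void*>(), std::size_t{}, std::size_t{})),
    decltype(std::declval<R const&>() == std::declval<R const&>()),
    decltype(std::declval<R const&>() != std::declval<R const&>())>> : std::true_type {};

// Provides every item on the checklist (host allocation replaces CUDA calls).
struct complete_resource {
  void* allocate(stream_ref, std::size_t bytes, std::size_t = 256) { return ::operator new(bytes); }
  void deallocate(stream_ref, void* ptr, std::size_t, std::size_t = 256) noexcept
  {
    ::operator delete(ptr);
  }
  void* allocate_sync(std::size_t bytes, std::size_t = 256) { return ::operator new(bytes); }
  void deallocate_sync(void* ptr, std::size_t, std::size_t = 256) noexcept
  {
    ::operator delete(ptr);
  }
  bool operator==(complete_resource const&) const noexcept { return true; }
  bool operator!=(complete_resource const&) const noexcept { return false; }
};

// Missing the *_sync methods, so the check rejects it.
struct incomplete_resource {
  void* allocate(stream_ref, std::size_t bytes, std::size_t = 256) { return ::operator new(bytes); }
  void deallocate(stream_ref, void* ptr, std::size_t, std::size_t = 256) noexcept
  {
    ::operator delete(ptr);
  }
  bool operator==(incomplete_resource const&) const noexcept { return true; }
  bool operator!=(incomplete_resource const&) const noexcept { return false; }
};

static_assert(has_native_interface<complete_resource>::value, "all checklist items present");
static_assert(!has_native_interface<incomplete_resource>::value, "sync methods are missing");

}  // namespace sketch
```

A check of this kind fails at compile time as soon as one method is forgotten, which is the role the `static_assert` item plays in the checklist.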

### Resources

- [ ] `cuda_memory_resource` — stateless, `device_accessible`
- [ ] `managed_memory_resource` — stateless, `device_accessible` + `host_accessible`
- [ ] `pinned_host_memory_resource` — stateless, `device_accessible` + `host_accessible` (already has `get_property` + `static_assert`)
- [ ] `cuda_async_view_memory_resource` — stateless, `device_accessible`
- [ ] `system_memory_resource` — stateless, `device_accessible` + `host_accessible` (already has `get_property` + `static_assert`)
- [ ] `cuda_async_memory_resource` — stateful, `device_accessible`, extract `_impl`
- [ ] `cuda_async_managed_memory_resource` — stateful, `device_accessible` + `host_accessible`, extract `_impl`
- [ ] `sam_headroom_memory_resource` — stateful, `device_accessible` + `host_accessible`, extract `_impl`

## Validation

- Add a `static_assert` that each resource satisfies `async_resource_with<R, Properties...>`.
- Verify that `any_resource` can store each resource directly (without `device_memory_resource_view`).
- Run full C++ and Python test suites.

