Open
Labels: feature request
Description
Is your feature request related to a problem? Please describe.
Recently, RAPIDS/CCCL nightlies began failing due to a change in CCCL's memory resources. See NVIDIA/cccl#5313
Describe the solution you'd like
RMM should support CCCL's new memory resources, which are targeting CCCL 3.2. RAPIDS is currently using CCCL 3.1.
Current plan for adoption:
Allocation Interfaces
This list of tasks requires CCCL 3.1+, so we can ship these changes in 25.12.
- Build RMM w/ 3.1 via polyfill and `allocate` updates (Support building with CCCL 3.1.0 #2017) (25.10)
  - Need to verify that all of RAPIDS builds with CCCL 3.1 with these changes in RMM, and ask Spark to do testing with the same pre-release of CCCL 3.1. The goal is to unblock adoption of CCCL 3.1 for RAPIDS.
  - Then hopefully CCCL+RAPIDS CI should work.
- Update polyfill to new `allocate` signature (RMM internal refactoring): Use CCCL MR interface internally #2112 (25.12)
- Refactor RAPIDS to use new `allocate` signature: Migrate RAPIDS to CCCL MR interface (new allocation APIs) #2126 (25.12)
- Deprecate old `allocate` signature: Add deprecation warnings for legacy MR interface #2128 (25.12)
- Remove deprecated legacy `allocate` interfaces: Remove legacy memory resource interface in favor of CCCL interface #2150 (26.02)
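The deprecate-then-remove sequence above can be sketched roughly as follows. This is a minimal, self-contained illustration and not RMM or CCCL code: `toy_stream` and `toy_memory_resource` are hypothetical names, and the stream-first argument order of the new-style `allocate` is an assumption made for the example. The idea is only that the legacy overload stays available for one release, forwards to the new one, and warns at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a CUDA stream handle; not a real CCCL/RMM type.
struct toy_stream {
  int id;
};

// Hypothetical resource with a new-style interface plus a deprecated legacy
// overload that is kept for one release cycle and then deleted.
class toy_memory_resource {
 public:
  // New-style stream-ordered allocation (assumed shape: stream comes first).
  void* allocate(toy_stream /*stream*/, std::size_t bytes) {
    assert(bytes > 0);
    ++live_allocations_;
    return std::malloc(bytes);
  }

  void deallocate(toy_stream /*stream*/, void* ptr, std::size_t /*bytes*/) noexcept {
    --live_allocations_;
    std::free(ptr);
  }

  // New-style synchronous variants; here they just reuse the stream-ordered path.
  void* allocate_sync(std::size_t bytes) { return allocate(toy_stream{}, bytes); }

  void deallocate_sync(void* ptr, std::size_t bytes) noexcept {
    deallocate(toy_stream{}, ptr, bytes);
  }

  // Legacy overload (bytes first, stream second). It forwards to the new
  // overload so behavior is unchanged while callers migrate.
  [[deprecated("use allocate(stream, bytes) instead")]]
  void* allocate(std::size_t bytes, toy_stream stream) { return allocate(stream, bytes); }

  // Number of allocations that have not been deallocated yet.
  int live_allocations() const noexcept { return live_allocations_; }

 private:
  int live_allocations_{0};
};
```

Removing the legacy interface in a later release then amounts to deleting the `[[deprecated]]` overload once downstream libraries build without warnings.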
Memory Resource Handling
This list of tasks requires CCCL 3.2+, so we will need to work on that migration in 26.02.
- Implement a bridge for pointer-based resources (26.02)
- Upgrade RAPIDS to use CCCL 3.2 (26.02)
- Require CCCL 3.2 in RMM: Remove LIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE #2223 (26.04)
- Cleanup CCCL < 3.2: Remove compatibility code for CCCL earlier than 3.2 #2248 (26.04)
- Adopt `any_resource` in device-resource global mapping: Store any_resource in device-resource global mapping #2200 (26.02)
- Adopt `any_resource` in custom containers: Store any_resource in device_buffer and device_uvector #2201 (26.02)
  - Containers storing RMM `resource_ref`s should instead store `cuda::mr::any_resource`
- Remove RMM `polyfill` namespace (26.04)
- Convert adaptors into `cuda::shared_resource` (staging, 26.06)
  - Refactor logging_resource_adaptor to shared CCCL MR design #2246
  - Refactor pool_memory_resource to shared CCCL MR design #2258
  - Refactor fixed_size_memory_resource and binning_memory_resource to shared CCCL MR design #2264
  - Refactor tracking, statistics, and aligned resource adaptors to shared CCCL MR design #2265
  - Return owning any_resource from set_per_device_resource_ref #2271
  - Refactor arena_memory_resource to shared CCCL MR design #2272
  - Refactor callback_memory_resource to shared CCCL MR design #2274
  - Refactor prefetch_resource_adaptor to shared CCCL MR design #2275
  - Refactor thread_safe_resource_adaptor to shared CCCL MR design #2276
  - Refactor limiting_resource_adaptor to shared CCCL MR design #2277
  - Refactor failure_callback_resource_adaptor to shared CCCL MR design #2278
- Remove `owning_wrapper` (staging, 26.06)
- Update base memory resources (staging, 26.06)
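To illustrate why containers would store an owning, type-erased resource rather than a non-owning reference, here is a small standalone sketch. None of it is CCCL or RMM code: `counting_resource`, `toy_any_resource`, `toy_buffer`, and `make_buffer` are hypothetical names, and the type erasure is done with a plain virtual interface behind a `std::shared_ptr`, which is an assumption chosen for brevity and not how `cuda::mr::any_resource` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <utility>

// Hypothetical resource that counts outstanding allocations in a shared counter.
struct counting_resource {
  std::shared_ptr<int> live;
  void* allocate_sync(std::size_t bytes) {
    assert(live != nullptr);
    ++*live;
    return std::malloc(bytes);
  }
  void deallocate_sync(void* p, std::size_t) noexcept {
    --*live;
    std::free(p);
  }
};

// Hypothetical owning, type-erased handle: it keeps its own copy of the
// wrapped resource alive for as long as any handle refers to it.
class toy_any_resource {
 public:
  template <class Resource>
  explicit toy_any_resource(Resource r)
    : self_(std::make_shared<model<Resource>>(std::move(r))) {}

  void* allocate_sync(std::size_t bytes) { return self_->allocate_sync(bytes); }
  void deallocate_sync(void* p, std::size_t bytes) noexcept { self_->deallocate_sync(p, bytes); }

 private:
  struct concept_t {
    virtual ~concept_t() = default;
    virtual void* allocate_sync(std::size_t) = 0;
    virtual void deallocate_sync(void*, std::size_t) noexcept = 0;
  };
  template <class Resource>
  struct model final : concept_t {
    explicit model(Resource r) : resource(std::move(r)) {}
    void* allocate_sync(std::size_t b) override { return resource.allocate_sync(b); }
    void deallocate_sync(void* p, std::size_t b) noexcept override {
      resource.deallocate_sync(p, b);
    }
    Resource resource;
  };
  std::shared_ptr<concept_t> self_;
};

// Hypothetical container: because it stores the owning handle, the resource
// is guaranteed to still exist when the destructor frees the memory.
class toy_buffer {
 public:
  toy_buffer(std::size_t bytes, toy_any_resource mr)
    : mr_(std::move(mr)), bytes_(bytes), data_(mr_.allocate_sync(bytes)) {}
  ~toy_buffer() { mr_.deallocate_sync(data_, bytes_); }
  toy_buffer(toy_buffer const&) = delete;
  toy_buffer& operator=(toy_buffer const&) = delete;
  void* data() const noexcept { return data_; }

 private:
  toy_any_resource mr_;  // declared first so it is initialized before data_
  std::size_t bytes_;
  void* data_;
};

// The local resource goes out of scope on return; the buffer stays valid
// because it owns a copy through toy_any_resource.
std::unique_ptr<toy_buffer> make_buffer(std::size_t bytes, std::shared_ptr<int> counter) {
  counting_resource local{std::move(counter)};
  return std::unique_ptr<toy_buffer>(new toy_buffer(bytes, toy_any_resource{local}));
}
```

With a non-owning reference in place of `toy_any_resource`, the buffer returned by `make_buffer` would deallocate through a dangling reference, which is the lifetime hazard the owning design avoids.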
Remove device_memory_resource and legacy interface
- Migrate Python/Cython bindings from `device_memory_resource*` to `resource_ref`/`any_resource`
- Remove `device_memory_resource` inheritance from all C++ memory resources
- Remove bridge infrastructure and `device_memory_resource`
Post-tasks
- Delete `is_resource_adaptor.hpp` and its test usages (no longer meaningful after `shared_resource` adoption)
- Use `cuda_mr` or `cuda_async_mr` in tests rather than `get_current_device_resource_ref` (see comment)
- Check on equality docstrings after removing legacy code (see comment)
- Update documentation: remove references to `device_memory_resource`, `do_allocate`, virtual dispatch from Doxygen and Python docstrings
- Remove stale `#include` directives for deleted headers (`device_memory_resource.hpp`, `device_memory_resource_view.hpp`, `cccl_adaptors.hpp`)
- Switch adaptors to be property-agnostic (e.g. support host accessible pools) and expose `Upstream& upstream_resource()`
  - This isn't in scope for the core of this issue.
- Audit public symbols (see comment)
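As a rough picture of what an adaptor exposing `Upstream& upstream_resource()` could look like, the sketch below templates the adaptor on its upstream type so it makes no assumption about where the upstream memory is accessible. It is a standalone toy under stated assumptions: `host_resource` and `toy_limiting_adaptor` are hypothetical names, the byte-limit policy is only an example, and CCCL property forwarding is omitted entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical host-memory upstream; any type with the same two member
// functions would work in its place.
struct host_resource {
  void* allocate_sync(std::size_t bytes) { return std::malloc(bytes); }
  void deallocate_sync(void* p, std::size_t) noexcept { std::free(p); }
};

// Hypothetical adaptor templated on the upstream type. It stores the upstream
// by value and uses no virtual dispatch.
template <class Upstream>
class toy_limiting_adaptor {
 public:
  toy_limiting_adaptor(Upstream upstream, std::size_t limit)
    : upstream_(std::move(upstream)), limit_(limit) {}

  // Throws std::bad_alloc if the request would exceed the byte limit.
  void* allocate_sync(std::size_t bytes) {
    if (bytes > limit_ - used_) { throw std::bad_alloc{}; }
    void* p = upstream_.allocate_sync(bytes);
    used_ += bytes;
    return p;
  }

  void deallocate_sync(void* p, std::size_t bytes) noexcept {
    assert(used_ >= bytes);
    upstream_.deallocate_sync(p, bytes);
    used_ -= bytes;
  }

  // Typed access to the wrapped upstream resource.
  Upstream& upstream_resource() noexcept { return upstream_; }
  Upstream const& upstream_resource() const noexcept { return upstream_; }

  std::size_t bytes_in_use() const noexcept { return used_; }

 private:
  Upstream upstream_;
  std::size_t limit_;
  std::size_t used_{0};
};
```

Because the upstream type is a template parameter, the same adaptor could wrap a host-accessible pool or a device resource without change, and callers get the concrete upstream type back from `upstream_resource()`.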
Update RAPIDS libraries
- Deprecate set_current_device_resource pointer-based API cudf#21018
- Store any_resource in device_uvector_policy raft#2917
- Update RMM memory resource APIs to ref-based equivalents raft#2920
- Adopt any_resource in BufferResource and RmmResourceAdaptor rapidsmpf#772
- Update RMM memory resource APIs to use ref-based equivalents cuml#7668
- Update RMM memory resource APIs to ref-based equivalents cugraph#5392
Projects
Status
In Progress