Use the new reference-based RMM API to set RMM pools in device_resources_snmg #2972
viclafargue wants to merge 2 commits into rapidsai:release/26.04
jinsolp left a comment
Thanks @viclafargue, LGTM
rmm::mr::get_current_device_resource_ref(),
rmm::percent_of_free_device_memory(percent_of_free_memory)));
rmm::mr::set_per_device_resource_ref(rmm::cuda_device_id{device_id},
                                     *per_device_pools_.back());
It looks like there is a small misunderstanding here. A "refcounted" memory resource in this context unfortunately doesn't mean that rmm::device_async_resource_ref shares ownership of the rmm::mr::device_memory_resource. The rmm::mr::set_per_device_resource_ref documentation confirms that device_resources_snmg still needs to outlive all its users. Therefore, in its current state, this PR doesn't resolve #2922.
What the rmm-over-cccl overhaul brings is a new concept/type for memory resources which, among other things, ensures that all resources are copyable (and can maintain shared refcounted state). That will let us store them by value inside allocated containers. For this to work, we'll need to pass the resources by value rather than by reference for all allocations. But looking at the current rmm code, I don't see a way to pass an owning resource to the global device-resource map at the moment.
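To make the ownership point concrete, here is a minimal toy sketch in plain C++. It does not use RMM at all: `toy_resource`, `toy_resource_ref`, and both registries are hypothetical stand-ins invented for illustration, assuming only that a "ref" behaves like a non-owning pointer and that an owning handle behaves like a `std::shared_ptr`. It is an analogy under those assumptions, not a description of how RMM is implemented.

```cpp
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical stand-in for a memory resource (not an RMM type).
struct toy_resource {
  std::size_t bytes_allocated = 0;
  void allocate(std::size_t n) { bytes_allocated += n; }
};

// Hypothetical non-owning view: it stores a raw pointer and never
// extends the lifetime of the resource it points to.
class toy_resource_ref {
 public:
  toy_resource_ref(toy_resource& r) : ptr_(&r) {}
  void allocate(std::size_t n) const { ptr_->allocate(n); }

 private:
  toy_resource* ptr_;
};

// Registering a non-owning ref in a per-device map does not keep the
// resource alive once its real owner is gone.
bool ref_keeps_resource_alive() {
  std::unordered_map<int, toy_resource_ref> ref_registry;
  std::weak_ptr<toy_resource> observer;
  {
    auto owner = std::make_shared<toy_resource>();
    observer   = owner;
    ref_registry.emplace(0, toy_resource_ref{*owner});
  }  // owner destroyed here; the registered ref now dangles
  return !observer.expired();
}

// Storing an owning handle by value shares ownership, so the resource
// survives as long as the registry entry does.
bool owning_handle_keeps_resource_alive() {
  std::unordered_map<int, std::shared_ptr<toy_resource>> owning_registry;
  std::weak_ptr<toy_resource> observer;
  {
    auto owner = std::make_shared<toy_resource>();
    observer   = owner;
    owning_registry[0] = owner;
  }  // local owner destroyed; the registry still holds a reference
  return !observer.expired();
}
```

In this toy model the first function reports that the resource is gone while the ref is still registered, which mirrors why the object owning the pools has to outlive every user of the registered ref.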
On the other hand, the changeset is welcome in that it moves part of raft to the future-compatible reference-based rmm API instead of the pointer API that is being deprecated. This will help in future refactorings. Let's just adjust the PR title and description, and I think it is good to go.
You are right, thanks for spotting this. I will update the title and description of the PR.
This PR updates the device_resources_snmg utility so that it uses the new reference-based RMM API to set RMM pools.