Use torch::stable::Device#1231
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1231
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 81a3bc5 with merge base dcac4d0 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
| torch::TensorOptions() | ||
| .dtype(torch::kUInt8) | ||
| .device(torch::Device( | ||
| static_cast<c10::DeviceType>(device.type()), device.index()))); |
There was a problem hiding this comment.
We're casting our stableDevice into a non-stable torch::Device, because this is what torch::zeros expects. These kinds of quirks will go away once we migrate fully to the stable ABI: we won't be relying on torch::zeros anymore, we'll be relying on its stable counterpart, which will accept a stableDevice.
I'm not making this a TODO, because this will be a forced changed.
| class DummyDeviceInterface : public DeviceInterface { | ||
| public: | ||
| DummyDeviceInterface(const torch::Device& device) : DeviceInterface(device) {} | ||
| DummyDeviceInterface(const StableDevice& device) : DeviceInterface(device) {} |
There was a problem hiding this comment.
Ok, I see that's now not a draft, so I assume it's ready for review and try out?
| "-Wno-attributes" | ||
| ) | ||
| endif() | ||
|
|
| std::function<DeviceInterface*(const StableDevice& device)>; | ||
|
|
||
| bool registerDeviceInterface( | ||
| TORCHCODEC_THIRD_PARTY_API bool registerDeviceInterface( |
There was a problem hiding this comment.
The TORCHCODEC_THIRD_PARTY_API helper ensures the registerDeviceInterface symbol is properly public - which is needed because it is called by third-party libraries. Note that it was already public on main! Now with this PR, we need to force it to public, because it would otherwise get demoted to hidden. Why? Because some of its parameters CreateDeviceInterfaceFn transitively depends on a hidden type, namely StableDevice. Because StableDevice is hidden, the compiler will by default demote everything that also depends on it - that includes registerDeviceInterface. Hence why we have to force this back to public.
Also relevant: https://github.com/meta-pytorch/torchcodec/pull/1231/changes#r2794367843
| // Create hardware device context | ||
| c10::cuda::CUDAGuard deviceGuard(device); | ||
| c10::cuda::CUDAGuard deviceGuard( | ||
| c10::Device(static_cast<c10::DeviceType>(device.type()), device.index())); |
There was a problem hiding this comment.
Is this a similar case to torch::zeros, where we must cast to a non-stable torch::Device until a stable counterpart for CUDAGuard is implemented?
| } else { | ||
| frameOutput.data = cpuFrameOutput.data.to(device_); | ||
| frameOutput.data = cpuFrameOutput.data.to(torch::Device( | ||
| static_cast<c10::DeviceType>(device_.type()), device_.index())); |
There was a problem hiding this comment.
I saw there is a torch::stable::Tensor defined. I assume we are saving that migration for a separate PR?
There was a problem hiding this comment.
Yes this is probably going to be the next (and biggest) migration
| using StableDeviceIndex = torch::stable::accelerator::DeviceIndex; | ||
|
|
||
| // DeviceGuard for CUDA context management | ||
| using StableDeviceGuard = torch::stable::accelerator::DeviceGuard; |
There was a problem hiding this comment.
We aren't using this currently. Was this meant to be used in CudaDeviceInterface.cpp instead of casting back to torch::Device in deviceGuard?
There was a problem hiding this comment.
Good catch, thanks. I'll remove it for now and do a separate migration for all the "CUDA stuff" (streams sync, guards, etc.)
| "Expected 3D RGB tensor (CHW format), got shape: ", | ||
| tensor.sizes()); | ||
| STD_TORCH_CHECK( | ||
| tensor.device().type() == torch::kCUDA, |
There was a problem hiding this comment.
To double check my understanding, once we switch to torch::stable::Tensor, we would modify this comparison to kStableCuda?
| // LICENSE file in the root directory of this source tree. | ||
|
|
||
| #include "DeviceInterface.h" | ||
| #include <cstring> |
There was a problem hiding this comment.
I couldn't find specific instances where we were using functions from this header but I might've missed something. Maybe this is something that is related to the rest of porting to stable ABI?
There was a problem hiding this comment.
Good catch thank you
Dan-Flores
left a comment
There was a problem hiding this comment.
Thanks for working on this! I left one clarifying question, but the changes LGTM!
| # - the StableDevice field has "hidden" visibility because this is how | ||
| # torch::stable exports it. | ||
| # This creates this mismatch where a class has "higher" visibility than its | ||
| # member, hence the warning. |
There was a problem hiding this comment.
Is the issue that the compiler is trying to prevent here that: A user of VideoStreamOptions could try to call a method of StableDevice, but since StableDevice is "hidden", that method will not be available (due to the linker not finding it when building?)
And the reason we can ignore this warning is: StableDevice is header only, so all the functions are inline and will be available in each built .so file, so no linker issues will occur.
Is this your understanding? I read this wiki to try to understand the visibility attribute, but it is a little hard to apply to our situation.
There was a problem hiding this comment.
What's going on is along the same vein as what you're describing, but it's not quite exactly that.
With the warning, the compiler is essentially telling us "Hey, that VideoStreamOptions class is supposed to have public visibility, but one of its field has hidden visibility. I'm going to have to make the class hidden too!!".
We can get away with this warning, because we're fine with VideoStreamOptions being hidden since no other (dynamically loaded) library is relying on VideoStreamOptions.
The key primer here, is that:
- a public symbol can be dynamically loaded by another library
- a private symbol cannot
(see also https://claude.ai/share/0f9ce2e8-0a98-47d1-a4dc-286e618a89e2)
Here, we don't have any library that dynamically loads the core library and that relies on VideoStreamOptions, so it can be private.

TSIA. Good luck.