Skip to content

Use torch::stable::Device#1231

Merged
NicolasHug merged 9 commits intometa-pytorch:mainfrom
NicolasHug:stable_device
Feb 12, 2026
Merged

Use torch::stable::Device#1231
NicolasHug merged 9 commits intometa-pytorch:mainfrom
NicolasHug:stable_device

Conversation

@NicolasHug
Copy link
Contributor

@NicolasHug NicolasHug commented Feb 11, 2026

TSIA. Good luck.

@pytorch-bot
Copy link

pytorch-bot bot commented Feb 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1231

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 81a3bc5 with merge base dcac4d0 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Feb 11, 2026
torch::TensorOptions()
.dtype(torch::kUInt8)
.device(torch::Device(
static_cast<c10::DeviceType>(device.type()), device.index())));
Copy link
Contributor Author

@NicolasHug NicolasHug Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're casting our stableDevice into a non-stable torch::Device, because this is what torch::zeros expects. These kinds of quirks will go away once we migrate fully to the stable ABI: we won't be relying on torch::zeros anymore, we'll be relying on its stable counterpart, which will accept a stableDevice.

I'm not making this a TODO, because this will be a forced changed.

class DummyDeviceInterface : public DeviceInterface {
public:
DummyDeviceInterface(const torch::Device& device) : DeviceInterface(device) {}
DummyDeviceInterface(const StableDevice& device) : DeviceInterface(device) {}
Copy link
Contributor Author

@NicolasHug NicolasHug Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @dvrogozh , I think we'll move forward with this, which means third-party interfaces will need to use StableDevice. This is a subset of the changes from #1188, as I'll try to do the port in stages.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see that's now not a draft, so I assume it's ready for review and try out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes!

"-Wno-attributes"
)
endif()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Image

std::function<DeviceInterface*(const StableDevice& device)>;

bool registerDeviceInterface(
TORCHCODEC_THIRD_PARTY_API bool registerDeviceInterface(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TORCHCODEC_THIRD_PARTY_API helper ensures the registerDeviceInterface symbol is properly public - which is needed because it is called by third-party libraries. Note that it was already public on main! Now with this PR, we need to force it to public, because it would otherwise get demoted to hidden. Why? Because some of its parameters CreateDeviceInterfaceFn transitively depends on a hidden type, namely StableDevice. Because StableDevice is hidden, the compiler will by default demote everything that also depends on it - that includes registerDeviceInterface. Hence why we have to force this back to public.

Also relevant: https://github.com/meta-pytorch/torchcodec/pull/1231/changes#r2794367843

@NicolasHug NicolasHug marked this pull request as ready for review February 11, 2026 17:04
// Create hardware device context
c10::cuda::CUDAGuard deviceGuard(device);
c10::cuda::CUDAGuard deviceGuard(
c10::Device(static_cast<c10::DeviceType>(device.type()), device.index()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a similar case to torch::zeros, where we must cast to a non-stable torch::Device until a stable counterpart for CUDAGuard is implemented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct!

} else {
frameOutput.data = cpuFrameOutput.data.to(device_);
frameOutput.data = cpuFrameOutput.data.to(torch::Device(
static_cast<c10::DeviceType>(device_.type()), device_.index()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw there is a torch::stable::Tensor defined. I assume we are saving that migration for a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is probably going to be the next (and biggest) migration

using StableDeviceIndex = torch::stable::accelerator::DeviceIndex;

// DeviceGuard for CUDA context management
using StableDeviceGuard = torch::stable::accelerator::DeviceGuard;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't using this currently. Was this meant to be used in CudaDeviceInterface.cpp instead of casting back to torch::Device in deviceGuard?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. I'll remove it for now and do a separate migration for all the "CUDA stuff" (streams sync, guards, etc.)

"Expected 3D RGB tensor (CHW format), got shape: ",
tensor.sizes());
STD_TORCH_CHECK(
tensor.device().type() == torch::kCUDA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double check my understanding, once we switch to torch::stable::Tensor, we would modify this comparison to kStableCuda?

// LICENSE file in the root directory of this source tree.

#include "DeviceInterface.h"
#include <cstring>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find specific instances where we were using functions from this header but I might've missed something. Maybe this is something that is related to the rest of porting to stable ABI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch thank you

Copy link
Contributor

@Dan-Flores Dan-Flores left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I left one clarifying question, but the changes LGTM!

# - the StableDevice field has "hidden" visibility because this is how
# torch::stable exports it.
# This creates this mismatch where a class has "higher" visibility than its
# member, hence the warning.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the issue that the compiler is trying to prevent here that: A user of VideoStreamOptions could try to call a method of StableDevice, but since StableDevice is "hidden", that method will not be available (due to the linker not finding it when building?)

And the reason we can ignore this warning is: StableDevice is header only, so all the functions are inline and will be available in each built .so file, so no linker issues will occur.

Is this your understanding? I read this wiki to try to understand the visibility attribute, but it is a little hard to apply to our situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's going on is along the same vein as what you're describing, but it's not quite exactly that.

With the warning, the compiler is essentially telling us "Hey, that VideoStreamOptions class is supposed to have public visibility, but one of its field has hidden visibility. I'm going to have to make the class hidden too!!".

We can get away with this warning, because we're fine with VideoStreamOptions being hidden since no other (dynamically loaded) library is relying on VideoStreamOptions.

The key primer here, is that:

  • a public symbol can be dynamically loaded by another library
  • a private symbol cannot

(see also https://claude.ai/share/0f9ce2e8-0a98-47d1-a4dc-286e618a89e2)

Here, we don't have any library that dynamically loads the core library and that relies on VideoStreamOptions, so it can be private.

@NicolasHug NicolasHug merged commit 6f10da0 into meta-pytorch:main Feb 12, 2026
68 checks passed
@NicolasHug NicolasHug deleted the stable_device branch February 12, 2026 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants