Transforms bridge between Python and C++ #948
Merged
scotts merged 4 commits into meta-pytorch:main on Oct 10, 2025
scotts
commented
Oct 9, 2025
    std::optional<int64_t> stream_index = std::nullopt,
    std::string_view device = "cpu",
    std::string_view device_variant = "default",
    std::string_view transform_specs = "",
Note that we're using an empty spec, "", as the default rather than making it optional. I find this makes the code simpler and easier to reason about.
scotts
commented
Oct 9, 2025
    videoStreamOptions.deviceVariant = device_variant;

    std::vector<Transform*> transforms =
        makeTransforms(std::string(transform_specs));
An example of how using a default empty spec makes things simpler than using an optional: we always call this function. If we have an empty spec, we just get back an empty vector.
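A minimal Python sketch of the same idea, not the PR's C++ code: with an empty-string default, the caller never needs a separate "no transforms" branch, because parsing an empty spec naturally yields an empty list. The function name make_transforms, the ";" separator between transforms, and the "," separator between a name and its parameters are assumptions made for illustration only.

```python
from __future__ import annotations


def make_transforms(transform_specs: str = "") -> list[tuple[str, list[int]]]:
    """Parse a spec string into (name, params) pairs.

    Assumed format (hypothetical): transforms separated by ';',
    a name and its integer parameters separated by ','.
    """
    transforms = []
    for spec in transform_specs.split(";"):
        fields = [field.strip() for field in spec.split(",")]
        name, params = fields[0], fields[1:]
        if not name:
            # Empty spec (or a stray ';'): nothing to build for this entry.
            continue
        transforms.append((name, [int(p) for p in params]))
    return transforms
```

Calling make_transforms() with no argument and calling it with "" take the same code path, which is the simplification the comment above describes.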
NicolasHug
approved these changes
Oct 10, 2025
Next step after #902. The design in #885 punted on how the Python layer would communicate the transforms and their parameters to the C++ layer. This PR answers that question: a string. The string format is:

In the above, nameX is the name of a transform, and paramX are the parameters that transform accepts. For example, the only transform that we have now is resize, and its spec is currently:

where resize is literally what we expect, and <height> and <width> are integers that will become the height and width. In the future we will add a third parameter for algorithm. Future transforms will take a potentially different number of parameters with different types; we'll define exactly what the spec for each transform is when we add it.

I don't love that we're using strings with our own little specification language, but I'm convinced this is the least bad option:
0 -> resize, and then if we wanted to specify a resize operation of height 1024 and width 768, we could say torch.tensor([0, 1024, 768]). But both the Python and C++ side would need to know this mapping of integer to transform. Yes, that's technically true with strings, but it's rather obvious what "resize" means. The machinery required for this approach is even more than what's required to accept our little string spec language.

The VideoDecoder class will be responsible for translating from torchvision.transforms.v2 to these specification strings. Since it's our own code that will generate these specs, we don't need to worry about making something with sharp edges that will cut users.
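To make the last point concrete, here is a hedged sketch of how Python-side code might turn transform objects into a spec string. The Resize dataclass is a stand-in for a torchvision.transforms.v2 resize transform, and resize_to_spec, to_transform_specs, and the "," and ";" separators are hypothetical names and formatting chosen for illustration; the PR itself defines the real format.

```python
from dataclasses import dataclass


@dataclass
class Resize:
    """Hypothetical stand-in for a torchvision.transforms.v2 resize transform."""

    height: int
    width: int


def resize_to_spec(transform: Resize) -> str:
    # The name "resize" is self-describing, unlike an integer code such as 0.
    return f"resize, {transform.height}, {transform.width}"


def to_transform_specs(transforms: list) -> str:
    # An empty list yields "", which lines up with an empty-string default.
    return "; ".join(resize_to_spec(t) for t in transforms)
```

Because only library code would build these strings, users would keep working with transform objects and never see the spec format.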