Add BICUBIC interpolation modes to ResizeTransform #1167
jeremyteboul wants to merge 1 commit into meta-pytorch:main
Conversation
NicolasHug
left a comment
Thanks for the PR @jeremyteboul
Both interpolation modes are in scope for torchcodec. Our inclusion criterion for transforms is that we must be able to enforce a strict equality check against the TorchVision implementation (#526). That is especially critical for Resize, since models are sensitive to changes in interpolation modes, and discrepancies can lead to significant performance drops that are hard to diagnose.
Could you please add tests asserting strict equality between torchcodec and torchvision.v2.Resize in bicubic mode? We have such tests for bilinear mode here:
torchcodec/test/test_transform_ops.py
Lines 95 to 98 in e88bee9
For lanczos this will be more complex: there is currently no Lanczos interpolation implementation within pytorch. I actually think one could be in scope, but that's a massive undertaking (basically implementing Lanczos within torch.nn.functional.interpolate, and we'd have to make sure the implementation matches PIL's results). This is probably best left out of the scope of this PR. I'm happy to learn more about your specific needs for Lanczos and see how we can potentially mitigate them.
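For context on what such an implementation would involve: Lanczos resampling is built around the Lanczos reconstruction kernel. Below is a minimal NumPy sketch of that kernel only (the function name and the choice of a=3 lobes are illustrative assumptions, not code from this PR or from pytorch):

```python
import numpy as np

# Lanczos reconstruction kernel: L(x) = sinc(x) * sinc(x / a) for |x| < a,
# and 0 elsewhere. np.sinc is the normalized sinc, sin(pi*x) / (pi*x).
def lanczos_kernel(x, a=3):
    x = np.asarray(x, dtype=np.float64)
    out = np.sinc(x) * np.sinc(x / a)
    return np.where(np.abs(x) < a, out, 0.0)
```

A full resize implementation would additionally need separable row/column convolutions, edge handling, and antialias scaling of the kernel when downsampling, which is why matching PIL exactly is a significant undertaking.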
Force-pushed from 8887eed to f445bef
test/test_transform_ops.py (outdated)
    ((1.5, 1.31), (0.5, 0.71), (0.7, 1.31), (1.5, 0.71), (1.0, 1.0), (2.0, 2.0)),
)
@pytest.mark.parametrize("video", [NASA_VIDEO, TEST_SRC_2_720P])
def test_resize_bicubic_torchvision(
@NicolasHug here is the test verifying bicubic against both videos
test/test_transform_ops.py (outdated)
    ((1.5, 1.31), (0.5, 0.71), (0.7, 1.31), (1.5, 0.71), (1.0, 1.0), (2.0, 2.0)),
)
@pytest.mark.parametrize("video", [NASA_VIDEO, TEST_SRC_2_720P])
def test_resize_lanczos(
Related to your Lanczos comment: this is a minimal version that does not always validate 100% of the output, but it is sufficient to confirm the API calls produce good results. Since full Lanczos verification is more complicated and out of scope, can we take it as a follow-up?
To answer your question: all internal videos are resized with Lanczos on ingestion, because we prepare training data to preserve the highest quality.
NicolasHug
left a comment
Thanks for the update!
There are some test failures:
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video0-1.5-1.31] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.43553161621094% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video0-0.5-0.71] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.7022476196289% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video0-0.7-1.31] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.59222412109375% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video0-1.5-0.71] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.54126739501953% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video0-2.0-2.0] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.28189849853516% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video1-1.5-1.31] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.04127502441406% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video1-0.5-0.71] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.68194580078125% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video1-0.7-1.31] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.2843246459961% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video1-1.5-0.71] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.37116241455078% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_bicubic_torchvision[video1-2.0-2.0] - AssertionError: Expected at least 99.8% of values to be within atol=1, but only 99.0071792602539% were.
FAILED test/test_transform_ops.py::TestCoreVideoDecoderTransformOps::test_resize_lanczos[video0-1.0-1.0] - AssertionError: Lanczos and bilinear should produce different results
I think lowering the tolerance to 99% for bicubic mode is probably reasonable. If tests pass with that, while keeping a max atol of 6 like before, this sounds good to me.
Once these tests pass, I'll make other review passes to tackle the code part (and potentially suggest refactors etc.)
On lanczos: I'll be happy to add it once we have a native torch implementation - this is important so we can guarantee consistency between TorchCodec and PyTorch. Inconsistencies have been the source of countless silent bugs, and that's something we're being extra careful about. Since there is no Lanczos implementation in torch at the moment, I recommend leaving it out of this PR so we can make progress on the bicubic mode.
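The check the failures above are exercising ("at least N% of values within atol") can be sketched generically. This is a hypothetical NumPy re-implementation of the idea, not the actual torchcodec test helper:

```python
import numpy as np

# Assert that at least `percentage` percent of values are within `atol`
# of the reference (mirrors the failure messages quoted above).
def assert_close_on_at_least(actual, expected, percentage, atol):
    diff = np.abs(actual.astype(np.float64) - expected.astype(np.float64))
    pct = (diff <= atol).mean() * 100
    assert pct >= percentage, (
        f"Expected at least {percentage}% of values to be within "
        f"atol={atol}, but only {pct}% were."
    )
```

With this shape of check, lowering the threshold from 99.8% to 99% is a one-argument change at each call site.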
Force-pushed from f445bef to 8294c6c
Adjusted the test with a threshold of 99%. OK, removed Lanczos for now, but please prioritize it: all our new models have been trained and pass eval with Lanczos.
Force-pushed from 9aabb88 to 179f4e7
Force-pushed from 179f4e7 to 8fdf462
@jeremyteboul has imported this pull request. If you are a Meta employee, you can view this in D91437613.
@NicolasHug what do we need to do to get this diff merged?
test/test_transform_ops.py (outdated)
    assert_tensor_close_on_at_least(
        frame_resize, frame_tv, percentage=98, atol=1
    )
    # Bicubic has slightly different implementations between FFmpeg and TorchVision.
    # 98% of pixels within 1, all pixels within 32.
    torch.testing.assert_close(frame_resize, frame_tv, rtol=0, atol=32)
Thanks for the progress on fixing the test failures @jeremyteboul.
These tolerances are higher than what we'd ideally expect (see cell 6 on this gist). Above, for bilinear, we have percentage=99 and also a max abs diff of 6.
This is probably fine, but to be fully sure I'll have to look into the FFmpeg implementation and compare it to the torch one. There might be some uint8 truncations happening in torch that don't happen in FFmpeg, or something similar.
I should be able to get to that next week (this week is BE week in the pytorch org).
When that's done and validated, we'll be able to get to the second review round, and I'll likely suggest some refactors - e.g. this test should probably be parametrized with the existing BILINEAR one.
@NicolasHug The implementations differ slightly:
FFmpeg: uses fixed-point arithmetic internally with rounding (for example, it adds 0.5 before truncation)
PyTorch: uses float32 with truncation toward zero when converting back to uint8
So for a computed value of 127.4:
FFmpeg (rounding): 127.4 + 0.5 = 127.9 → 127
PyTorch (truncation): 127.4 → 127
But for 127.6:
FFmpeg: 127.6 + 0.5 = 128.1 → 128
PyTorch: 127.6 → 127 (difference of 1!)
Bicubic uses a 4x4 kernel, so more pixels accumulate these off-by-one errors; the lower 98% threshold accounts for that compounding.
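The divergence described above can be demonstrated with a small NumPy sketch (illustrative only; this is not the actual FFmpeg or PyTorch conversion code):

```python
import numpy as np

vals = np.array([127.4, 127.6], dtype=np.float32)

# FFmpeg-style: add 0.5, then truncate (i.e., round half up).
ffmpeg_style = np.floor(vals + 0.5).astype(np.uint8)
# PyTorch-style: truncate toward zero on the uint8 cast.
pytorch_style = vals.astype(np.uint8)

print(ffmpeg_style.tolist())   # [127, 128]
print(pytorch_style.tolist())  # [127, 127]
```

Values near an x.5 boundary land on different uint8 results under the two strategies, which is exactly the off-by-one discrepancy the test tolerances absorb.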
This enables FFmpeg's bicubic and lanczos scaling filters for higher-quality video resizing. The existing infrastructure already supports multiple modes; this change exposes them through the full E2E path.
Changes:
- Transform.h: Added BICUBIC and LANCZOS to the InterpolationMode enum
- Transform.cpp: Added cases for bicubic and lanczos in the toFilterGraphInterpolation() and toSwsInterpolation() functions
- custom_ops.cpp: Updated makeResizeTransform() to parse an optional interpolation mode from the transform spec string (e.g., 'resize, 256, 256, bicubic')
- _decoder_transforms.py: Added an 'interpolation' parameter to the Resize class; updated _from_torchvision() to map TorchVision interpolation modes
- test_transform_ops.py: Updated tests to allow BICUBIC/LANCZOS; added tests for the new interpolation modes
This allows users to use TorchCodec's Resize with bicubic interpolation:
TorchCodecResize(size=(256, 256), interpolation='bicubic')
Or use TorchVision's Resize with bicubic and it will be converted:
v2.Resize(size=(256, 256), interpolation=v2.InterpolationMode.BICUBIC)
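As an illustration of the spec-string format mentioned for custom_ops.cpp, the optional interpolation token could be parsed along these lines. This is a hypothetical Python sketch; the actual parsing happens in C++ in makeResizeTransform(), and the function name, validation, and bilinear default here are assumptions:

```python
def parse_resize_spec(spec: str):
    # Expected shapes: "resize, H, W" or "resize, H, W, <mode>".
    parts = [p.strip() for p in spec.split(",")]
    if parts[0] != "resize" or len(parts) not in (3, 4):
        raise ValueError(f"Bad resize spec: {spec!r}")
    height, width = int(parts[1]), int(parts[2])
    # The interpolation token is optional; assume bilinear when absent.
    mode = parts[3] if len(parts) == 4 else "bilinear"
    if mode not in ("bilinear", "bicubic", "lanczos"):
        raise ValueError(f"Unsupported interpolation: {mode}")
    return height, width, mode
```

Keeping the mode as a trailing optional token means existing "resize, H, W" specs continue to work unchanged.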
Force-pushed from 8fdf462 to e42e041
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1167
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.