Conversation
f878ea1 to
5ab4742
Compare
d02804b to
7929a49
Compare
|
I was able to successfully build torchcodec (with torch + torchaudio 2.9) on Windows (gfx103X, 7.11.0a20251230 from nightlies/v2 just in case the CI failures when building pytorch wheels w/Python 3.12 and torch 2.9 were related to the newer rocm wheels) after cherry-picking 5ab4742 and patching torchcodec's setup.py to:
Patch here: patch-2.9-fix-windows-builds.patch. torch+audio+torchcodec from this build worked as expected, as long as the ffmpeg DLL directory was added via Hope to see this get added to CI soon, ideally with Windows builds, and even more ideally with gfx103X builds (Linux + Windows). |
|
@DevDotMathias Thanks for your input for torchcodec. Not sure would it be easier to add first the Linux support on first PR and then add the windows support on second PR little later. At least for me the pybind11 worked by installing it with pip to python venv that is used by the build on ci-image. This PR includes the docker changes I needed to make to our CI build image so that python versions under /opt/python directory contains the libpython libraries required by the torchcodec. For now I have temporarily pushed the modified CI image to docker io but before this patch is applied we need to push it back to github docker server and then re-change the download url to point to there. With these changes the torchcodec builded now ok for pytorch 2.9 and pytorch nighly. CI Test build with logs is here: https://github.com/ROCm/TheRock/actions/runs/20797194747 Pytorch tests where failing but those tests should not have anything torchcodec specific yet, so I am not sure why. |
|
Assigning @amd-shiraz for visibility. |
|
@stellaraccident I just rebased my patches to latest rock version and noticed that you had also added I did not investigate yet whether these new versions already contains python changes I need, so I will still include those changes in first push to get them saved somewhere... But if your ci image changes for python are enough for torchcodec build, I can drop ci-image related changes from my patch. I had a little different approach and added the build commands directly to dockerfile for different python version while you call external shell script. |
3aa7f92 to
4f33771
Compare
|
I was now able to drop the docker changes as new CI images from January 7 contained python versions with devel libraries. Patch handling is also removed in this version as support for applying patches for pytorch projects was also dropped on January 7. |
Whatever works best, I just noticed you had Windows builds listed in the description of this PR. If you have to get upstream to accept a PR (I'd submit, but I think it'll carry more weight coming from an AMD dev than from me) for switching to Ninja and reading env vars, I can see holding off until they merge.
I expect that to always be the case for Linux. If it works straight off for you in Windows CI images, all the better. |
|
I am now trying testing the images build by ci with gfx1100 on Ubuntu 24.04. I used the rocm-sdk from 20260108 for building and install is still little tricky as I need to pickup packages from v2 and v2-staging dirs.
which should printout:
|
7e9ba03 to
3077ce8
Compare
4f33771 to
4681062
Compare
- add support for checking out and building torchcodec as a part of the pytorch build process - enable the torchcodec source code checkout and build in the linux CI job Note: - Version of torchcodec we checkout and build depends from the pytorch version and torchcodecs/README.md lists which torchcodec and pytorch versions are compatible with each others. - Pinmapping file used by many other pytorch related subprojects does not yet exist for the torchcodec in directory pytorch/.github/ci_commit_pins and therefore we will check the pytorch version in pytorch_torchcodec_repo.py and then use that information to determine which torchcodec version to checkout. - At least for now we will only support the torchcodec with pytorch 2.9 and never releases and nightlies. - Different pytorch components may require a different versions of python setuptools. Torchcodec requires that setup tools version needs to be never than 80.0. Because of that the build scripts install install a torchcodec compatible version of setuptools before launching the torchcodec build itself. Todo: - Add smoketests - Add windows build support fixes: #1490 Signed-off-by: Mika Laitio <mika.laitio@amd.com>
4681062 to
0de0c8f
Compare
|
Rebased latest test build for gfx110x gpus passed ok for all python versions: |
| except ValueError: | ||
| print("Error, Skipping pytorch torchcodec build") | ||
| print( | ||
| f"Could not convert version string major and minor to int. Check the format of '{version_string}'." |
There was a problem hiding this comment.
is this a typo ? should this be version_string or build_version ?
There was a problem hiding this comment.
I'm assuming it should be build_version. "version_string" is not defined, only "version_string_arr".
There was a problem hiding this comment.
Fixed it was typo, needs to be build_version.
|
|
||
| # verify that installed setuptools > 80.0 (80.9.0 tested) | ||
| exec( | ||
| [sys.executable, "-m", "pip", "install", "setuptools>80.0"], |
There was a problem hiding this comment.
Prefer installing deps once in the workflow (pinned versions / constraints file), not ad-hoc inside the build script. unless there is a string reason to. if there is pls add that here as comment.
There was a problem hiding this comment.
I agree. I would replace this with a version check of setuptools and return an error if setputools is not a sufficient version.
Similarly for the other deps, return an error if they aren't already installed.
There was a problem hiding this comment.
I noticed torchcodec build errors if the setuptools version was 0.7.x while pytorch, audio and visio had build fine with that version. This is the reason why I have added the pip install commands for setuptools and pybind.
Triton and pytorch have requirements.txt and when we build those components we install their dependencies in build_prod_wheels.py with following commands:
python -m pip install -r triton/python/requirements.txt (line 583)
and
python -m pip install -r pytorch/requirements.txt (line 827)
Problem is that the torchcodec does not have requirements.txt file. Because we have dropped support for patching the pytorch related projects, I can not use the patch file to create a requirements.txt for torchcodec. This is the reason I have installed packages I have noticed to be need one by one.
There was a problem hiding this comment.
I did local builds for pytorch 2.9, 2.10 and nightly for python 3.12
and Iwas not able to reproduce anymore the error related to setuptools version.
Depending from the pytorch version build, setuptools version ended up being in my build in the end either 80.9.0 or 80.10.2.
I launched one more test build with CI for 9070/gfx120x target.
There was a problem hiding this comment.
CI build failed after I removed the dependency install at least for the pybinds cmake files missing.
It worked on my local Ubuntu build because I had there the pybind-dev installed.
Error:
2026-01-26T23:17:04.1824662Z CMake Error at src/torchcodec/_core/CMakeLists.txt:7 (find_package):
2026-01-26T23:17:04.1825183Z -- Configuring incomplete, errors occurred!
2026-01-26T23:17:04.1826085Z By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
2026-01-26T23:17:04.1826924Z asked CMake to find a package configuration file provided by "pybind11",
2026-01-26T23:17:04.1827417Z but CMake did not find one.
2026-01-26T23:17:04.1827621Z
2026-01-26T23:17:04.1827875Z Could not find a package configuration file provided by "pybind11" with any
2026-01-26T23:17:04.1828350Z of the following names:
2026-01-26T23:17:04.1828521Z
2026-01-26T23:17:04.1828625Z pybind11Config.cmake
2026-01-26T23:17:04.1828906Z pybind11-config.cmake
2026-01-26T23:17:04.1829075Z
2026-01-26T23:17:04.1829309Z Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
2026-01-26T23:17:04.1829893Z "pybind11_DIR" to a directory containing one of the above files. If
2026-01-26T23:17:04.1830584Z "pybind11" provides a separate development package or SDK, be sure it has
2026-01-26T23:17:04.1830928Z been installed.
Either we need to install the pybind to our manylinux install natively or we need to install it via pip install dynamically. For now I test the pip install way as it is most similar to "pip install -r requirements.txt" that we do for torch and triton.
There was a problem hiding this comment.
We discussed this in the morning meeting because pybind-devel headers needs to still get installed even with the latest patch version somehow to build image.
- We could pre-install pybind-devel rpm to manylinux-docker image from rpm package. Marius did not like from that idea.
- We could install pybind-devel rpm during the build time. This idea was not good. One reason is that in our docker image we have multiple python versions under /opt/python directory and install of pybind-devel rpm would install one more as a dependency under /usr.
- Third option is to install with pip during the build time which is now implemented and I got impression that it is preferred. I would not like to tie the exact pybind version to pip-install command because we are doing a builds wit multiple different python versions and I am not sure can the pybind-version be different depending from the python version currently used. (3.10, 3.11, 3.12, 3.13 or 3.14)
| "Python3_ROOT_DIR": python_home_dir, | ||
| } | ||
| ) | ||
| print("do_build_pytorch_torchcodec") |
There was a problem hiding this comment.
do we need this twice ? also a dup above in line 962
There was a problem hiding this comment.
Fixed, removed the dub printout line. Leaving other the printouts for env-variables.
jeffdaily
left a comment
There was a problem hiding this comment.
Same changes requested as @amd-shiraz .
| except ValueError: | ||
| print("Error, Skipping pytorch torchcodec build") | ||
| print( | ||
| f"Could not convert version string major and minor to int. Check the format of '{version_string}'." |
There was a problem hiding this comment.
I'm assuming it should be build_version. "version_string" is not defined, only "version_string_arr".
|
|
||
| # verify that installed setuptools > 80.0 (80.9.0 tested) | ||
| exec( | ||
| [sys.executable, "-m", "pip", "install", "setuptools>80.0"], |
There was a problem hiding this comment.
I agree. I would replace this with a version check of setuptools and return an error if setputools is not a sufficient version.
Similarly for the other deps, return an error if they aren't already installed.
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
- update from v0.10.0-rc1 to v0.10.0 Signed-off-by: Mika Laitio <mika.laitio@amd.com>
I did local builds for pytorch 2.9, 2.10 and nightly and was not able to reproduce anymore the error that I have seen if not making sure before the torchcodec build starts that the right pybind and setuptools versions were installed. It may have been a problem with some older torchcodec version from the time I tested the building also for pytorch 2.7 and pytorch 2.8 Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Fixes following build error on ci-machine:
By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "pybind11",
but CMake did not find one.
Could not find a package configuration file provided by "pybind11" with any
of the following names:
pybind11Config.cmake
pybind11-config.cmake
Signed-off-by: Mika Laitio <mika.laitio@amd.com>
|
Now with pybind-dev dependency inserted back for torchcodec the build passes. https://github.com/ROCm/TheRock/actions/runs/21378178853/job/61543032071 And short test for this on linux machine with gfx1201/RX 9070 with python 3.12 This should printout:
|
|
As a sidenote there is now PR to remove the direct linking of libpython to custom_ops-libraries in torchcodec. Upstream PR: meta-pytorch/torchcodec#1224 |
ScottTodd
left a comment
There was a problem hiding this comment.
The overall structure of the script and workflow changes looks fine. I'm wondering at a high level - is this working? I don't see any evidence of what testing has been done in the PR description and the code seems to be building with CUDA (and also ROCM?) disabled.
There was a problem hiding this comment.
As was just done for adding https://github.com/rocm/apex, please sequence this as at least two PRs:
- Add
build_prod_wheels.pychanges together with newpytorch_torchcodec_repo.pychanges - Add code to
.github/workflows/build_portable_linux_pytorch_wheels.yml
See the commit sequence for Apex:
That way, we can get the build code in place for other developers to start using and then enable in our release builds separately. If we then need to revert for some reason we can revert just the workflow changes as a flag flip while keeping local build support for easier debugging.
| torchaudio_version: ${{ steps.build-pytorch-wheels.outputs.torchaudio_version }} | ||
| torchvision_version: ${{ steps.build-pytorch-wheels.outputs.torchvision_version }} | ||
| triton_version: ${{ steps.build-pytorch-wheels.outputs.triton_version }} | ||
| torchcodec_version: ${{ steps.build-pytorch-wheels.outputs.torchcodec_version }} |
There was a problem hiding this comment.
This output var won't exist without changes to write_torch_versions.py. See 39d840b for example
| else: | ||
| print("--- Not build pytorch-vision (no --pytorch-vision-dir)") | ||
| # Build pytorch torchcodec | ||
| if args.build_pytorch_torchcodec or ( |
There was a problem hiding this comment.
nit: add a line break between the "Build pytorch vision" and the "Build pytorch torchcodec" sections
| python_home_dir = get_python_home_dir() | ||
| # python env executed in ci build is intepreter version only version in /opt/python/cp3xy-cp3xy dir | ||
| # environment which contains also the libpython development files is under dir /opt/python-shared/cp3xy-cp3xy dir | ||
| # and we need to force the torchcodec to find it with find_package("python" command | ||
| if "/opt/python/cp" in python_home_dir: | ||
| python_home_dir = python_home_dir.replace( | ||
| "/opt/python/cp", "/opt/python-shared/cp", 1 | ||
| ) |
There was a problem hiding this comment.
This doesn't seem right. Don't your changes in meta-pytorch/torchcodec#1224 remove the need for this?
We can fork torchcodec under https://github.com/ROCm/ and point the torchcodec checkout there as needed for this bringup work, then switch back to using upstream once that PR (and any other ROCm/TheRock-specific fixes are landed). How does that approach sound, @jeffdaily ?
(and if we do need this python_home_dir code there are more portable ways to do this that work inside and outside of virtual environments on Windows and Linux... the path walking and /opt/python code is brittle)
There was a problem hiding this comment.
We will still need this even if the patch is applied in upstream that will remove the linking to libpython because of the find package call is used also to find the location of python header files. Basically
find_package(Python3 ${PYTHON_VERSION} EXACT COMPONENTS Development)
when succeeds will setup environment variable
${Python3_INCLUDE_DIRS}
that is then used on this cmake method call:
target_include_directories(${library_name}
PRIVATE
./../../../
"${TORCH_INSTALL_PREFIX}/include"
${Python3_INCLUDE_DIRS}
)
| env.update( | ||
| { | ||
| "ENABLE_CUDA": "0", |
There was a problem hiding this comment.
Why ENABLE_CUDA=0? Is ROCM enabled or does this produce a CPU-only torchcodec wheel? (If so we could just copy upstream-produced torchcodec CPU wheels into our release index instead of building from source ourselves)
There was a problem hiding this comment.
At least for now it's CPU only.
There was a problem hiding this comment.
That answers part of the question, but not what is critical here. Are CPU-only torchcodec packages useful at all for us? If so, why wouldn't we just use upstream-produced packages instead of building them ourselves? Is this a stepping stone to building with ROCm support? What still needs to be done for that?
There was a problem hiding this comment.
We discussed offline. CPU-only torchcodec packages are useful and a reasonable starting point. We may just be able to use packages that are already built upstream though, by adding an entry to https://github.com/ROCm/TheRock/blob/main/build_tools/third_party/s3_management/update_dependencies.py.
If we aren't setting any custom build options (like ROCm support using the ROCm Python packages), I would prefer that over including all this extra code and spending more compute time on our build runners.
There was a problem hiding this comment.
We discussed offline. CPU-only torchcodec packages are useful and a reasonable starting point. We may just be able to use packages that are already built upstream though, by adding an entry to https://github.com/ROCm/TheRock/blob/main/build_tools/third_party/s3_management/update_dependencies.py.
If we aren't setting any custom build options (like ROCm support using the ROCm Python packages), I would prefer that over including all this extra code and spending more compute time on our build runners.
If you guys go this route, please submit a PR to upstream[1] so that their builds with be compatible with yours across all platforms. It makes no sense to me that TorchCodec would be the only one to force MSVC for Windows builds when both PyTorch and TorchAudio use ninja as a generator and read the CC/CXX env vars, and until the bug in clang is fixed (which could be anywhere from next week to never) that will never play nice.
[1] As I've said before, I'm happy to do this, but my PR would carry zero weight, where one from AMD would be treated with significance.
| # cmake support for pybind11 to avoid error: | ||
| # Could not find a package configuration file provided by "pybind11" | ||
| exec( | ||
| [sys.executable, "-m", "pip", "install", "pybind11[global]"], | ||
| cwd=pytorch_torchcodec_dir, | ||
| env=env, | ||
| ) | ||
| # torchcodec build | ||
| exec( | ||
| [sys.executable, "-m", "build", "--wheel", "-vvv", "--no-isolation"], | ||
| cwd=pytorch_torchcodec_dir, | ||
| env=env, | ||
| ) |
There was a problem hiding this comment.
Shouldn't build requirements be automatically installed by python -m build --wheel? How is this handled for the other projects?
There was a problem hiding this comment.
It seems that setup.py does not handle this in torchcodec as we would get pybind error.
In triton and pytorch we handle this by doing "pip install -r requirements.txt" but torchcodec does not have this file, we call "pip install pybind" instead directly because otherwise we will get following build error on our CI:
By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "pybind11",
but CMake did not find one.
Could not find a package configuration file provided by "pybind11" with any
of the following names:
pybind11Config.cmake
pybind11-config.cmake
In theory we could also install dependency to CI machine with "rpm" but that version would then install python 3.11 as a dependency to our CI image that we do not want as we have our own build python versions on the image. We previously discussed about this and conclusion was that install via pip is better alternative.
| THIS_MAIN_REPO_NAME = "pytorch_torchcodec" | ||
| THIS_DIR = Path(__file__).resolve().parent | ||
|
|
||
| DEFAULT_ORIGIN = "https://github.com/pytorch/torchcodec.git" |
There was a problem hiding this comment.
This moved to https://github.com/meta-pytorch/torchcodec (redirects should still work, but we might as well update)
| DEFAULT_ORIGIN = "https://github.com/pytorch/torchcodec.git" | |
| DEFAULT_ORIGIN = "https://github.com/meta-pytorch/torchcodec.git" |
(longer term I'm not sure about the politics with the PyTorch foundation vs Meta for such projects... I wonder what sorts of lines we should draw for packages that we build and distribute as AMD)
There was a problem hiding this comment.
I updated the URL and pushed the change in. I can't comment too much whether we should or should not include this to our distro.
New upstream url: https://github.com/meta-pytorch/torchcodec.git Signed-off-by: Mika Laitio <mika.laitio@amd.com>
Change audio related help texts to point to torchcodec. Co-authored-by: Scott Todd <scott.todd0@gmail.com>

Add support for checking out and building torchcodec during the pytorch build phase
Todo:
fixes: #1490