Skip to content

Refactor do_build in build_prod_wheels.py for reviewability#3389

Open
subodh-dubey-amd wants to merge 2 commits intomainfrom
users/subodh-dubey-amd/refactor-do-build-wheels-core
Open

Refactor do_build in build_prod_wheels.py for reviewability#3389
subodh-dubey-amd wants to merge 2 commits intomainfrom
users/subodh-dubey-amd/refactor-do-build-wheels-core

Conversation

@subodh-dubey-amd
Copy link
Contributor

@subodh-dubey-amd subodh-dubey-amd commented Feb 12, 2026

Motivation

Preparatory refactor for sccache integration (PR #3171). Addresses reviewer feedback on build_prod_wheels.py being hard to review due to a single large do_build() function.

Technical Details

  • Extract core build steps (env setup, Triton, PyTorch, Audio, Vision, Apex, ccache stats) from do_build() into new _do_build_wheels_core() helper.
  • do_build() now handles only setup/orchestration and delegates to the helper.
  • Replace two redundant get_rocm_path("root") calls with the rocm_dir parameter.
  • Pure refactor — no new args, no sccache logic, no behavioral changes.

Test Result

No functional changes — refactored code follows the same execution path as before.

Submission Checklist

@subodh-dubey-amd subodh-dubey-amd changed the title **Refactor do_build in build_prod_wheels.py for reviewability** Refactor do_build in build_prod_wheels.py for reviewability Feb 12, 2026
@subodh-dubey-amd subodh-dubey-amd marked this pull request as ready for review February 12, 2026 12:17
Copy link
Member

@ScottTodd ScottTodd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines 536 to 537
"PYTORCH_ROCM_ARCH": pytorch_rocm_arch,
"USE_KINETO": os.environ.get("USE_KINETO", "ON" if not is_windows else "OFF"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I don't see either of these variables referenced in https://github.com/pytorch/vision/ or https://github.com/pytorch/audio/, but including them in the env used for all builds is probably harmless.

We could try moving some of these env vars out of "common" and into do_build_pytorch around here:

env["USE_ROCM"] = "ON"
env["USE_CUDA"] = "OFF"
env["USE_MPI"] = "OFF"
env["USE_NUMA"] = "OFF"
env["PYTORCH_BUILD_VERSION"] = pytorch_build_version
env["PYTORCH_BUILD_NUMBER"] = args.pytorch_build_number

(fine to keep this PR as a pure code move, without changing behavior, then attempt that later)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — both are pytorch-only but PYTORCH_ROCM_ARCH is read from env in do_build_pytorch() for flash attention checks, so moving it needs a param change. Keeping as-is for pure code move; tracked in #3425.

Comment on lines 531 to 535
env: dict[str, str] = {
"PYTHONUTF8": "1", # Some build files use utf8 characters, force IO encoding
"CMAKE_PREFIX_PATH": str(cmake_prefix),
"ROCM_HOME": str(rocm_dir),
"ROCM_PATH": str(rocm_dir),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line is pretty blurry here for what env setup goes in do_build() or _do_build_wheels_core(). Maybe we could add a dedicated setup_common_build_env() function that would contain all of this code?

env: dict[str, str] = {
"PYTHONUTF8": "1", # Some build files use utf8 characters, force IO encoding
"CMAKE_PREFIX_PATH": str(cmake_prefix),
"ROCM_HOME": str(rocm_dir),
"ROCM_PATH": str(rocm_dir),
"PYTORCH_ROCM_ARCH": pytorch_rocm_arch,
"USE_KINETO": os.environ.get("USE_KINETO", "ON" if not is_windows else "OFF"),
}
if args.use_ccache:
print("Building with ccache, clearing stats first")
env["CMAKE_C_COMPILER_LAUNCHER"] = "ccache"
env["CMAKE_CXX_COMPILER_LAUNCHER"] = "ccache"
run_command(["ccache", "--zero-stats"], cwd=tempfile.gettempdir())
# GLOO enabled for only Linux
if not is_windows:
env["USE_GLOO"] = "ON"
# At checkout, we compute some additional env vars that influence the way that
# the wheel is named/versioned.
if triton_dir:
triton_env_file = triton_dir / "build_env.json"
if triton_env_file.exists():
with open(triton_env_file, "r") as f:
addl_triton_env = json.load(f)
print(f"-- Additional triton build env vars: {addl_triton_env}")
env.update(addl_triton_env)
# With `CMAKE_PREFIX_PATH` set, `find_package(LLVM)` (called in
# `MLIRConfig.cmake` shipped as part of the LLVM bundled with
# trition) may pick up TheRock's LLVM instead of triton's.
# Here, `CMAKE_FIND_USE_CMAKE_ENVIRONMENT_PATH` is set
# and passed via `TRITON_APPEND_CMAKE_ARGS` to avoid this.
# See also https://github.com/ROCm/TheRock/issues/1999.
env["TRITON_APPEND_CMAKE_ARGS"] = (
"-DCMAKE_FIND_USE_CMAKE_ENVIRONMENT_PATH=FALSE"
)
if is_windows:
llvm_dir = rocm_dir / "lib" / "llvm" / "bin"
env.update(
{
"HIP_CLANG_PATH": str(llvm_dir.resolve().as_posix()),
"CC": str((llvm_dir / "clang-cl.exe").resolve()),
"CXX": str((llvm_dir / "clang-cl.exe").resolve()),
}
)
else:
env.update(
{
# Workaround GCC12 compiler flags.
"CXXFLAGS": " -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict ",
"CPPFLAGS": " -Wno-error=maybe-uninitialized -Wno-error=uninitialized -Wno-error=restrict ",
}
)
# Workaround missing devicelib bitcode
# TODO: When "ROCM_PATH" and/or "ROCM_HOME" is set in the environment, the
# clang frontend ignores its default heuristics and (depending on version)
# finds the wrong path to the device library. This is bad/annoying. But
# the PyTorch build shouldn't even need these to be set. Unfortunately, it
# has been hardcoded for a long time. So we use a clang env var to force
# a specific device lib path to workaround the hack to get pytorch to build.
# This may or may not only affect the Python wheels with their own quirks
# on directory layout.
# Obviously, this should be completely burned with fire once the root causes
# are eliminted.
hip_device_lib_path = get_rocm_path("root") / "lib" / "llvm" / "amdgcn" / "bitcode"
if not hip_device_lib_path.exists():
print(
"WARNING: Default location of device libs not found. Relying on "
"clang heuristics which are known to be buggy in this configuration"
)
else:
env["HIP_DEVICE_LIB_PATH"] = str(hip_device_lib_path)
# OpenBLAS path setup
host_math_path = get_rocm_path("root") / "lib" / "host-math"
if not host_math_path.exists():
print(
"WARNING: Default location of host-math not found. "
"Will not build with OpenBLAS support."
)
else:
env["BLAS"] = "OpenBLAS"
env["OpenBLAS_HOME"] = str(host_math_path)
env["OpenBLAS_LIB_NAME"] = "rocm-openblas"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Extracted all env construction into a dedicated _setup_common_build_env() function.

@ScottTodd
Copy link
Member

Code changes seem fine but I want to wait until #3432 clears and we get a clean CI run before approving. See failures at https://github.com/ROCm/TheRock/actions/runs/22062019834/job/63760343692?pr=3389.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: TODO

Development

Successfully merging this pull request may close these issues.

2 participants