Work around intermittent Windows build failure (missing pyconfig.h, etc.) #48188
Files inventory check summary
File checks results against ancestor 18a1df73:
Results for datadog-agent_7.79.0~devel.git.48.a989d31.pipeline.103943730-1_amd64.deb: Detected file changes:
### What does this PR do?

Add `common:windows --enable_runfiles` to `.bazelrc`.

### Motivation

rules_python 1.9.0 (introduced in #48082) transitions every `py_binary` on Windows to `enable_runfiles=true`. With Bazel's default of `enable_runfiles=false` on Windows, this creates a second Bazel configuration, causing `python_win` to be built twice concurrently.

`build_python.bat` writes MSBuild intermediate files (`PCbuild/obj/`, `PCbuild/amd64/`, `msbuild.rsp`) into the shared execroot source tree rather than into the action's output directory, so the two concurrent builds race on those files. The race manifests as intermittent `pyconfig.h: No such file or directory` errors that disappear when the remote cache is warm.

Setting `enable_runfiles=true` globally makes the transition a no-op (same flag value → same configuration hash → one build of `python_win`), eliminating the race.

### Describe how you validated your changes

Analysis of a failing CI job log (`PCbuild/obj/*.pdb` locked by another process during CleanAll).

### Additional Notes

This is a short-term workaround. The proper fix is to make `build_python.bat` hermetic by redirecting MSBuild's output and intermediate directories to `$(@d)` instead of the execroot source tree.
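The "same flag value → same configuration hash → one build" argument can be illustrated with a toy model. This is only a sketch: `config_hash` and `py_binary_transition` are hypothetical helpers, and the digest below is a stand-in, not how Bazel actually computes configuration hashes.

```python
import hashlib


def config_hash(flags: dict) -> str:
    """Toy stand-in for a configuration hash: a stable digest of the sorted flags."""
    canonical = ",".join(f"{key}={value}" for key, value in sorted(flags.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


def py_binary_transition(flags: dict) -> dict:
    """Models the transition described above: force enable_runfiles=true."""
    return {**flags, "enable_runfiles": "true"}


def configs_building_python_win(base_flags: dict) -> set:
    """python_win is requested once from the base configuration and once
    from the transitioned one; identical hashes collapse into a single build."""
    return {config_hash(base_flags), config_hash(py_binary_transition(base_flags))}


default_windows = {"enable_runfiles": "false"}  # Bazel's Windows default, per the PR text
with_workaround = {"enable_runfiles": "true"}   # effect of common:windows --enable_runfiles

builds_default = len(configs_building_python_win(default_windows))
builds_workaround = len(configs_building_python_win(with_workaround))
print(builds_default, builds_workaround)
```

With the default flags the set holds two distinct hashes, so `python_win` would be built twice; with the flag pre-set the transition changes nothing and only one hash remains.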
7e718b0 to
a989d31
Compare
Static quality checks ✅
Please find below the results from static quality gates.
Successful checks: 24 successful checks with minimal change (< 2 KiB)
On-wire sizes (compressed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a989d3184c
alopezz
left a comment
Very valuable find, thanks.
Regression Detector Results
Metrics dashboard
Baseline: c08404f
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | -0.04 | [-3.04, +2.97] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_logs | % cpu utilization | +0.53 | [-1.05, +2.12] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | +0.38 | [+0.33, +0.42] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_delta | memory utilization | +0.19 | [+0.02, +0.36] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | +0.19 | [-0.05, +0.43] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | +0.15 | [-0.07, +0.38] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.09 | [+0.06, +0.13] | 1 | Logs bounds checks dashboard |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.03 | [-0.03, +0.09] | 1 | Logs |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.02 | [-0.45, +0.49] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.19, +0.19] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.00 | [-0.11, +0.10] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.01 | [-0.20, +0.19] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.01 | [-0.13, +0.11] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | -0.01 | [-0.17, +0.15] | 1 | Logs |
| ➖ | file_tree | memory utilization | -0.01 | [-0.07, +0.05] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.01 | [-0.43, +0.41] | 1 | Logs |
| ➖ | ddot_metrics | memory utilization | -0.03 | [-0.21, +0.16] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | -0.03 | [-0.11, +0.06] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | -0.04 | [-3.04, +2.97] | 1 | Logs |
| ➖ | docker_containers_memory | memory utilization | -0.12 | [-0.20, -0.03] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | -0.16 | [-0.55, +0.23] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.40 | [-0.45, -0.34] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | -0.52 | [-0.67, -0.38] | 1 | Logs |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | -0.68 | [-0.80, -0.57] | 1 | Logs |
Bounds Checks: ❌ Failed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 710 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 272.99MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 575 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.23GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ❌ | quality_gate_idle | memory_usage | 9/10 | 175.46MiB > 175MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 2 ≤ 3 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 499.97MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 211.23MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 340.84 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 399.86MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
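The three criteria above can be sketched as a small predicate. The `Experiment` class and `is_regression` function are illustrative names, not part of the detector; the `erratic` flag is assumed to be false for the sample row because the report does not show it, and the second experiment is entirely hypothetical.

```python
from dataclasses import dataclass

EFFECT_SIZE_TOLERANCE = 5.00  # |Δ mean %| threshold quoted in the report


@dataclass
class Experiment:
    name: str
    delta_mean_pct: float
    ci_low: float
    ci_high: float
    erratic: bool = False  # assumption: not visible in the report tables


def is_regression(exp: Experiment) -> bool:
    """True only when all three criteria from the explanation hold."""
    big_enough = abs(exp.delta_mean_pct) >= EFFECT_SIZE_TOLERANCE
    ci_excludes_zero = not (exp.ci_low <= 0.0 <= exp.ci_high)
    return big_enough and ci_excludes_zero and not exp.erratic


# Row taken from the table above: the CI excludes zero, but the effect is far below 5%.
syslog = Experiment("tcp_syslog_to_blackhole", -0.68, -0.80, -0.57)
# Hypothetical row, for contrast only.
made_up = Experiment("hypothetical_experiment", 6.0, 4.0, 8.0)

print(is_regression(syslog), is_regression(made_up))
```

This shows why a row such as `tcp_syslog_to_blackhole` is reported as "no significant change" even though its confidence interval does not contain zero.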
CI Pass/Fail Decision
❌ Failed. Some Quality Gates were violated.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 9/10 replicas passed. Failed 1 which is > 0. Gate FAILED.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
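The pass/fail rule implied by the list above (a gate fails as soon as one replica fails, and CI fails if any gate fails) can be written as a short sketch. The function name and the dictionary layout are illustrative assumptions; only a subset of the listed gates is reproduced.

```python
def gate_passed(replicas_passed: int, replicas_total: int) -> bool:
    """A gate passes only when zero replicas failed."""
    return replicas_total - replicas_passed == 0


# (experiment, bounds check) -> (replicas passed, replicas total), copied from the list above.
results = {
    ("quality_gate_logs", "memory_usage"): (10, 10),
    ("quality_gate_idle", "memory_usage"): (9, 10),
    ("quality_gate_idle", "intake_connections"): (10, 10),
    ("quality_gate_metrics_logs", "cpu_usage"): (10, 10),
}

failed_gates = [key for key, (ok, total) in results.items() if not gate_passed(ok, total)]
ci_passed = not failed_gates

print(failed_gates, ci_passed)
```

Under this rule the single failing replica of `quality_gate_idle` / `memory_usage` is enough to fail the whole decision.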
### What does this PR do?

- revert `common:windows --enable_runfiles` added in #48188, which broke rtloader tests (`libdatadog-agent-three.dll`, error code 5)
- fix `build_python.bat` to be hermetic: redirect MSBuild intermediate files (`obj/`, `amd64/`) and `msbuild.rsp` out of the shared execroot source tree and into a per-configuration temp directory derived from `$(@d)`

### Motivation

`build_python.bat` was writing MSBuild artifacts to `%sourcedir%\PCbuild\` (the execroot source tree). When rules_python 1.9.0 introduced a configuration transition on `py_binary` for Windows, `python_win` started being built in two Bazel configurations concurrently. Both invocations raced on the shared `PCbuild/` directory, corrupting the generated `pyconfig.h` (in `PCbuild/obj/313amd64_*/pythoncore/`) and causing intermittent failures depending on whether the remote cache was warm.

`#48188` papered over this with `--enable_runfiles`, which made the transition a no-op but broke `LoadLibraryA("libdatadog-agent-three.dll")` in all rtloader tests (error code 5, access denied, when loading DLLs from Bazel's runfiles temp directory).

The proper fix is to redirect `Py_OutDir` and `Py_IntDir` to `$(@d)/tmp` (unique per configuration) so each build gets its own isolated intermediate directory and no longer touches the source tree.

### Describe how you validated your changes

Analysis of CI job logs:

- #48188's failure: `PCbuild/obj/*.pdb` locked by another process during CleanAll, then `pyconfig.h` not found in the generated path
- post-#48188 regression: `libdatadog-agent-three.dll` error code 5 in all rtloader tests

### Additional Notes

`python.bat` (written by CPython's `build.bat` to `%sourcedir%`) is still a minor impurity in the source tree but is a single small file deleted on both sides; it is not the cause of the race.
### What does this PR do?

- revert `common:windows --enable_runfiles=yes` added in #48188, which broke rtloader tests (`libdatadog-agent-three.dll`, error code 5 / `ERROR_ACCESS_DENIED`)
- make `build_python.bat` hermetic: redirect all MSBuild outputs and intermediate files out of the shared execroot source tree into a per-configuration scratch directory derived from `$(@d)/tmp`, unique per Bazel configuration hash

### Motivation

`build_python.bat` was writing MSBuild artefacts to `%sourcedir%\PCbuild\` (the shared execroot source tree):

- `msbuild.rsp` (auto-loaded by MSBuild from the project dir)
- `obj\` (intermediate objects, `Py_IntDir`)
- `amd64\` (final binaries, `Py_OutDir`)
- `python.bat` (written by `GeneratePythonBat`)

With rules_python 1.9.0 introducing a configuration transition on `py_binary` / `py_test` for Windows, `python_win` was built in two Bazel configurations concurrently. Both invocations raced on those shared paths, corrupting `pyconfig.h` and causing intermittent failures (MSB4166, missing `pyconfig.h`, wrong DLL bitness) depending on remote-cache warmth.

The `--enable_runfiles=yes` workaround deduplicates `python_win` by making the transition a no-op, but it causes Bazel to build NTFS junction-based runfiles trees. Those junctions make `LoadLibraryA` return `ERROR_ACCESS_DENIED` (code 5) for `libdatadog-agent-three.dll` in the CI security context, breaking all 10 rtloader tests.

### Describe how you validated your changes

Observed races in CI jobs 1530195391, 1530694694, 1531128430. Traced the duplicate configuration to rules_python's `_transition_executable_impl` in `py_executable.bzl`. Confirmed `--enable_runfiles=yes` breaks rtloader in CI job 1531402700 (all 10 tests fail with `LoadLibraryA` error code 5).

The hermetic fix isolates every build in `$(@d)/tmp`, so concurrent invocations never share artefacts regardless of the number of active Bazel configurations.

### Additional Notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
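To make the isolation argument concrete, here is a deterministic toy reproduction of the interleaving, run once with a shared intermediate directory and once with per-configuration directories. It is a sketch only: it uses plain temporary directories instead of MSBuild, the `cfg-a` / `cfg-b` names are hypothetical stand-ins for configuration-specific `$(@d)/tmp` paths, and the clean step is a simplified model of a clean target such as the CleanAll mentioned in the log analysis of #48188.

```python
import tempfile
from pathlib import Path


def clean_all(int_dir: Path) -> None:
    """Simplified clean step: delete every file in the intermediate directory."""
    int_dir.mkdir(parents=True, exist_ok=True)
    for entry in int_dir.iterdir():
        if entry.is_file():
            entry.unlink()


def generate_header(int_dir: Path) -> None:
    """Simplified generation step: write pyconfig.h into the intermediate directory."""
    int_dir.mkdir(parents=True, exist_ok=True)
    (int_dir / "pyconfig.h").write_text("/* generated */\n")


def interleaved_builds(dir_a: Path, dir_b: Path) -> bool:
    """Fixed interleaving of two builds; returns whether build A still finds its header."""
    generate_header(dir_a)                     # build A generates pyconfig.h
    clean_all(dir_b)                           # build B starts and cleans its directory
    found = (dir_a / "pyconfig.h").exists()    # build A compiles and looks for the header
    generate_header(dir_b)                     # build B generates its own header
    return found


with tempfile.TemporaryDirectory() as root:
    shared = Path(root) / "PCbuild" / "obj"
    shared_ok = interleaved_builds(shared, shared)
    isolated_ok = interleaved_builds(Path(root) / "cfg-a" / "tmp",
                                     Path(root) / "cfg-b" / "tmp")

print(shared_ok, isolated_ok)
```

When both builds point at the same directory, build B's clean step removes the header build A just generated; with separate directories the same interleaving is harmless.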
### What does this PR do?

- add `common:windows --enable_runfiles` to `.bazelrc`
- widen `bazel:test:windows-amd64` from `//bazel/tests/... //rtloader/...` to `//...` with exclusions for Linux-, eBPF-, and gopatch-only targets

### Motivation

rules_python 1.9.0 (#48082) transitions every `py_binary` and `py_test` on Windows from `enable_runfiles=auto` to `enable_runfiles=true`. With Bazel's default (`enable_runfiles=false` on Windows), this creates a second Bazel configuration, causing `python_win` to be built twice concurrently. `build_python.bat` writes MSBuild intermediates to the shared execroot source tree rather than the action's output directory, so both builds race on those files, causing intermittent `pyconfig.h` failures.

This is a redo of #48188, which was preemptively reverted (#48207). Pre-setting `--enable_runfiles` makes the transition a no-op, so Bazel sees a single configuration and builds `python_win` once.

Two prerequisites are now in place:

- #48281 provides `PYTHON_FOR_BUILD` to MSBuild, preventing `find_python.bat` from falling back to NuGet and other external sources under `--incompatible_strict_action_env`;
- #48087 and #48209 trigger Windows CI on `MODULE.bazel*` and `.bazel*` changes respectively, so the widened test surface below will catch regressions before they reach `main`.

### Describe how you validated your changes

Local VM and, of course, CI.

### Additional Notes

`//pkg/template/...` is excluded on Windows: gopatch v0.4.0 errors on `@@\r` in hunk markers when patch files have CRLF line endings, an unreported upstream bug with no workaround in gopatch itself.
Prior art:

1. #48082
2. #48087
3. #48188 (1st attempt)
4. #48207
5. #48281
6. #48209

### What does this PR do?

This is a redo of #48188 (preemptively reverted by #48207) and therefore merely consists of re-adding `common:windows --enable_runfiles` to `.bazelrc`.

... with lessons learned, thanks to earlier work:

- #48209 covers the present change to `.bazelrc`,
- #48281 prevents `find_python.bat` from falling back to `NuGet` or other non-hermetic sources.

... and now a widened `bazel:test:windows-amd64`, evolving from just `//bazel/tests/... //rtloader/...` to `//...` **- except currently failing targets** (for the time being, of course).

### Motivation

Summary of #48188:

- `rules_python` 1.9.0 (#48082) transitions every `py_binary` and `py_test` on Windows from `enable_runfiles=auto` to `enable_runfiles=true`,
- with Bazel's default (`enable_runfiles=false` on Windows), this creates a second Bazel configuration, causing `python_win` to be built twice concurrently,
- `build_python.bat` writes `MSBuild` intermediates to the shared execroot source tree rather than the action's output directory, so both builds race on those files, causing [intermittent failures](https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1533753039) (`pyconfig.h` not found, etc.).

**Pre-setting `--enable_runfiles` makes the transition a no-op, so Bazel sees a single configuration and builds `python_win` once.**

### Describe how you validated your changes

Local VM and, of course, CI.

### Additional Notes

For instance, `//pkg/template/...` is excluded on Windows because `gopatch` errors on `@@\r` in hunk markers when patch files have CRLF line endings, which deserves a distinct PR (likely adjusting `.gitattributes`).

Co-authored-by: regis.desgroppes <regis.desgroppes@datadoghq.com>
…tc.) (#48188)

### What does this PR do?

Add `common:windows --enable_runfiles` to `.bazelrc`.

### Motivation

rules_python 1.9.0 (introduced in #48082) transitions every `py_binary` on Windows to `enable_runfiles=true`. With Bazel's default of `enable_runfiles=false` on Windows, this creates a second Bazel configuration, causing `python_win` to be built twice concurrently.

`build_python.bat` writes MSBuild intermediate files (`PCbuild/obj/`, `PCbuild/amd64/`, `msbuild.rsp`) into the shared execroot source tree rather than into the action's output directory, so the two concurrent builds race on those files, manifesting as intermittent `pyconfig.h: No such file or directory` errors that disappear when the cache is warm. See, for instance: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1528499363

Setting `enable_runfiles=true` globally makes the transition a no-op (same flag value => same configuration hash => one build of `python_win`), eliminating the race.

### Describe how you validated your changes

Analysis of a failing CI job log (`PCbuild/obj/*.pdb` locked by another process during CleanAll).

### Additional Notes

This is a short-term workaround. The proper fix is to make `build_python.bat` hermetic by redirecting MSBuild's output and intermediate directories to `$(@d)` instead of the execroot source tree.

`rules_python` doesn't touch `build_python.bat` directly. The connection is through Bazel's configuration graph:

1. `build_python.bat` is invoked by `python_win` (`run_binary` target) in `deps/cpython.BUILD.bazel`,
2. `python_win` is a dependency of `pkg_install` (via `install_files_win`),
3. `pkg_install` from `rules_pkg` is backed by a `py_binary`,
4. `rules_python` 1.9.0 makes every `py_binary` on Windows transition into a new Bazel configuration (`enable_runfiles=true`),
5. Since `python_win` sits in the dependency graph of that `py_binary`, Bazel now needs it in two configurations: the base one (for tests / other consumers) and the transitioned one (for `pkg_install`),
6. Two configurations => two independent Bazel actions, both calling `build_python.bat` in the same execroot directory => race.

Co-authored-by: regis.desgroppes <regis.desgroppes@datadoghq.com>
### What does this PR do?
Add `common:windows --enable_runfiles` to `.bazelrc`.

### Motivation
rules_python 1.9.0 (introduced in #48082) transitions every `py_binary` on Windows to `enable_runfiles=true`.

With Bazel's default of `enable_runfiles=false` on Windows, this creates a second Bazel configuration, causing `python_win` to be built twice concurrently.

`build_python.bat` writes MSBuild intermediate files (`PCbuild/obj/`, `PCbuild/amd64/`, `msbuild.rsp`) into the shared execroot source tree rather than into the action's output directory, so the two concurrent builds race on those files, manifesting as intermittent `pyconfig.h: No such file or directory` errors that disappear when the cache is warm.

See, for instance: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1528499363

Setting `enable_runfiles=true` globally makes the transition a no-op (same flag value => same configuration hash => one build of `python_win`), eliminating the race.

### Describe how you validated your changes
Analysis of a failing CI job log (`PCbuild/obj/*.pdb` locked by another process during CleanAll).

### Additional Notes
This is a short-term workaround. The proper fix is to make `build_python.bat` hermetic by redirecting MSBuild's output and intermediate directories to `$(@D)` instead of the execroot source tree.

`rules_python` doesn't touch `build_python.bat` directly. The connection is through Bazel's configuration graph:
1. `build_python.bat` is invoked by `python_win` (`run_binary` target) in `deps/cpython.BUILD.bazel`,
2. `python_win` is a dependency of `pkg_install` (via `install_files_win`),
3. `pkg_install` from `rules_pkg` is backed by a `py_binary`,
4. `rules_python` 1.9.0 makes every `py_binary` on Windows transition into a new Bazel configuration (`enable_runfiles=true`),
5. Since `python_win` sits in the dependency graph of that `py_binary`, Bazel now needs it in two configurations: the base one (for tests / other consumers) and the transitioned one (for `pkg_install`),
6. Two configurations => two independent Bazel actions, both calling `build_python.bat` in the same execroot directory => race.
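
The "same flag value => same configuration hash => one build" argument can be illustrated with a toy model. This is only a sketch, not Bazel's implementation: `Config`, `py_binary_transition` and `configurations_for_python_win` are hypothetical names, and a real Bazel configuration carries many more flags than the single one modelled here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    """Toy stand-in for a Bazel build configuration (hypothetical, not Bazel's API)."""
    enable_runfiles: bool


def py_binary_transition(config: Config) -> Config:
    """Mimic the transition described above: force enable_runfiles=true."""
    return replace(config, enable_runfiles=True)


def configurations_for_python_win(base: Config) -> set[Config]:
    """python_win is requested in the base configuration (tests, other consumers)
    and in the transitioned one (pkg_install); equal configurations collapse."""
    return {base, py_binary_transition(base)}


# Bazel's default on Windows, as stated in the PR description.
default_windows = Config(enable_runfiles=False)
# With `common:windows --enable_runfiles` in .bazelrc.
with_workaround = Config(enable_runfiles=True)

builds_without = len(configurations_for_python_win(default_windows))
builds_with = len(configurations_for_python_win(with_workaround))
print(builds_without, builds_with)
```

In this model the default yields two distinct configurations, hence two `python_win` actions that can run concurrently, while pre-setting the flag makes the transition return an equal configuration, so only one remains.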