-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix concurrent builder failure empty message bugs #14972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add helpers to the base Goal class that set buildResult and call amDone, ensuring buildResult is always populated when a goal terminates. Derived class helpers now call the base class versions. This reorders operations: previously buildResult was set before bookkeeping (counter resets, worker stats), now it's set after. This is safe because the bookkeeping code (mcExpectedBuilds.reset(), worker.doneBuilds++, worker.updateProgress(), etc.) only accesses worker counters, not buildResult.
Eveeifyeve
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw no issue in this, code wise however I am wondering how the tests are looking, to see if it can reproduce this issue so it's fixed.
0c9b296 to
febab73
Compare
|
Ok, I've added more intentional regression tests. The regression tests now cover both fixes. |
febab73 to
7b53a13
Compare
|
@Ericson2314, what's left to do here? |
|
@roberth was going to squash to 2 commits, and then I was going to take a stab at removing an the optional exception argument since the |
DerivationTrampolineGoal is the top-level goal whose buildResult is returned by buildPathsWithResults. When it failed without setting buildResult.inner, buildPathsWithResults would return failures with empty errorMsg, producing error messages like: error: failed to build attribute 'checks.x86_64-linux.foo', build of '/nix/store/...drv^*' failed: (note the empty message after "failed:") Use the new doneFailure helper to ensure buildResult is populated with meaningful error information.
When keepGoing=false and a build fails, other goals are cancelled. Previously, these cancelled goals were reported in the "build of ... failed" error message alongside actual failures. This was misleading since cancelled goals didn't actually fail - they were never tried. Update the test to expect only the actual failure (hash mismatch) to be reported, not the cancelled goals.
When !keepGoing and a goal fails, other goals are cancelled and
remain with exitCode == ecBusy. These cancelled goals have a default
BuildResult::Failure{} with empty errorMsg.
Previously, buildPathsWithResults would return these cancelled goals,
and throwBuildErrors would report them as failures. When only one such
cancelled goal was present, it would throw an error with an empty
message like:
error: build of '/nix/store/...drv^*' failed:
Now we skip goals with ecBusy since their state is indeterminate.
Cancelled goals could be reported, but this keeps the output relevant.
Other indeterminate goal states were already not being reported, for
instance: derivations that weren't started for being blocked on a
concurrency limit, or blocked on a currently building dependency.
Change "cannot build missing derivation" to "failed to obtain derivation of" since the path (e.g. '...drv^out') is a derivation output, not a derivation. The message could be improved further to resolve ambiguity when multiple outputOf links are involved, but for now we err on the side of brevity since this message is already merged into larger error messages with other context from the Worker and CLI.
7b53a13 to
3c3ceb1
Compare
The fixes are each in their own commit now, but I have still kept separate:
Should we block the fix on that? |
Don't think so. It's a pretty annoying bug. |
|
OK I'll just do the thing I want go do after. |
|
Successfully created backport PR for |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.32-maintenance
git worktree add -d .worktree/backport-14972-to-2.32-maintenance origin/2.32-maintenance
cd .worktree/backport-14972-to-2.32-maintenance
git switch --create backport-14972-to-2.32-maintenance
git cherry-pick -x cb2ade20d4b45ae1f1838d453c5484287516308f 25eb07a91b377f62322600d45d520453111a79eb 3fd85c7d64327df3b145f239211ebf7519a69c2a 68f549def46586fc5aee8b5ce62e7e8bb5a16703 3c3ceb18e9a3c421ab5990e381dfbe40e3dc3cec |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.33-maintenance
git worktree add -d .worktree/backport-14972-to-2.33-maintenance origin/2.33-maintenance
cd .worktree/backport-14972-to-2.33-maintenance
git switch --create backport-14972-to-2.33-maintenance
git cherry-pick -x cb2ade20d4b45ae1f1838d453c5484287516308f 25eb07a91b377f62322600d45d520453111a79eb 3fd85c7d64327df3b145f239211ebf7519a69c2a 68f549def46586fc5aee8b5ce62e7e8bb5a16703 3c3ceb18e9a3c421ab5990e381dfbe40e3dc3cec |
|
Slated for release in 2.33.2 and 2.34.0. |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 2.33-maintenance
git worktree add -d .worktree/backport-14972-to-2.33-maintenance origin/2.33-maintenance
cd .worktree/backport-14972-to-2.33-maintenance
git switch --create backport-14972-to-2.33-maintenance
git cherry-pick -x cb2ade20d4b45ae1f1838d453c5484287516308f 25eb07a91b377f62322600d45d520453111a79eb 3fd85c7d64327df3b145f239211ebf7519a69c2a 68f549def46586fc5aee8b5ce62e7e8bb5a16703 3c3ceb18e9a3c421ab5990e381dfbe40e3dc3cec |
|
This fails for functional_root: |
|
FWIW, I had to revert 68f549d in Determinate Nix because we rely on |
|
The proper fix would have to be in #14559 so that we don't misreport those as failures. The only question would be the back-compat with older clients not understanding the cancelled status code. That might be a use-case for fine grained features? |
|
@amaanq and I have some code for daemon protocol compat that will help with that, FYI. |
Motivation
When
!keepGoing(the default), and a derivation fails, other derivations are cancelled.Due to two mistakes, cancelled derivations would be reported as failed, with an empty error message.
amDonewas called without setting theBuildResultin trampoline goalSee commit messages for more detailed explanations and substantiation.
Context
buildPathsWithResultsuse cases, which is not the whole CLI, and notably notnix build.goal->exworked fine.nix flake check#14321 makesnix flake checkusebuildPathsWithResultsAdd 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.