nix flake check: free nixosConfigurations values after checking #15142
illustris wants to merge 1 commit into NixOS:master from
Conversation
src/nix/flake.cc
Outdated
auto * mutableAttr = const_cast<Attr *>(&attr);
mutableAttr->value = state->allocValue();
mutableAttr->value->mkNull();
This doesn't seem sound to me at all. What if there's another thunk that ends up referring to it in another flake output (like a select expression)? There's a reason that attrs() returns a readonly view - it's not generally safe to modify an existing Bindings (or any non-thunk Value).
You're right. I tested nixosConfigurations cross-referencing each other, but did not test other flake outputs referencing nixosConfigurations:
$ /run/current-system/sw/bin/time -v /nix/store/916a1jk389d54qmcsn1rjbakklhdq6k7-nix-2.34.0pre20260204_b61d150/bin/nix flake check /tmp/test-flake
warning: Git tree '/tmp/test-flake' is dirty
error:
… while checking flake output 'packages'
at /tmp/test-flake/flake.nix:30:3:
29| );
30| packages.x86_64-linux.default = self.nixosConfigurations."1".config.system.build.toplevel;
| ^
31| };
… while checking the derivation 'packages.x86_64-linux.default'
at /tmp/test-flake/flake.nix:30:3:
29| );
30| packages.x86_64-linux.default = self.nixosConfigurations."1".config.system.build.toplevel;
| ^
31| };
(stack trace truncated; use '--show-trace' to show the full, detailed trace)
error: expected a set but found null: null
Generally this issue seems like a tradeoff between sharing and doing redundant work. Until we've evaluated everything, we don't know what will need to be forced, so we can't do something similar to this ahead of time. Maybe the best we could do is somehow demote forced values back to thunks based on some heuristic (if we are reasonably sure that we don't incur an extra high cost by having to redo the work of forcing it), but we don't have such a mechanism yet.
Also, are you sure that you are benchmarking without the eval cache and the fetcher cache prewarmed? This difference seems very suspicious to me:
For nixosConfigurations at least, this would make sense. A fully evaluated nixos system takes up way too much memory. The relatively small additional compute for re-evaluating config values is a good tradeoff for the memory savings. For example:
The cache was not pre-warmed, and the 100 node flake was causing a lot of swapping. But it doesn't make much of a difference for memory utilization. I reran the tests with 20 nodes and 10 iterations. After warmup, the numbers were fairly consistent. baseline: patched: I'll update the patch to force evaluated nixosConfigurations back to thunks and test it. I think it will improve nix flake check for most use cases, with the exception of a small set of scenarios like
illustris force-pushed from 0d24aa3 to f282c79
test case flake code and full output of time -v: In my opinion, trading off memory for extra compute in some edge cases makes sense here. Flakes with many NixOS configurations are common, but flakes that also expose other valid outputs forcing evaluation of every
Save each nixosConfiguration's thunk state before checking, then restore it immediately after. This makes the evaluated configuration tree unreachable, allowing GC_gcollect() to reclaim memory before processing the next config. This keeps only one configuration's evaluation tree in memory at a time, rather than holding all evaluated configurations simultaneously.
Motivation
github:illustris/flake-check-mem-poc has 20 minimal nixos configurations for PVE VMs. Without this patch, nix flake check takes up about 5GB. Bumping that to 100 nodes makes the memory usage go to 18GB. With the patch:
Context
Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.