Draft

nix flake check: free nixosConfigurations values after checking #15142

illustris wants to merge 1 commit into NixOS:master from illustris:flake-mem

Conversation

@illustris (Contributor) commented Feb 4, 2026

Save each nixosConfiguration's thunk state before checking, then restore
it immediately after. This makes the evaluated configuration tree
unreachable, allowing GC_gcollect() to reclaim memory before processing
the next config. This keeps only one configuration's evaluation tree in
memory at a time, rather than holding all evaluated configurations
simultaneously.
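
In rough pseudocode, the per-configuration loop looks something like this (an illustrative sketch only; the helper name and exact mechanics are placeholders, not the actual code in src/nix/flake.cc):

    // Sketch only: save the shallow Value (thunk) state, run the check,
    // then restore the saved state so the evaluated tree becomes unreachable.
    for (auto & attr : *vNixosConfigurations->attrs()) {
        Value saved = *attr.value;          // remember the unevaluated thunk state
        checkNixOSConfiguration(attr);      // placeholder for the existing per-config check
        *attr.value = saved;                // restore the thunk; the forced tree is now unreachable
        GC_gcollect();                      // let the Boehm GC reclaim it before the next config
    }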

Motivation

github:illustris/flake-check-mem-poc has 20 minimal NixOS configurations for PVE VMs. Without this patch, nix flake check uses about 5 GB of memory; bumping that to 100 nodes pushes memory usage to about 18 GB. Output of time -v for the 100-node flake without the patch:

        User time (seconds): 275.68
        System time (seconds): 174.27
        Percent of CPU this job got: 140%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:19.39
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 18579372
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 4482275
        Minor (reclaiming a frame) page faults: 5513432
        Voluntary context switches: 1069755
        Involuntary context switches: 65697
        Swaps: 0
        File system inputs: 35700044
        File system outputs: 201
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

With the patch, on the same 100-node flake:

        User time (seconds): 200.35
        System time (seconds): 3.96
        Percent of CPU this job got: 77%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:23.93
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1479644
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 1807
        Minor (reclaiming a frame) page faults: 378344
        Voluntary context switches: 1070729
        Involuntary context switches: 31566
        Swaps: 0
        File system inputs: 435937
        File system outputs: 104
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


@github-actions bot added the new-cli (Relating to the "nix" command) label on Feb 4, 2026
Review comment on src/nix/flake.cc, lines 729 to 731 (outdated):
auto * mutableAttr = const_cast<Attr *>(&attr);
mutableAttr->value = state->allocValue();
mutableAttr->value->mkNull();
Contributor:

This doesn't seem sound to me at all. What if there's another thunk that ends up referring to it in another flake output (like a select expression)? There's a reason that attrs() returns a readonly view - it's not generally safe to modify an existing Bindings (or any non-thunk Value).

@illustris (Contributor, Author) replied:

You're right. I tested nixosConfigurations cross-referencing each other, but did not test other flake outputs referencing nixosConfigurations:

$ /run/current-system/sw/bin/time -v /nix/store/916a1jk389d54qmcsn1rjbakklhdq6k7-nix-2.34.0pre20260204_b61d150/bin/nix flake check /tmp/test-flake
warning: Git tree '/tmp/test-flake' is dirty
error:
       … while checking flake output 'packages'
         at /tmp/test-flake/flake.nix:30:3:
           29|          );
           30|                packages.x86_64-linux.default = self.nixosConfigurations."1".config.system.build.toplevel;
             |   ^
           31|      };

       … while checking the derivation 'packages.x86_64-linux.default'
         at /tmp/test-flake/flake.nix:30:3:
           29|               );
           30|                packages.x86_64-linux.default = self.nixosConfigurations."1".config.system.build.toplevel;
             |   ^
           31|      };

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: expected a set but found null: null

@xokdvium (Contributor) commented Feb 4, 2026

Generally this issue seems like a tradeoff between sharing and doing redundant work. Until we've evaluated everything, we don't know what will need to be forced, so we can't do something similar to this ahead of time. Maybe the best we could do is somehow demote forced values back to thunks based on some heuristic (if we are reasonably sure that we don't incur an extra high cost by having to redo the work of forcing them), but we don't have such a mechanism yet.
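
Purely as an illustration of that idea (none of these helpers exist in Nix today; the names are made up):

    // Hypothetical: demote a forced value back to its original thunk only when
    // re-forcing it later is expected to be cheap, trading a little CPU for memory.
    void maybeDemote(Value & v, const Value & originalThunk)
    {
        if (estimateReforceCost(v) < costThreshold)   // made-up heuristic
            v = originalThunk;   // the forced result becomes unreachable and can be GC'd
    }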

@xokdvium (Contributor) commented Feb 4, 2026

Also, are you sure that you are benchmarking with the eval cache and the fetcher cache prewarmed? This difference seems very suspicious to me:

> File system inputs: 35700044
>
> File system inputs: 435937

@illustris (Contributor, Author) replied:

> Generally this issue seems like a tradeoff between sharing and doing redundant work. Until we've evaluated everything, we don't know what will need to be forced, so we can't do something similar to this ahead of time. Maybe the best we could do is somehow demote forced values back to thunks based on some heuristic (if we are reasonably sure that we don't incur an extra high cost by having to redo the work of forcing them), but we don't have such a mechanism yet.

For nixosConfigurations at least, this would make sense. A fully evaluated NixOS system takes up far too much memory, and the relatively small additional compute needed to re-evaluate config values is a good tradeoff for the memory savings. For example:

$ /run/current-system/sw/bin/time -v /nix/store/916a1jk389d54qmcsn1rjbakklhdq6k7-nix-2.34.0pre20260204_b61d150/bin/nix eval /tmp/test-flake#nixosConfigurations.1.config.networking.hostId
warning: Git tree '/tmp/test-flake' is dirty
"11111111"
        Command being timed: "/nix/store/916a1jk389d54qmcsn1rjbakklhdq6k7-nix-2.34.0pre20260204_b61d150/bin/nix eval /tmp/test-flake#nixosConfigurations.1.config.networking.hostId"
        User time (seconds): 0.68
        System time (seconds): 0.14
        Percent of CPU this job got: 70%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.16
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 178936
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 16
        Minor (reclaiming a frame) page faults: 42786
        Voluntary context switches: 2971
        Involuntary context switches: 40
        Swaps: 0
        File system inputs: 31842
        File system outputs: 192
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

> Also, are you sure that you are benchmarking with the eval cache and the fetcher cache prewarmed? This difference seems very suspicious to me:
>
> File system inputs: 35700044
>
> File system inputs: 435937

The cache was not pre-warmed, and the 100-node flake was causing a lot of swapping, but that doesn't make much of a difference for memory utilization. I reran the tests with 20 nodes and 10 iterations; after warmup, the filesystem numbers were fairly consistent.

baseline:

        Command being timed: "/nix/store/614dzfxcahl6q6dhz9ysjfsrb948sqkh-nix-2.34.0pre20260203_27435e0/bin/nix flake check /tmp/test-flake"
        User time (seconds): 56.57
        System time (seconds): 2.39
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:59.01
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 4959664
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1246096
        Voluntary context switches: 215181
        Involuntary context switches: 3295
        Swaps: 0
        File system inputs: 64634
        File system outputs: 98
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

patched:

        Command being timed: "/nix/store/916a1jk389d54qmcsn1rjbakklhdq6k7-nix-2.34.0pre20260204_b61d150/bin/nix flake check /tmp/test-flake"
        User time (seconds): 45.52
        System time (seconds): 1.22
        Percent of CPU this job got: 80%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:57.99
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 1286680
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 327877
        Voluntary context switches: 217838
        Involuntary context switches: 4249
        Swaps: 0
        File system inputs: 64345
        File system outputs: 42
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I'll update the patch to revert evaluated nixosConfigurations back to thunks and test it. I think it will improve nix flake check for most use cases, with the exception of a small set of scenarios like
packages.x86_64-linux.default = self.nixosConfigurations."1".config.system.build.toplevel;, where it will have to re-evaluate the whole system configuration.

@illustris force-pushed the flake-mem branch 2 times, most recently from 0d24aa3 to f282c79 on February 5, 2026 at 06:26
@illustris (Contributor, Author) commented:

Test Case                             Variant    Wall Time   User Time   Max RSS (MB)
NixOS configs only                    baseline   0:59.70     58.03s      ~4,847
NixOS configs only                    patch      0:58.96     56.37s      ~701
Configs + pkg accessing config attr   baseline   1:00.31     57.85s      ~4,847
Configs + pkg accessing config attr   patch      1:03.55     64.54s      ~1,605
Configs + pkg referencing toplevel    baseline   1:38.52     87.99s      ~8,484
Configs + pkg referencing toplevel    patch      2:37.73     147.28s     ~8,809
  • Case 1: Patch reduces peak RSS by ~85% (4.8 GB → 701 MB) with no time increase.
  • Case 2: Patch reduces peak RSS by ~67% (4.8 GB → 1.6 GB) with a minor (~5%) time increase.
  • Case 3: Peak RSS is roughly the same and time goes up by ~60% (1:38 → 2:37) when toplevel is forced.

Test case flake code and full output of time -v:
https://gist.github.com/illustris/391bd5562499aea1df12133c1d04ff23

In my opinion, trading off memory for extra compute in a few edge cases makes sense here. Flakes with many NixOS configurations are common, but flakes that also expose other outputs forcing evaluation of every config.system.build.toplevel are rare.

