
Memory usage improvements#1439

Merged
qkaiser merged 3 commits into main from memory-improvements
Mar 20, 2026
Conversation

@qkaiser
Contributor

@qkaiser qkaiser commented Mar 18, 2026

This branch contains multiple optimizations around unblob's memory consumption, specifically RSS usage on Linux.

unblob makes heavy use of mmap'ed files through the File class but always defaults to the MADV_NORMAL policy. This policy applies moderate read-ahead and is conservative about reclaiming pages, which can cause RSS to grow to the full size of large input files.

While this is generally fine on an analyst's machine, it can become problematic in constrained environments where parallel extractions compete for memory, at the risk of being killed by the OOM killer.

We address this in two ways:

  • use MADV_SEQUENTIAL when scanning files with Hyperscan. Since we expect pages to be read in sequential order, pages in the given range can be aggressively read ahead and may be freed soon after they are accessed.
  • explicitly set MADV_DONTNEED on pages that were just read through File.read(). This is a generic approach, so every unblob component reading from a File benefits from it.
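Both hints are available in Python through `mmap.mmap.madvise` (since Python 3.8). A minimal, self-contained sketch of the two calls above, independent of unblob's actual File class:

```python
import mmap
import os
import tempfile

# Illustrative only: a throwaway file standing in for a scanned input.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * (4 * mmap.PAGESIZE))
    path = f.name

fd = os.open(path, os.O_RDONLY)
mm = mmap.mmap(fd, 0, prot=mmap.PROT_READ)

# Hint 1: we will read sequentially -> the kernel can read ahead
# aggressively, and already-read pages become cheap to reclaim.
mm.madvise(mmap.MADV_SEQUENTIAL)

data = mm.read(mmap.PAGESIZE)

# Hint 2: drop the pages we just consumed from the resident set.
# Offsets passed to madvise must be page-aligned.
mm.madvise(mmap.MADV_DONTNEED, 0, mmap.PAGESIZE)

mm.close()
os.close(fd)
os.unlink(path)
```

Both constants are Linux-specific, so real code would guard them with `hasattr(mmap, "MADV_SEQUENTIAL")` on other platforms.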

On top of that, we implemented a syscall optimization around File.size(), avoiding millions of fstat syscalls per run on complex firmware.
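The size() optimization can be sketched like this (a hypothetical class, not unblob's exact code): since the backing file is never appended to while mapped, one fstat at open time is enough.

```python
import os
import tempfile

class MappedFile:
    """Hypothetical sketch: cache the file size at open time instead
    of issuing an fstat() syscall on every size() call."""

    def __init__(self, path: str):
        self._fd = os.open(path, os.O_RDONLY)
        # The backing file is never appended to while mapped, so the
        # size observed here stays valid for the object's lifetime.
        self._size = os.fstat(self._fd).st_size

    def size(self) -> int:
        return self._size  # no syscall on the hot path

    def close(self) -> None:
        os.close(self._fd)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"abc")
    path = f.name

mf = MappedFile(path)
cached = mf.size()
mf.close()
os.unlink(path)
```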

On a 7GB gzip multi-chunk stream containing a 14GB extfs filesystem, the initial consumption observed by memray follows this trend:

[image: memray memory trend before the changes]

With the modifications in this branch, we end up with this trend instead:

[image: memray memory trend after the changes]

Since memray was used to perform memory profiling, I added it as a profiling dev dependency.

I would also like to point out that these optimizations were already spotted years ago. At the time we chose to move to streaming mode but never implemented the madvise calls. See #477 (comment)

@qkaiser qkaiser self-assigned this Mar 18, 2026
@qkaiser qkaiser added enhancement New feature or request help wanted Extra attention is needed python Pull requests that update Python code labels Mar 18, 2026
@qkaiser qkaiser force-pushed the memory-improvements branch from 581885e to e53ae99 on March 18, 2026 14:20
@vlaci
Contributor

vlaci commented Mar 19, 2026

I don't question the decrease in memory usage by this MR, I just don't really understand how each part affects it in isolation. It also remains a question for me how aggressively we need to reclaim memory. And how does the memory usage profile change between before and after this change when under memory pressure?

@qkaiser
Contributor Author

qkaiser commented Mar 19, 2026

> I don't question the decrease in memory usage by this MR

noted

> I just don't really understand how each part affects it in isolation.

I'll provide traces before/after each commit so we can make informed decisions.

> Also it remains a question for me how aggressive we need to be reclaiming memory.

I'd argue we can be pretty aggressive in reclaiming memory as long as the impact on speed is limited.

> Also how does the memory usage profile change between before and after this change when under memory pressure?

Do you have ideas on how we can test that? Any tools that could simulate memory pressure to see how unblob behaves?

@vlaci
Contributor

vlaci commented Mar 19, 2026

> Do you have ideas on how we can test that? Any tools that could simulate memory pressure to see how unblob behaves?

When we first experimented with mmap, I tested it in either a memory-constrained docker container or a VM, I don't recall which. I was looking at two things: that scanning through a file bigger than the available memory works, and that memory gets reclaimed from unblob when a process external to unblob puts additional pressure on the kernel.
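One simple way to generate that external pressure is a hypothetical helper like the one below (not part of unblob): pin anonymous memory page by page next to a running scan, inside a memory-limited container or VM, and watch whether the kernel reclaims unblob's file-backed pages first.

```python
import mmap

def hog(n_mib: int) -> mmap.mmap:
    """Allocate n_mib of anonymous memory and fault every page in,
    so it counts against RSS and competes with unblob's page cache."""
    m = mmap.mmap(-1, n_mib * 1024 * 1024)
    for off in range(0, len(m), mmap.PAGESIZE):
        m[off] = 1  # touch the page so it is actually resident
    return m

pressure = hog(8)  # keep a reference alive to hold the memory
```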

@martonilles
Contributor

When a memory area is madvised with DONTNEED, the RSS usage drops:

> Note that, when applied to shared mappings, MADV_DONTNEED might not lead to immediate freeing of the pages in the range. The kernel is free to delay freeing the pages until an appropriate moment. The resident set size (RSS) of the calling process will be immediately reduced however.

However, in the case of mmap'ed memory backed by a file, the kernel can reclaim the memory under memory pressure, so this does not cause OOM errors. In this particular case, though, the process runs in a docker container with a memory limit. When a docker container (cgroup) memory limit is reached, the kernel tries to reclaim first and only goes into OOM if reclaim still fails.

So, in normal cases, DONTNEED is just a hint (advice) and should have no actual impact. Visibly the RSS drops, but from a kernel standpoint it does not matter. On the other hand, RSS might matter for the docker memory limit calculation, but reclaim should handle it.
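The immediate RSS drop described in the man page quote can be observed directly with a Linux-only sketch that reads VmRSS from /proc/self/status (illustrative, not unblob code):

```python
import mmap
import tempfile

def vm_rss_kib() -> int:
    """Return this process's VmRSS in KiB, from /proc/self/status."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found")

SIZE = 64 * 1024 * 1024  # 64 MiB of file-backed pages

with tempfile.TemporaryFile() as f:
    f.truncate(SIZE)
    mm = mmap.mmap(f.fileno(), SIZE)
    mm[:] = b"x" * SIZE  # fault every page in; RSS grows by ~64 MiB
    before = vm_rss_kib()
    mm.madvise(mmap.MADV_DONTNEED)
    after = vm_rss_kib()  # RSS is reduced immediately per madvise(2)
    mm.close()
```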

@martonilles
Contributor

On the other hand, calling madvise too many times could be slow.

I would look at what exactly happens in a docker container+mem limit scenario with mmap-ed files.

@martonilles
Contributor

martonilles commented Mar 19, 2026

@qkaiser
Contributor Author

qkaiser commented Mar 19, 2026

> On the other hand calling too many madvise could be slow.

yes, the time impact will be measured. When I talked with @vlaci I explained that the syscalls regained by removing fstat calls could be spent calling madvise instead 🙃

> I would look at what exactly happens in a docker container+mem limit scenario with mmap-ed files.

that's the plan; the objective is to see if we can actually reproduce an OOM

@qkaiser qkaiser marked this pull request as draft March 19, 2026 16:33
@qkaiser
Contributor Author

qkaiser commented Mar 20, 2026

I have been misled by memray, which does not distinguish between RssFile and VmRss (see bloomberg/memray#778).

I'll keep:

  • the MADV_SEQUENTIAL hint when we initialize the file, so we hint the kernel about the access pattern
  • the syscall improvement around size() and fstat calls

@qkaiser qkaiser force-pushed the memory-improvements branch from e53ae99 to fc87f1e on March 20, 2026 08:19
@qkaiser qkaiser marked this pull request as ready for review March 20, 2026 08:19
@qkaiser qkaiser force-pushed the memory-improvements branch from fc87f1e to ec785ab on March 20, 2026 08:20
@qkaiser qkaiser added this pull request to the merge queue Mar 20, 2026
@qkaiser qkaiser removed this pull request from the merge queue due to a manual request Mar 20, 2026
qkaiser added 3 commits March 20, 2026 19:04
I've been using memray to perform memory profiling, let's keep it in our
dependencies as a custom profiling dependency.
By default, an mmap'ed file uses the MADV_NORMAL policy, which applies
moderate read-ahead but is conservative about reclaiming pages. This may
cause issues in constrained environments.

With MADV_SEQUENTIAL⁰, the kernel:
- reads ahead more aggressively, reducing page-fault stalls
- reclaims already-scanned pages sooner under memory pressure

[0] https://man7.org/linux/man-pages/man2/madvise.2.html
We never append to the backing file, so returning len(self) (the size of
the mapping) is always correct. If we were appending to the file, we
couldn't access the out-of-mapping range anyway.

The main hot path is stream_scan_chunks, which calls file.size() on
every loop iteration — once per DEFAULT_BUFSIZE (64 KiB) slice. For a
7 GiB file that is ~114 000 avoided fstat() syscalls per scan.
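As a sanity check on that figure (pure arithmetic, no unblob code involved):

```python
GIB = 1024 ** 3
DEFAULT_BUFSIZE = 64 * 1024  # 64 KiB slice per stream_scan_chunks iteration

# One file.size() call per slice; with the cached size, each of these
# would otherwise have been an fstat() syscall.
avoided_calls = (7 * GIB) // DEFAULT_BUFSIZE
print(avoided_calls)  # 114688, matching the "~114 000" above
```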
@qkaiser qkaiser force-pushed the memory-improvements branch from ec785ab to 49158e1 on March 20, 2026 18:04
@qkaiser qkaiser enabled auto-merge March 20, 2026 18:04
@qkaiser qkaiser added this pull request to the merge queue Mar 20, 2026
Merged via the queue into main with commit 4c2b661 Mar 20, 2026
23 checks passed
@qkaiser qkaiser deleted the memory-improvements branch March 20, 2026 18:36