I don't question the decrease in memory usage by this MR, I just don't really understand how each part affects it in isolation. It also remains a question for me how aggressively we need to reclaim memory, and how the memory usage profile changes between before and after this change when under memory pressure.
noted
I'll provide traces before/after each commit so we can make informed decisions.
I'd argue we can be pretty aggressive in reclaiming memory as long as the impact on speed is limited.
Do you have ideas on how we can test that? Any tools that could simulate memory pressure to see how unblob behaves?
When we first experimented with mmap, I don't recall if I tested it only in a memory-constrained docker container or in a VM. I was looking at two things: that scanning through a file bigger than the available memory works, and that memory gets reclaimed from unblob when a process external to unblob puts additional pressure on the kernel.
When a memory area is madvised with DONTNEED, the RSS usage drops. In the case of mmap'ed memory backed by a file, the kernel can "reclaim" the memory under memory pressure anyway, so this is not what causes OOM errors. In this particular case, however, the process is running in a docker container with a memory limit. When a docker container (cgroup) memory limit is reached, the kernel tries to reclaim and only goes into OOM if it still fails after the reclaim. So in normal cases DONTNEED is just a hint (advice) and should have no actual impact: visibly the RSS drops, but from a kernel standpoint it does not matter. On the other hand, the RSS might matter in the docker memory limit calculation, although the reclaim should handle it.
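The RSS drop described above is easy to observe. Here is a minimal, self-contained sketch (Linux only, Python >= 3.8, not unblob's code) that maps a file, faults its pages in, then issues `MADV_DONTNEED` and reads `VmRSS` from `/proc` at each step:

```python
import mmap
import tempfile

def rss_kib() -> int:
    """Resident set size of the current process in KiB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"x" * (64 * 1024 * 1024))  # 64 MiB of real data
    tmp.flush()
    mm = mmap.mmap(tmp.fileno(), 0, access=mmap.ACCESS_READ)

    before = rss_kib()
    for off in range(0, len(mm), mmap.PAGESIZE):
        mm[off]  # touch one byte per page to fault it in
    faulted = rss_kib()

    # DONTNEED drops the pages from our RSS immediately; under memory
    # pressure the kernel could have reclaimed these file-backed pages
    # on its own anyway.
    mm.madvise(mmap.MADV_DONTNEED)
    after = rss_kib()
    mm.close()

print(f"RSS before={before} faulted={faulted} after DONTNEED={after} (KiB)")
```

Faulting in the 64 MiB mapping should grow RSS by roughly that amount, and the `MADV_DONTNEED` call should give most of it back.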
On the other hand, calling madvise too many times could be slow. I would look at what exactly happens in a docker container + memory limit scenario with mmap'ed files.
Would be nice to double-check that the files are indeed mapped read-only and shared
yes, the time impact will be measured. When I talked with @vlaci I explained that the regained syscalls by removing
that's the plan, the objective is to see if we can actually reproduce OOM
I have been misled by memray that does not distinguish between

I'll keep:
I've been using memray to perform memory profiling, let's keep it in our dependencies as a custom profiling dependency.
By default, an mmap'ed file uses the MADV_NORMAL policy, which applies moderate read-ahead but is conservative about reclaiming pages. This may cause issues in memory-constrained environments. With MADV_SEQUENTIAL [0], the kernel:
- reads ahead more aggressively, reducing page-fault stalls
- reclaims already-scanned pages sooner under memory pressure

[0] https://man7.org/linux/man-pages/man2/madvise.2.html
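As a rough illustration, applying that advice from Python looks like the sketch below (a hypothetical helper, not unblob's actual `File` implementation; requires Linux and Python >= 3.8):

```python
import mmap

def open_sequential(path: str) -> mmap.mmap:
    """Map a file read-only and hint sequential access to the kernel.

    MADV_SEQUENTIAL makes the kernel read ahead more aggressively and
    lets it free already-read pages sooner under memory pressure.
    """
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed:
        # CPython duplicates the descriptor internally.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    mm.madvise(mmap.MADV_SEQUENTIAL)
    return mm
```

The advice is purely a hint: reads through the returned mapping behave exactly as with `MADV_NORMAL`, only the kernel's read-ahead and reclaim heuristics change.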
We never append to the backing file, so returning `len(self)` (the size of the mapping) is always correct. If we were appending to the file, we couldn't access the out-of-mapping range anyway. The main hot path is `stream_scan_chunks`, which calls `file.size()` on every loop iteration, once per `DEFAULT_BUFSIZE` (64 KiB) slice. For a 7 GiB file that is ~114 000 avoided `fstat()` syscalls per scan.
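The idea can be sketched as follows (hypothetical class name, unblob's real `File` differs):

```python
import mmap

class MappedFile:
    """Hypothetical mmap-backed, read-only file object."""

    def __init__(self, path: str):
        with open(path, "rb") as f:
            self._mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def size(self) -> int:
        # Before: os.fstat(fd).st_size, i.e. one syscall per call, and
        # the scan loop calls size() once per 64 KiB slice
        # (7 GiB / 64 KiB = 114,688 calls for a single large file).
        # After: the mapping length is fixed at construction time,
        # so no syscall is needed at all.
        return len(self._mm)
```

Since the mapping cannot grow, the cached length and the `fstat` result can only disagree if someone else appends to the file, and those extra bytes would be outside the mapping anyway.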
This branch contains multiple optimizations around unblob's memory consumption, specifically RSS usage when used on Linux.
unblob makes heavy use of mmap'ed files through the `File` class but always defaults to the `MADV_NORMAL` policy. This policy applies moderate read-ahead and is conservative about reclaiming pages, which can cause RSS to grow to the full size of large input files. While this is generally okay on analysts' machines, it can become problematic in constrained environments where parallel extractions compete for memory, at the risk of being killed by the OOM killer.
We address this in two ways:
- `MADV_SEQUENTIAL` when scanning files with HyperScan. Since we expect pages to be read in sequential order, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.
- `MADV_DONTNEED` on pages that were just read through `File.read()`. It's a generic approach, so every unblob component reading from a `File` can take advantage of it.

On top of that, we noticed a syscall optimization around `File.size()`, avoiding millions of `fstat` syscalls per run on complex firmware.

On a 7GB gzip multi-chunk stream containing a 14GB extfs filesystem, the initial consumption observed by memray follows this trend:
With the modifications in this branch, we end up with this trend instead:
Since memray was used to perform memory profiling, I added it as a `profiling` dev dependency.

I would also like to point out that these optimizations were already spotted years ago. At the time we chose to move to streaming mode but never implemented the `madvise` calls. See #477 (comment)
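For reference, the generic "drop pages behind the read cursor" pattern mentioned above can be sketched like this (a hypothetical wrapper, not the actual `File.read()` implementation; Linux, Python >= 3.8):

```python
import mmap

class DroppingReader:
    """Hypothetical reader that discards already-read pages.

    After each read, pages fully behind the cursor are madvised with
    MADV_DONTNEED so they stop counting towards our RSS. Because the
    mapping is file-backed, re-reading them would simply fault the
    data back in, so correctness is unaffected.
    """

    def __init__(self, mm: mmap.mmap):
        self._mm = mm
        self._pos = 0

    def read(self, n: int) -> bytes:
        data = self._mm[self._pos : self._pos + n]
        self._pos += len(data)
        # madvise operates on whole pages; round down to a page boundary.
        boundary = self._pos - (self._pos % mmap.PAGESIZE)
        if boundary > 0:
            self._mm.madvise(mmap.MADV_DONTNEED, 0, boundary)
        return data
```

Issuing one `madvise` per read is the simple version; as discussed earlier in the thread, too many `madvise` calls could be slow, so batching the advice over larger windows is a plausible refinement.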