fix(fs): make recursive listStream truly O(1) via pull-based iteration (#198) #621
Closed
Summary
Closes #198.
`listStream()` (ARO-0051) was advertised as O(1) peak memory but used `AsyncThrowingStream { continuation in Task { ... continuation.yield(...) } }`, whose default buffering policy is `.unbounded`. On a large recursive scan the producer Task ran far ahead of the for-each consumer, accumulating hundreds of thousands of `[String: any Sendable]` dicts in the async buffer. On `/Users/kris` the reporter saw RSS peak at ~1.7 GB before settling back to the ~192 MB runtime baseline.

Root cause
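For illustration, the buggy shape reduces to roughly the following (a hypothetical minimal sketch — `entriesUnbounded` and the entry contents are invented for the example, not the actual ARO source):

```swift
import Foundation

// Hypothetical reduction of the old pattern. The default buffering policy
// of this initializer is .unbounded, so continuation.yield(_:) never
// suspends: a fast producer Task can outrun the consumer and keep every
// pending entry alive in the stream's internal buffer.
func entriesUnbounded(count: Int) -> AsyncThrowingStream<[String: String], Error> {
    AsyncThrowingStream { continuation in
        Task {
            for i in 0..<count {
                // No backpressure here: if the consumer is slow, up to
                // `count` dictionaries are buffered simultaneously.
                continuation.yield(["path": "/tmp/file-\(i)"])
            }
            continuation.finish()
        }
    }
}
```

Note that merely changing the buffering policy (e.g. to `.bufferingNewest`) would drop entries rather than apply backpressure, which is why the fix moves to a pull-based stream instead.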
The producer-Task + `continuation.yield` pattern has no backpressure: `AsyncThrowingStream` with `.unbounded` buffering never blocks the yielder. The for-each consumer cannot drain the dicts as fast as `FileManager.DirectoryEnumerator` + `url.resourceValues(...)` can emit them, so every entry remains live in the buffer until consumption.

Fix
Switch `listStream` (in both the Darwin/Linux and Windows `AROFileSystemService` copies) to the pull-based `AsyncThrowingStream(unfolding:)` initializer. The closure is invoked exactly once per consumer `next()` call, so exactly one entry is live at a time. The recursive branch reuses the `LazyDirectoryList` wrapper that already powers the compiled-mode path: it is `@unchecked Sendable`, wraps each enumerator step in an `autoreleasepool` on Darwin (eliminating NSObject accumulation), and produces the same entry dict shape as `FileInfo.toDictionary()`. The non-recursive branch advances a `[URL]` cursor on demand via a tiny `URLCursor` helper. No producer Task, no unbounded buffer, no behavioural change.
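Under the same caveat (simplified, invented names — the real implementation lives in `FileSystemService.swift` and differs in detail), the pull-based shape looks roughly like:

```swift
import Foundation

// A URLCursor-style helper: hands out one URL per pull. Marked
// @unchecked Sendable so the unfolding closure can capture it; callers
// must not share one cursor across concurrent consumers.
final class URLCursor: @unchecked Sendable {
    private let urls: [URL]
    private var index = 0
    init(_ urls: [URL]) { self.urls = urls }
    func next() -> URL? {
        guard index < urls.count else { return nil }
        defer { index += 1 }
        return urls[index]
    }
}

// AsyncThrowingStream(unfolding:) invokes the closure exactly once per
// consumer next() call, so only one entry dictionary is live at a time.
func entriesPullBased(_ urls: [URL]) -> AsyncThrowingStream<[String: any Sendable], Error> {
    let cursor = URLCursor(urls)
    return AsyncThrowingStream(unfolding: {
        // On Darwin, wrapping each step in an autoreleasepool releases the
        // NSObjects created while reading file metadata between pulls.
        autoreleasepool {
            guard let url = cursor.next() else { return nil } // nil finishes the stream
            return ["name": url.lastPathComponent, "path": url.path]
        }
    })
}
```

Because `unfolding:` only runs when the consumer awaits the next element, a consumer that breaks out of the loop early also stops all remaining filesystem work, and returning `nil` cleanly finishes the stream.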
Files
`Sources/ARORuntime/FileSystem/FileSystemService.swift` — `URLCursor` helper + rewritten recursive/non-recursive branches of both `listStream` implementations

Test plan
- `swift build` — clean
- `swift test --filter "List|Stream|FileSystem"` — 117 passing (includes `AROStream Tests`, `StreamTee Tests`, `Glob Pattern Matching Tests`)
- `Examples/DirectoryReplicator` — canonical `List ... recursively` example runs end-to-end with the expected output (entry dict shape and iteration order unchanged)
- `/usr/bin/time -l aro run …` — 77 MB maximum resident set size, 30 MB peak memory footprint, all 100 000 entries successfully streamed. The ARO runtime baseline is ~30 MB, so the streaming overhead is effectively 0. For comparison, the issue reports 1.7 GB for a 200k–500k tree on the old implementation.

Follow-ups (out of scope)
The issue also notes:
- `file-stats-repository` growth between prune cycles — unrelated to `listStream` itself; would need repository-level changes.
- `publishAndTrack` execution contexts during Observer fan-out — structural, not addressed here.

The per-scan memory fix in this PR already bounds peak RSS to O(runtime baseline + threshold) regardless of tree size, which is the dominant factor called out in the issue.