
fix(fs): make recursive listStream truly O(1) via pull-based iteration (#198) #621

Closed
KrisSimon wants to merge 3 commits into main from migrate-stub-pr

Conversation

@KrisSimon
Member

Migrated from GitLab MR !228 (merged)
feat/198-streaming-list-memory → main
Originally created: 2026-04-06
Merged: 2026-04-08
Author: Kris Simon
Labels: 0.8.2

Summary

Closes #198. listStream() (ARO-0051) was advertised as O(1) peak memory but used AsyncThrowingStream { continuation in Task { ... continuation.yield(...) } }, whose default buffering policy is .unbounded. On a large recursive scan the producer Task ran far ahead of the for-each consumer, accumulating hundreds of thousands of [String: any Sendable] dicts in the async buffer. On /Users/kris the reporter saw RSS peak at ~1.7 GB before settling back to the ~192 MB runtime baseline.

Root cause

The producer-Task + continuation.yield pattern has no backpressure: an AsyncThrowingStream with .unbounded buffering never suspends the yielder. The for-each consumer cannot drain entries as fast as FileManager.DirectoryEnumerator + url.resourceValues(...) can emit them, so every not-yet-consumed entry stays live in the stream's internal buffer.
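The anti-pattern can be reproduced in a stand-alone sketch. The integer elements and producer loop below are simplified stand-ins for the real directory enumeration; the point is only that with the default .unbounded buffering policy, yield never suspends, so a fast producer can enqueue every element before the consumer touches the first one:

```swift
import Foundation

// Hypothetical stand-in for the original push-based listStream shape:
// a detached Task yields into an AsyncThrowingStream whose default
// buffering policy is .unbounded, so nothing ever blocks the producer.
func pushBasedStream(count: Int) -> AsyncThrowingStream<Int, Error> {
    AsyncThrowingStream { continuation in
        Task {
            for i in 0..<count {
                // With .unbounded buffering this call returns immediately;
                // a slow consumer lets all `count` elements pile up in the
                // stream's buffer, which is the memory blow-up described above.
                continuation.yield(i)
            }
            continuation.finish()
        }
    }
}
```

In the real bug the buffered elements were `[String: any Sendable]` entry dicts rather than integers, which is why the accumulation showed up as gigabytes of RSS rather than a few megabytes.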

Fix

Switch listStream (in both the Darwin/Linux and Windows AROFileSystemService copies) to the pull-based AsyncThrowingStream(unfolding:) initializer. The closure is invoked exactly once per consumer next() call, so exactly one entry is live at a time.

  • Recursive branch: reuse the existing LazyDirectoryList wrapper that already powers the compiled-mode path. It is @unchecked Sendable, wraps each enumerator step in an autoreleasepool on Darwin (eliminating NSObject accumulation), and produces the same entry dict shape as FileInfo.toDictionary().
  • Non-recursive branch: iterate a pre-sorted [URL] cursor on demand via a tiny URLCursor helper.

No producer Task, no unbounded buffer, no behavioural change.
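The pull-based shape can be sketched as follows. The array-backed cursor here is a hypothetical stand-in for the actual LazyDirectoryList / URLCursor state in the fix, not the real implementation:

```swift
import Foundation

// Sketch of the pull-based replacement: AsyncThrowingStream(unfolding:)
// invokes the closure exactly once per consumer next() call, so at most
// one element is live at a time and the consumer's pace is the producer's
// pace — natural backpressure with no producer Task and no buffer.
func pullBasedStream(entries: [String]) -> AsyncThrowingStream<String, Error> {
    // Captured cursor state; hypothetical stand-in for the
    // LazyDirectoryList / URLCursor helpers in the real fix.
    var index = 0
    return AsyncThrowingStream(unfolding: {
        guard index < entries.count else { return nil } // nil ends the stream
        defer { index += 1 }
        return entries[index]
    })
}
```

Because the closure can throw, enumeration errors still surface to the consumer's `for try await` loop exactly as they did with the push-based stream.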

Files

  • Sources/ARORuntime/FileSystem/FileSystemService.swift: URLCursor helper + rewritten recursive/non-recursive branches of both listStream implementations

Test plan

  • swift build — clean
  • swift test --filter "List|Stream|FileSystem" — 117 passing (includes AROStream Tests, StreamTee Tests, Glob Pattern Matching Tests)
  • Examples/DirectoryReplicator — canonical List ... recursively example runs end-to-end with the expected output (entry dict shape and iteration order unchanged)
  • Memory repro (the actual acceptance check). Synthetic 100 000-file tree (100 dirs × 1000 files):
    Create the <root> with "/tmp/aro-issue198/big".
    List the <all-entries: recursively> from the <directory: root>.
    for each <entry> in <all-entries> where <entry: isFile> is true {
        Log <entry: size> to the <console>.
    }
    
    /usr/bin/time -l aro run …: 77 MB maximum resident set size, 30 MB peak memory footprint, all 100 000 entries successfully streamed. The ARO runtime baseline is ~30 MB, so the streaming overhead is effectively zero. For comparison, the issue reports 1.7 GB for a 200k–500k tree on the old implementation.

Follow-ups (out of scope)

The issue also notes:

  • file-stats-repository growth between prune cycles — unrelated to listStream itself, would need repository-level changes.
  • Stacked publishAndTrack execution contexts during Observer fan-out — structural, not addressed here.

The per-scan memory fix in this MR already bounds peak RSS to O(runtime baseline + threshold) regardless of tree size, which is the dominant factor called out in the issue.

@KrisSimon KrisSimon closed this Apr 10, 2026
@KrisSimon KrisSimon deleted the migrate-stub-pr branch April 10, 2026 15:02