fix: prevent OOM kill by skipping excluded dirs before traversal #196

Open
jimmy-infinex wants to merge 1 commit into tobi:main from jimmy-infinex:fix/glob-oom-exclude-dirs

Conversation

@jimmy-infinex

Problem

Glob.scan() from Bun traverses into node_modules, .git, and other excluded directories before the post-hoc filter can reject matching files. On large monorepos (e.g. 8+ JavaScript sub-repos with massive node_modules trees), this causes the indexing process to buffer millions of directory entries in memory and get SIGKILL'd by the OS.

Reproducing is straightforward: run qmd collection add on any directory containing a large JS monorepo with **/*.md as the glob pattern.

Fix

Replace Glob.scan() with a manual recursive directory walker (readdirSync + withFileTypes) that checks excludeDirs and hidden directories (.*) before descending into them. Files are still matched against the glob pattern via Glob.match().

This is a drop-in behavioral replacement:

  • Same files indexed
  • Same exclusions applied
  • Symlinks still followed (with broken symlink handling)
  • Memory usage stays constant regardless of excluded directory size
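
The pruning idea described under Fix can be sketched roughly as follows. This is a minimal illustration, not the PR's actual walkDir(): the name `walkPruned`, the `matches` callback (a stand-in for `glob.match(relPath)`), and the temporary directory layout are all assumptions made for the example, and symlinks are deliberately ignored to keep it short.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper, not the PR's walkDir(): collects matching files while
// pruning excluded and hidden directories before their contents are read.
function walkPruned(
  root: string,
  excludeDirs: Set<string>,
  matches: (relPath: string) => boolean, // stand-in for glob.match(relPath)
): string[] {
  const files: string[] = [];
  const visit = (relDir: string): void => {
    const entries = readdirSync(join(root, relDir), { withFileTypes: true });
    for (const entry of entries) {
      const relPath = relDir ? `${relDir}/${entry.name}` : entry.name;
      if (entry.isDirectory()) {
        // Prune here: an excluded or hidden subtree is never opened.
        if (excludeDirs.has(entry.name) || entry.name.startsWith(".")) continue;
        visit(relPath);
      } else if (entry.isFile() && matches(relPath)) {
        files.push(relPath);
      }
    }
  };
  visit("");
  return files.sort();
}

// Assumed layout for the demo: one excluded dir, one hidden dir, two real files.
const root = mkdtempSync(join(tmpdir(), "walk-"));
mkdirSync(join(root, "node_modules", "pkg"), { recursive: true });
mkdirSync(join(root, "docs"));
mkdirSync(join(root, ".git"));
writeFileSync(join(root, "node_modules", "pkg", "README.md"), "");
writeFileSync(join(root, "docs", "guide.md"), "");
writeFileSync(join(root, ".git", "notes.md"), "");
writeFileSync(join(root, "README.md"), "");

const found = walkPruned(root, new Set(["node_modules"]), (p) => p.endsWith(".md"));
console.log(found);
```

Because the exclusion check runs on the directory entry itself, the cost of a skipped subtree is one `Set` lookup, independent of how many files it contains.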

Testing

  • All 167 existing tests pass
  • Manually verified: node_modules/ directory with files is correctly skipped (not traversed), while sibling directories are indexed normally

Bun's Glob.scan() traverses into node_modules and other excluded
directories before the post-hoc filter can reject them. On large
monorepos (e.g. 8+ sub-repos with massive node_modules), this causes
the process to consume gigabytes of memory and get SIGKILL'd by the OS.

Replace Glob.scan() with a manual recursive directory walker that
checks excludeDirs and hidden directories BEFORE descending. Files
are still matched against the glob pattern via Glob.match().

This is a drop-in behavioral replacement — same files indexed, same
exclusions applied, but memory usage stays constant regardless of
how large the excluded directories are.

Copilot AI left a comment

Pull request overview

This PR fixes an OOM (Out of Memory) issue that occurs when indexing large monorepos. The problem was that Glob.scan() from Bun traverses into excluded directories like node_modules before filtering can reject them, causing memory exhaustion on repositories with millions of files.

Changes:

  • Replaced Glob.scan() with manual recursive directory traversal using readdirSync that checks exclusion criteria before descending into directories
  • Added readdirSync import from fs module
  • Implemented walkDir() helper function with proper symlink handling and early directory exclusion


progress.indeterminate();
const glob = new Glob(globPattern);
const excludeSet = new Set([...excludeDirs, "node_modules"]);

Copilot AI Feb 18, 2026

The excludeSet includes "node_modules" twice: once from the spread of excludeDirs array (which already contains "node_modules" at line 1354) and once as a direct addition. This redundancy should be removed by either removing "node_modules" from the explicit addition here, or removing it from the excludeDirs array definition.

Suggested change
- const excludeSet = new Set([...excludeDirs, "node_modules"]);
+ const excludeSet = new Set(excludeDirs);
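
As a quick check on the suggestion above: a `Set` discards duplicate values, so the two forms should produce the same members whenever `excludeDirs` already contains "node_modules". The array contents below are hypothetical; the real `excludeDirs` is defined elsewhere in the PR's file.

```typescript
// Hypothetical value for illustration only.
const excludeDirs: string[] = ["node_modules", ".git", "dist"];

const withExtra = new Set([...excludeDirs, "node_modules"]); // form in the PR
const plain = new Set(excludeDirs); // form in the suggested change

// Same size and same members means the change is purely cosmetic here.
const sameMembers =
  withExtra.size === plain.size &&
  Array.from(withExtra).every((d) => plain.has(d));
console.log(sameMembers);
```

If `excludeDirs` could ever be passed in without "node_modules", the explicit addition would stop being redundant, so the simplification is only safe under that assumption.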

Comment on lines +1387 to +1400
if (entry.isSymbolicLink()) {
  try {
    const target = statSync(resolve(resolvedPwd, dir, name));
    if (!target.isDirectory()) {
      // Symlink to a file: check if it matches
      const relPath = relPrefix ? `${relPrefix}/${name}` : name;
      if (glob.match(relPath)) files.push(relPath);
      continue;
    }
  } catch {
    continue; // Broken symlink
  }
}
walkDir(dir ? `${dir}/${name}` : name, relPrefix ? `${relPrefix}/${name}` : name);

Copilot AI Feb 18, 2026

The symlink handling could lead to infinite recursion if there are circular symlinks in the directory tree. While this is also an issue with the previous Glob.scan({followSymlinks: true}) implementation, consider adding cycle detection by tracking visited real paths (using realpathSync) to prevent potential stack overflow errors. This would make the implementation more robust than the original.
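
One possible shape for the cycle detection mentioned in this comment is sketched below. It is an assumption-laden illustration rather than a patch for the PR: `walkFollowingSymlinks` is a hypothetical name, glob matching and directory exclusion are left out, and the demo builds a small temporary tree with a symlink that points back at its own root.

```typescript
import {
  mkdirSync, mkdtempSync, readdirSync, realpathSync,
  statSync, symlinkSync, writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: follows directory symlinks but walks each real
// directory at most once, so circular links cannot recurse forever.
function walkFollowingSymlinks(root: string): string[] {
  const files: string[] = [];
  const visited = new Set<string>(); // canonical paths of directories already walked
  const visit = (absDir: string, relDir: string): void => {
    const real = realpathSync(absDir);
    if (visited.has(real)) return; // cycle, or a second route to the same directory
    visited.add(real);
    for (const entry of readdirSync(absDir, { withFileTypes: true })) {
      const abs = join(absDir, entry.name);
      const rel = relDir ? `${relDir}/${entry.name}` : entry.name;
      let isDir = entry.isDirectory();
      if (entry.isSymbolicLink()) {
        try {
          isDir = statSync(abs).isDirectory(); // statSync follows the link
        } catch {
          continue; // broken symlink
        }
      }
      if (isDir) visit(abs, rel);
      else files.push(rel);
    }
  };
  visit(root, "");
  return files.sort();
}

// Demo tree: a/note.md plus a/loop, a symlink pointing back at the root.
const cycRoot = mkdtempSync(join(tmpdir(), "cycle-"));
mkdirSync(join(cycRoot, "a"));
writeFileSync(join(cycRoot, "a", "note.md"), "");
symlinkSync(cycRoot, join(cycRoot, "a", "loop"), "dir");

const cycFiles = walkFollowingSymlinks(cycRoot);
console.log(cycFiles);
```

A side effect of keying on real paths is that a directory reachable through two different symlinks is indexed only once, under whichever path is encountered first. Whether that is desirable depends on how the indexer treats duplicate content.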
