fix: prevent OOM kill by skipping excluded dirs before traversal #196
jimmy-infinex wants to merge 1 commit into tobi:main from
Conversation
Bun's Glob.scan() traverses into node_modules and other excluded directories before the post-hoc filter can reject them. On large monorepos (e.g. 8+ sub-repos with massive node_modules), this causes the process to consume gigabytes of memory and get SIGKILL'd by the OS. Replace Glob.scan() with a manual recursive directory walker that checks excludeDirs and hidden directories BEFORE descending. Files are still matched against the glob pattern via Glob.match(). This is a drop-in behavioral replacement — same files indexed, same exclusions applied, but memory usage stays constant regardless of how large the excluded directories are.
Pull request overview
This PR fixes an OOM (Out of Memory) issue that occurs when indexing large monorepos. The problem was that Glob.scan() from Bun traverses into excluded directories like node_modules before filtering can reject them, causing memory exhaustion on repositories with millions of files.
Changes:
- Replaced `Glob.scan()` with a manual recursive directory traversal using `readdirSync` that checks exclusion criteria before descending into directories
- Added a `readdirSync` import from the `fs` module
- Implemented a `walkDir()` helper function with proper symlink handling and early directory exclusion
```ts
progress.indeterminate();
const glob = new Glob(globPattern);
const excludeSet = new Set([...excludeDirs, "node_modules"]);
```
The excludeSet includes "node_modules" twice: once via the spread of the excludeDirs array (which already contains "node_modules" at line 1354) and once as an explicit addition. Remove the redundancy by dropping the explicit addition here, or by removing the entry from the excludeDirs array definition.
Suggested change:

```ts
const excludeSet = new Set(excludeDirs);
```
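As a quick standalone check (not from the PR) that the two forms produce the same set whenever excludeDirs already contains "node_modules":

```typescript
// Standalone check: Set deduplicates, so the explicit "node_modules" entry
// is redundant whenever excludeDirs already contains it.
const excludeDirs = ["node_modules", ".git", "dist"];
const withExplicit = new Set([...excludeDirs, "node_modules"]);
const withoutExplicit = new Set(excludeDirs);

console.log(withExplicit.size);    // 3
console.log(withoutExplicit.size); // 3
```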
```ts
if (entry.isSymbolicLink()) {
  try {
    const target = statSync(resolve(resolvedPwd, dir, name));
    if (!target.isDirectory()) {
      // Symlink to a file — check if it matches
      const relPath = relPrefix ? `${relPrefix}/${name}` : name;
      if (glob.match(relPath)) files.push(relPath);
      continue;
    }
  } catch {
    continue; // Broken symlink
  }
}
walkDir(dir ? `${dir}/${name}` : name, relPrefix ? `${relPrefix}/${name}` : name);
```
The symlink handling could lead to infinite recursion if there are circular symlinks in the directory tree. While this is also an issue with the previous Glob.scan({followSymlinks: true}) implementation, consider adding cycle detection by tracking visited real paths (using realpathSync) to prevent potential stack overflow errors. This would make the implementation more robust than the original.
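A minimal sketch of the suggested cycle detection, assuming a visited set consulted before each descent (the names here are illustrative, not from the PR):

```typescript
import { realpathSync } from "fs";

// Illustrative sketch: resolve each directory to its canonical path and
// refuse to descend into any real path already visited, which breaks
// symlink cycles before they can recurse.
const visited = new Set<string>();

function shouldDescend(absDir: string): boolean {
  let real: string;
  try {
    real = realpathSync(absDir); // canonical path with symlinks resolved
  } catch {
    return false; // broken symlink or unreadable path
  }
  if (visited.has(real)) return false; // cycle: this tree was already walked
  visited.add(real);
  return true;
}
```

The walker would gate each recursive call on shouldDescend(...); a second visit to the same real path returns false instead of recursing.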
Problem

`Glob.scan()` from Bun traverses into `node_modules`, `.git`, and other excluded directories before the post-hoc filter can reject matching files. On large monorepos (e.g. 8+ JavaScript sub-repos with massive `node_modules` trees), this causes the indexing process to buffer millions of directory entries in memory and get SIGKILL'd by the OS.

Reproducing is straightforward: run `qmd collection add` on any directory containing a large JS monorepo with `**/*.md` as the glob pattern.

Fix

Replace `Glob.scan()` with a manual recursive directory walker (`readdirSync` + `withFileTypes`) that checks `excludeDirs` and hidden directories (`.*`) before descending into them. Files are still matched against the glob pattern via `Glob.match()`.

This is a drop-in behavioral replacement: the same files are indexed and the same exclusions are applied, but memory usage stays constant regardless of how large the excluded directories are.
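The walker shape described above can be sketched as follows (a condensed sketch, not the PR's exact code: the matcher is passed as a plain function so the sketch stays runtime-agnostic, whereas the PR uses Bun's `glob.match`, and the real `walkDir` also handles symlinks as shown in the diff):

```typescript
import { readdirSync } from "fs";

// Sketch of early-exclusion traversal: excluded and hidden directories are
// pruned BEFORE readdir ever descends into them, so their contents are
// never buffered in memory.
function collectFiles(
  root: string,
  matches: (rel: string) => boolean, // e.g. (p) => glob.match(p) in the PR
  excludeSet: Set<string>,
): string[] {
  const files: string[] = [];
  const walkDir = (dir: string, relPrefix: string): void => {
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      const rel = relPrefix ? `${relPrefix}/${entry.name}` : entry.name;
      if (entry.isDirectory()) {
        // Key change vs Glob.scan(): the exclusion check happens here,
        // before descending, not after the subtree has been enumerated.
        if (excludeSet.has(entry.name) || entry.name.startsWith(".")) continue;
        walkDir(`${dir}/${entry.name}`, rel);
      } else if (entry.isFile() && matches(rel)) {
        files.push(rel);
      }
    }
  };
  walkDir(root, "");
  return files;
}
```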
Testing

- A `node_modules/` directory with files is correctly skipped (not traversed), while sibling directories are indexed normally