Skip to content

Add grep_search and glob_find tools for code search and file discovery#45

Open
karthik-vm wants to merge 1 commit intowalidabualafia:mainfrom
karthik-vm:enhancement/grep-glob-tools-19
Open

Add grep_search and glob_find tools for code search and file discovery#45
karthik-vm wants to merge 1 commit intowalidabualafia:mainfrom
karthik-vm:enhancement/grep-glob-tools-19

Conversation

@karthik-vm
Copy link
Copy Markdown

Summary

  • Adds grep_search tool: regex code search powered by ripgrep (rg), with automatic fallback to grep -rn. Supports optional directory scoping and file-type filtering via glob. Output capped at 200 lines to prevent token explosion.
  • Adds glob_find tool: glob-based file discovery using Node.js recursive readdir with a glob-to-regex matcher. Results sorted by modification time (newest first), capped at 200 entries.
  • Both tools are read-only and execute automatically with no permission prompt (same as read_file).
  • Updated system prompt, tool schema, dispatcher, permissions, tests (113 passing), and documentation.

Fixes #19

Test plan

  • pnpm build passes
  • pnpm lint passes
  • pnpm format:check passes
  • pnpm typecheck passes
  • pnpm test — all 113 tests pass (25 tool tests including 9 new ones for grep_search and glob_find)
  • Manual verification: run caretforge "Find all TODO comments" and confirm grep_search is invoked
  • Manual verification: run caretforge "What test files exist?" and confirm glob_find is invoked

Made with Cursor

Without search tools, the model must guess file paths or ask the user,
making multi-file tasks extremely slow and error-prone. This adds two
read-only tools that execute automatically (no permission prompt needed):

- grep_search: regex code search using ripgrep (falls back to grep)
- glob_find: glob-based file discovery sorted by modification time

Both tools cap output to prevent token explosion.

Fixes walidabualafia#19

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Owner

@walidabualafia walidabualafia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Add grep_search and glob_find tools

This is a well-structured PR — all quality gates pass cleanly (build, typecheck, lint, format, 113 tests), the code follows existing patterns, and the documentation is thorough. Nice work.

That said, I found some functional and performance issues that should be addressed before merging.


Issues to Fix

1. glob_find traverses node_modules / .git — extreme performance hit

readdir({ recursive: true }) traverses the entire directory tree. I tested this on the repo itself:

  • 79,482 total entries returned
  • 78,600 (99%) from node_modules
  • ~1.3s just for the directory listing, before any glob matching or stat() calls

On larger projects this will be far worse. Common directories like node_modules, .git, dist, build, and coverage should be excluded from traversal. Alternatively, consider using a .gitignore-aware traversal (e.g., git ls-files as the primary source, with fallback to readdir).

2. globToRegex doesn't support brace expansion {}

The function escapes { and } as literal characters, so patterns like **/*.{ts,tsx} silently match nothing:

*.{js,jsx} → /^[^/]*\.\{js,jsx\}$/   ← matches zero files

This is a common glob syntax that the model will naturally try. It fails silently (no error, just empty results), which is confusing. Either:

  • Add brace expansion support to globToRegex, or
  • Use an established glob library like picomatch (already an indirect dependency via fdir/vitepress)

3. No timeout protection on glob_find

grepSearch has a 15-second timeout via spawn({ timeout }), but glob_find has none. On a large filesystem, readdir({ recursive: true }) + stat() on all matches could hang for a very long time with no way to abort.

4. Unbounded parallel stat() calls in glob_find

All matched entries are stat'd concurrently via Promise.all:

const withMtime = await Promise.all(
  matched.map(async (entry) => { ... stat(fullPath) ... })
);

For a broad pattern like **/*, this could be thousands of concurrent stat() calls, potentially hitting OS file descriptor limits. Consider batching (e.g., p-limit) or stat'ing only the top N matches.

5. Ripgrep fallback silently masks errors

In grepSearch, if ripgrep IS installed but errors (exit code 2+, e.g. invalid regex), the error is silently swallowed and grep is tried instead:

try {
  result = await runCommand('rg', rgArgs, searchDir);
} catch {
  result = await runCommand('grep', grepArgs, searchDir);
}

Consider distinguishing "not installed" (spawn error event) from "command failed" (exit code 2+). Only fall back to grep when ripgrep isn't installed; surface real errors to the user.


Minor

6. --max-count=500 only applies to ripgrep, not to grep fallback

The ripgrep arguments include --max-count=500 (per-file limit), but the grep fallback has no equivalent. This means the grep path could produce much more output for the same query.


What looks good

  • All quality gates pass with zero errors and zero warnings
  • Tool schemas are well-defined with clear descriptions
  • Tests are comprehensive (9 new tests covering both tools)
  • Permission model correctly classifies both tools as read-only/always-allowed
  • System prompt updated to guide the model on when to use each tool
  • Output capping (MAX_RESULTS = 200, MAX_OUTPUT_LINES = 200) prevents token explosion
  • grepSearch properly uses -- to separate pattern from flags, preventing flag injection
  • No secrets or sensitive data in the diff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Grep and Glob tools for code search and file discovery

2 participants