Add grep_search and glob_find tools for code search and file discovery by karthik-vm · Pull Request #45 · walidabualafia/caretforge

karthik-vm · 2026-02-18T21:24:53Z

Summary

Adds grep_search tool: regex code search powered by ripgrep (rg), with automatic fallback to grep -rn. Supports optional directory scoping and file-type filtering via glob. Output capped at 200 lines to prevent token explosion.
Adds glob_find tool: glob-based file discovery using Node.js recursive readdir with a glob-to-regex matcher. Results sorted by modification time (newest first), capped at 200 entries.
Both tools are read-only and execute automatically with no permission prompt (same as read_file).
Updated system prompt, tool schema, dispatcher, permissions, tests (113 passing), and documentation.

Fixes #19

Test plan

pnpm build passes
pnpm lint passes
pnpm format:check passes
pnpm typecheck passes
pnpm test — all 113 tests pass (25 tool tests including 9 new ones for grep_search and glob_find)
Manual verification: run caretforge "Find all TODO comments" and confirm grep_search is invoked
Manual verification: run caretforge "What test files exist?" and confirm glob_find is invoked

Made with Cursor

Without search tools, the model must guess file paths or ask the user, making multi-file tasks extremely slow and error-prone. This adds two read-only tools that execute automatically (no permission prompt needed): - grep_search: regex code search using ripgrep (falls back to grep) - glob_find: glob-based file discovery sorted by modification time Both tools cap output to prevent token explosion. Fixes walidabualafia#19 Co-authored-by: Cursor <cursoragent@cursor.com>

walidabualafia

Review: Add grep_search and glob_find tools

This is a well-structured PR — all quality gates pass cleanly (build, typecheck, lint, format, 113 tests), the code follows existing patterns, and the documentation is thorough. Nice work.

That said, I found some functional and performance issues that should be addressed before merging.

Issues to Fix

1. glob_find traverses node_modules / .git — extreme performance hit

readdir({ recursive: true }) traverses the entire directory tree. I tested this on the repo itself:

79,482 total entries returned
78,600 (99%) from node_modules
~1.3s just for the directory listing, before any glob matching or stat() calls

On larger projects this will be far worse. Common directories like node_modules, .git, dist, build, and coverage should be excluded from traversal. Alternatively, consider using a .gitignore-aware traversal (e.g., git ls-files as the primary source, with fallback to readdir).

2. globToRegex doesn't support brace expansion {}

The function escapes { and } as literal characters, so patterns like **/*.{ts,tsx} silently match nothing:

*.{js,jsx} → /^[^/]*\.\{js,jsx\}$/   ← matches zero files

This is a common glob syntax that the model will naturally try. It fails silently (no error, just empty results), which is confusing. Either:

Add brace expansion support to globToRegex, or
Use an established glob library like picomatch (already an indirect dependency via fdir/vitepress)

3. No timeout protection on glob_find

grepSearch has a 15-second timeout via spawn({ timeout }), but glob_find has none. On a large filesystem, readdir({ recursive: true }) + stat() on all matches could hang for a very long time with no way to abort.

4. Unbounded parallel stat() calls in glob_find

All matched entries are stat'd concurrently via Promise.all:

const withMtime = await Promise.all(
  matched.map(async (entry) => { ... stat(fullPath) ... })
);

For a broad pattern like **/*, this could be thousands of concurrent stat() calls, potentially hitting OS file descriptor limits. Consider batching (e.g., p-limit) or stat'ing only the top N matches.

5. Ripgrep fallback silently masks errors

In grepSearch, if ripgrep IS installed but errors (exit code 2+, e.g. invalid regex), the error is silently swallowed and grep is tried instead:

try {
  result = await runCommand('rg', rgArgs, searchDir);
} catch {
  result = await runCommand('grep', grepArgs, searchDir);
}

Consider distinguishing "not installed" (spawn error event) from "command failed" (exit code 2+). Only fall back to grep when ripgrep isn't installed; surface real errors to the user.

Minor

6. --max-count=500 only applies to ripgrep, not to grep fallback

The ripgrep arguments include --max-count=500 (per-file limit), but the grep fallback has no equivalent. This means the grep path could produce much more output for the same query.

What looks good

All quality gates pass with zero errors and zero warnings
Tool schemas are well-defined with clear descriptions
Tests are comprehensive (9 new tests covering both tools)
Permission model correctly classifies both tools as read-only/always-allowed
System prompt updated to guide the model on when to use each tool
Output capping (MAX_RESULTS = 200, MAX_OUTPUT_LINES = 200) prevents token explosion
grepSearch properly uses -- to separate pattern from flags, preventing flag injection
No secrets or sensitive data in the diff

walidabualafia requested changes Feb 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add grep_search and glob_find tools for code search and file discovery#45

Add grep_search and glob_find tools for code search and file discovery#45
karthik-vm wants to merge 1 commit intowalidabualafia:mainfrom
karthik-vm:enhancement/grep-glob-tools-19

karthik-vm commented Feb 18, 2026

Uh oh!

walidabualafia left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

karthik-vm commented Feb 18, 2026

Summary

Test plan

Uh oh!

walidabualafia left a comment

Choose a reason for hiding this comment

Review: Add grep_search and glob_find tools

Issues to Fix

Minor

What looks good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants