Add grep_search and glob_find tools for code search and file discovery #45
karthik-vm wants to merge 1 commit into walidabualafia:main
Without search tools, the model must guess file paths or ask the user, making multi-file tasks extremely slow and error-prone. This adds two read-only tools that execute automatically (no permission prompt needed):
- grep_search: regex code search using ripgrep (falls back to grep)
- glob_find: glob-based file discovery sorted by modification time
Both tools cap output to prevent token explosion.
Fixes walidabualafia#19
Co-authored-by: Cursor <cursoragent@cursor.com>
walidabualafia left a comment
Review: Add grep_search and glob_find tools
This is a well-structured PR — all quality gates pass cleanly (build, typecheck, lint, format, 113 tests), the code follows existing patterns, and the documentation is thorough. Nice work.
That said, I found some functional and performance issues that should be addressed before merging.
Issues to Fix
1. glob_find traverses node_modules / .git — extreme performance hit
readdir({ recursive: true }) traverses the entire directory tree. I tested this on the repo itself:
- 79,482 total entries returned
- 78,600 (99%) from node_modules
- ~1.3s just for the directory listing, before any glob matching or stat() calls
On larger projects this will be far worse. Common directories like node_modules, .git, dist, build, and coverage should be excluded from traversal. Alternatively, consider using a .gitignore-aware traversal (e.g., git ls-files as the primary source, with fallback to readdir).
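One way to do the exclusion is a manual walk that never opens the excluded directories, instead of filtering the output of readdir({ recursive: true }). The sketch below is only an illustration: walkFiles and EXCLUDED_DIRS are hypothetical names, and the exclusion list is simply the set of directories named above.

```typescript
import { readdir } from 'node:fs/promises';
import { join } from 'node:path';

// Hypothetical default exclusions (the directories named in this review).
const EXCLUDED_DIRS = new Set(['node_modules', '.git', 'dist', 'build', 'coverage']);

// Recursively collect regular files under `root`, as '/'-separated paths
// relative to `root`. Excluded directories are skipped before they are read,
// so their contents add no traversal cost. Symlinks are not followed.
async function walkFiles(root: string, rel = '', out: string[] = []): Promise<string[]> {
  const entries = await readdir(join(root, rel), { withFileTypes: true });
  for (const entry of entries) {
    const relPath = rel ? `${rel}/${entry.name}` : entry.name;
    if (entry.isDirectory()) {
      if (!EXCLUDED_DIRS.has(entry.name)) {
        await walkFiles(root, relPath, out);
      }
    } else if (entry.isFile()) {
      out.push(relPath);
    }
  }
  return out;
}
```

The returned relative paths could then be fed to the existing glob matcher. A .gitignore-aware source such as git ls-files would be more precise, at the cost of requiring a git checkout.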
2. globToRegex doesn't support brace expansion {}
The function escapes { and } as literal characters, so patterns like **/*.{ts,tsx} silently match nothing:
*.{js,jsx} → /^[^/]*\.\{js,jsx\}$/ ← matches zero files
This is a common glob syntax that the model will naturally try. It fails silently (no error, just empty results), which is confusing. Either:
- Add brace expansion support to globToRegex, or
- Use an established glob library like picomatch (already an indirect dependency via fdir/vitepress)
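If the first option is preferred, brace expansion can be done as a pre-processing step that turns one pattern into several brace-free patterns, each of which is then passed to the existing matcher. The function below is a minimal sketch under that assumption. expandBraces is a hypothetical name, and it handles comma-separated alternatives and nesting only, not numeric ranges like {1..3} or backslash escapes.

```typescript
// Expand {a,b,c} groups into a list of brace-free patterns.
// Unbalanced braces and comma-free groups such as "{a}" are kept literal.
function expandBraces(pattern: string): string[] {
  const open = pattern.indexOf('{');
  if (open === -1) return [pattern];

  // Find the matching close brace and the commas at nesting depth 1.
  let depth = 0;
  let close = -1;
  const commas: number[] = [];
  for (let i = open; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) {
        close = i;
        break;
      }
    } else if (ch === ',' && depth === 1) {
      commas.push(i);
    }
  }
  if (close === -1) return [pattern]; // unbalanced: treat as literal

  const prefix = pattern.slice(0, open);
  const tails = expandBraces(pattern.slice(close + 1));

  if (commas.length === 0) {
    // No alternatives inside the group: keep "{...}" as literal text.
    const literal = pattern.slice(open, close + 1);
    return tails.map((tail) => prefix + literal + tail);
  }

  const out: string[] = [];
  let start = open;
  for (const end of [...commas, close]) {
    const alternative = pattern.slice(start + 1, end);
    for (const head of expandBraces(alternative)) {
      for (const tail of tails) out.push(prefix + head + tail);
    }
    start = end;
  }
  return out;
}
```

With this, a file would match **/*.{ts,tsx} if it matches any of the expanded patterns. A library such as picomatch covers the remaining edge cases and is probably the lower-maintenance choice.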
3. No timeout protection on glob_find
grepSearch has a 15-second timeout via spawn({ timeout }), but glob_find has none. On a large filesystem, readdir({ recursive: true }) + stat() on all matches could hang for a very long time with no way to abort.
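A time limit for glob_find could be added with a small wrapper like the one below. This is a sketch, not the PR's code: withTimeout is a hypothetical helper, and it assumes the traversal is written to check the provided AbortSignal between steps. Racing against a timer only stops the caller from waiting; it does not interrupt a readdir or stat call that is already in flight.

```typescript
// Run `work` with a deadline. On timeout the signal is aborted, so cooperative
// work can stop early, and the returned promise rejects.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`operation timed out after ${ms}ms`));
    }, ms);
  });

  try {
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

A traversal loop would call signal.throwIfAborted() (or check signal.aborted) once per directory, so that a timed-out search also stops doing filesystem work.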
4. Unbounded parallel stat() calls in glob_find
All matched entries are stat'd concurrently via Promise.all:
const withMtime = await Promise.all(
  matched.map(async (entry) => { ... stat(fullPath) ... })
);
For a broad pattern like **/*, this could be thousands of concurrent stat() calls, potentially hitting OS file descriptor limits. Consider batching (e.g., p-limit) or stat'ing only the top N matches.
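As a dependency-free alternative to p-limit, the batching can be sketched as a fixed pool of workers that pull items from a shared index. mapWithLimit is a hypothetical helper name, and the limit of 32 in the usage note is an arbitrary illustrative value, not a measured one.

```typescript
// Map `items` through an async function with at most `limit` calls in flight.
// Results are returned in the same order as the inputs.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index. Claiming is
  // synchronous, so two workers can never take the same item.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In glob_find this might look like mapWithLimit(matched, 32, (entry) => stat(...)), which keeps the number of simultaneous stat() calls bounded regardless of how broad the pattern is.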
5. Ripgrep fallback silently masks errors
In grepSearch, if ripgrep IS installed but errors (exit code 2+, e.g. invalid regex), the error is silently swallowed and grep is tried instead:
try {
  result = await runCommand('rg', rgArgs, searchDir);
} catch {
  result = await runCommand('grep', grepArgs, searchDir);
}
Consider distinguishing "not installed" (spawn error event) from "command failed" (exit code 2+). Only fall back to grep when ripgrep isn't installed; surface real errors to the user.
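The distinction could be sketched as follows. This is not the PR's runCommand: spawnCollect, CommandNotFoundError, and searchWithFallback are hypothetical names, and the argument lists are simplified. It relies on spawn emitting an 'error' event with code ENOENT when the executable cannot be found, and on rg and grep both exiting 0 on a match, 1 on no match, and 2 or higher on error.

```typescript
import { spawn } from 'node:child_process';

// Thrown only when the executable itself could not be started.
class CommandNotFoundError extends Error {}

interface CommandResult {
  code: number;
  stdout: string;
  stderr: string;
}

// Run a command and collect its output. Non-zero exit codes are returned,
// not thrown, so the caller can decide what they mean.
function spawnCollect(cmd: string, args: string[], cwd: string): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { cwd });
    let stdout = '';
    let stderr = '';
    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk: string) => { stdout += chunk; });
    child.stderr.on('data', (chunk: string) => { stderr += chunk; });
    child.on('error', (err: NodeJS.ErrnoException) => {
      // Caveat: ENOENT is also reported when `cwd` does not exist.
      reject(err.code === 'ENOENT' ? new CommandNotFoundError(`${cmd} not found`) : err);
    });
    child.on('close', (code) => resolve({ code: code ?? -1, stdout, stderr }));
  });
}

async function searchWithFallback(pattern: string, dir: string): Promise<string> {
  let result: CommandResult;
  try {
    result = await spawnCollect('rg', ['--line-number', '--', pattern, '.'], dir);
  } catch (err) {
    // Fall back only when ripgrep is missing; rethrow anything else.
    if (!(err instanceof CommandNotFoundError)) throw err;
    result = await spawnCollect('grep', ['-rn', '--', pattern, '.'], dir);
  }
  if (result.code > 1) {
    // A real failure such as an invalid regex: surface it to the user.
    throw new Error(result.stderr.trim() || `search failed with exit code ${result.code}`);
  }
  return result.stdout;
}
```

With this shape, an invalid regex passed to an installed ripgrep produces ripgrep's own error message rather than a second, differently behaving grep run.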
Minor
6. --max-count=500 only applies to ripgrep, not to grep fallback
The ripgrep arguments include --max-count=500 (per-file limit), but the grep fallback has no equivalent. This means the grep path could produce much more output for the same query.
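grep accepts the same per-file limit through its own --max-count option (also spelled -m), so the two paths can share one constant. The builder below is an illustrative sketch: buildGrepArgs and MAX_MATCHES_PER_FILE are hypothetical names, and the PR's actual grepArgs may carry additional flags.

```typescript
// Shared per-file match limit, mirroring the ripgrep arguments.
const MAX_MATCHES_PER_FILE = 500;

// Build arguments for the grep fallback: recursive, with line numbers,
// a per-file match cap, and `--` so the pattern is never parsed as a flag.
function buildGrepArgs(pattern: string, maxPerFile: number = MAX_MATCHES_PER_FILE): string[] {
  return ['-rn', `--max-count=${maxPerFile}`, '--', pattern, '.'];
}
```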
What looks good
- All quality gates pass with zero errors and zero warnings
- Tool schemas are well-defined with clear descriptions
- Tests are comprehensive (9 new tests covering both tools)
- Permission model correctly classifies both tools as read-only/always-allowed
- System prompt updated to guide the model on when to use each tool
- Output capping (MAX_RESULTS = 200, MAX_OUTPUT_LINES = 200) prevents token explosion
- grepSearch properly uses -- to separate pattern from flags, preventing flag injection
- No secrets or sensitive data in the diff
Summary
- grep_search tool: regex code search powered by ripgrep (rg), with automatic fallback to grep -rn. Supports optional directory scoping and file-type filtering via glob. Output capped at 200 lines to prevent token explosion.
- glob_find tool: glob-based file discovery using Node.js recursive readdir with a glob-to-regex matcher. Results sorted by modification time (newest first), capped at 200 entries.
- [...] read_file).
Fixes #19
Test plan
- pnpm build passes
- pnpm lint passes
- pnpm format:check passes
- pnpm typecheck passes
- pnpm test: all 113 tests pass (25 tool tests including 9 new ones for grep_search and glob_find)
- caretforge "Find all TODO comments" and confirm grep_search is invoked
- caretforge "What test files exist?" and confirm glob_find is invoked
Made with Cursor