feat: Add extract_code tool to Agent MCP server #338

buger · 2026-01-18T20:21:09Z

Summary

Adds extract_code tool to the probe-agent MCP server (probe agent --mcp)
Complements the existing search_code tool by allowing AI assistants to retrieve complete code blocks
Uses the same underlying extract function as the low-level MCP server with identical options (XML format, test files included)

Changes

Added import for extract function from main package
Added extract_code tool definition with schema for path and files parameters
Added handler that validates inputs and calls the extract function
Updated error logging to be dynamic based on tool name

Test plan

Verify probe agent --mcp starts successfully
Test extract_code tool with valid file paths
Test extract_code tool with symbol extraction (e.g., file.js#funcName)
Test extract_code tool with line ranges (e.g., file.js:10-20)
Verify error handling for missing required parameters

🤖 Generated with Claude Code

Add extract_code tool to the probe-agent MCP server to complement the existing search_code tool. This allows AI assistants to retrieve complete code blocks based on file paths and symbols returned by search_code. The tool uses the same underlying extract function as the low-level MCP server, with identical options (XML format, test files included). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

probelabs · 2026-01-18T20:21:57Z

PR Overview: Add extract_code Tool to Agent MCP Server

Summary

This PR adds the extract_code tool to the Probe Agent MCP server (probe agent --mcp), complementing the existing search_code tool. The new tool enables AI assistants to retrieve complete code blocks from files using tree-sitter AST parsing.

Files Changed

npm/src/agent/index.js (+58 lines, -2 lines)

Added import for extract function from main package
Added extract_code tool definition with schema
Implemented tool handler with validation
Updated error logging to be dynamic based on tool name

Architecture & Impact Assessment

What This PR Accomplishes

New MCP Tool: Adds extract_code tool to the Agent MCP server, allowing AI assistants to extract complete code content from files
Complementary Functionality: Works alongside search_code - search finds relevant code, extract retrieves full content
Consistent Interface: Uses the same underlying extract function as the low-level MCP server with identical options

Key Technical Changes

Tool Definition (lines 406-424):
- Name: extract_code
- Parameters: path (project root), files (array of file paths)
- Supports multiple file formats: full files, line ranges, symbol extraction
- Example: "file.js", "file.js:42", "file.js:10-20", "file.js#funcName"
Handler Implementation (lines 432-467):
- Validates required parameters (path, files array)
- Builds options with smart defaults (XML format, include test files)
- Calls the extract function from main package
- Returns results in MCP text format
Error Handling (line 613):
- Dynamic error logging based on tool name
- Maintains consistency with existing error handling pattern

Affected System Components

graph TD
    A[AI Assistant] -->|MCP Call| B[Probe Agent MCP Server]
    B -->|search_code| C[ProbeAgent.answer]
    B -->|extract_code| D[extract function]
    D -->|AST Parsing| E[probe binary]
    E -->|Code Blocks| D
    D -->|XML Format| B
    B -->|Text Response| A
    
    style D fill:#90EE90
    style B fill:#87CEEB

Component Flow:

AI Assistant calls extract_code via MCP protocol
ProbeAgentMcpServer validates and processes request
Calls extract() function from main package
Extract function invokes probe binary with AST parsing
Returns formatted code blocks to AI assistant

Integration Points

MCP Server: ProbeAgentMcpServer class now handles 2 tools (search_code, extract_code)
Extract Module: Reuses existing extract function from npm/src/extract.js
Probe Binary: Leverages tree-sitter AST parsing via probe CLI

Scope Discovery & Context Expansion

Direct Impact

Agent MCP Server: Single file change (npm/src/agent/index.js)
Tool Registry: New tool added to MCP tools list
AI Agent Workflows: Enables two-phase workflow (search → extract)

Related Files (Context)

Based on code analysis, these files are related but unchanged:

npm/src/extract.js: Core extract function implementation
npm/src/agent/ProbeAgent.js: Main agent class with tool management
npm/src/agent/mcp/built-in-server.js: Built-in MCP server with extract tool
npm/src/mcp/index.ts: Low-level MCP server (already has extract_code)

Potential Follow-up Areas

Testing: No test changes included - consider adding MCP server tests
Documentation: Help text may need update to mention new tool
TypeScript Types: ProbeAgent.d.ts may need updates if types are exported

Technical Details

Tool Schema

{
  name: 'extract_code',
  description: 'Extract full code blocks from files using tree-sitter AST parsing...',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to project root' },
      files: { 
        type: 'array', 
        items: { type: 'string' },
        description: 'File paths with optional line numbers or symbols'
      }
    },
    required: ['path', 'files']
  }
}

Default Options

format: 'xml' - Consistent with other probe tools
allowTests: true - Include test files by default
Supports absolute and relative file paths

Error Handling

Validates path parameter exists
Validates files array is non-empty
Returns error messages in MCP text format
Dynamic logging: Error executing ${toolName}:

Review Notes

Complexity: Low

Single file change with clear, focused implementation
Reuses existing, well-tested extract function
Follows established patterns from search_code tool

Testing Recommendations

Verify MCP server starts successfully
Test extract_code with valid file paths
Test symbol extraction (e.g., file.js#funcName)
Test line ranges (e.g., file.js:10-20)
Verify error handling for missing parameters
Test integration with AI assistant workflows

Code Quality

✅ Follows existing code patterns
✅ Proper parameter validation
✅ Clear error messages
✅ Debug logging support
✅ Consistent with MCP protocol standards

Metadata

Review Effort: 2 / 5
Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-01-18T20:22:55.785Z | Triggered by: pr_opened | Commit: 74c2df7

💡 TIP: You can chat with Visor using /visor ask <your question>

probelabs · 2026-01-18T20:22:23Z

Security Issues (4)

Severity	Location	Issue
🔴 Critical	`npm/src/agent/index.js:432-455`	The extract_code tool lacks access control restrictions present in ProbeAgent. While the underlying extract() function validates the working directory path exists, it does NOT restrict which files can be accessed. The MCP server accepts any absolute path and any file paths without validating they are within allowed directories. This allows unauthorized access to arbitrary files on the system. 💡 Suggestion Implement allowedFolders validation similar to ProbeAgent (lines 1288-1320 in ProbeAgent.js). Before calling extract(), validate that all file paths in the files array are within the allowed directories. If no allowed_folders are specified, default to the current working directory. 🔧 Suggested Fix // Handle extract_code tool if (request.params.name === 'extract_code') { // Validate required parameters if (!args.path) { throw new Error("Path is required"); } if (!args.files \|\| !Array.isArray(args.files) \|\| args.files.length === 0) { throw new Error("Files array is required and must not be empty"); } // Security: Validate all file paths are within allowed directories const allowedDirs = args.allowed_folders && args.allowed_folders.length > 0 ? args.allowed_folders : [args.path \|\| process.cwd()]; const { resolve, normalize, sep } = await import('path'); for (const file of args.files) { // Resolve the file path const resolvedPath = file.startsWith('/') \|\| file.match(/^[A-Za-z]:/) ? normalize(resolve(file)) : normalize(resolve(args.path, file)); // Check if path is within any allowed directory const isAllowed = allowedDirs.some(dir => { const normalizedDir = normalize(resolve(dir)); return resolvedPath === normalizedDir \|\| resolvedPath.startsWith(normalizedDir + sep); }); if (!isAllowed) { throw new Error(`Path traversal attempt detected. Cannot access file outside allowed directories: ${file}`); } } // Build options with smart defaults const options = { files: args.files, path: args.path, format: 'xml', allowTests: true, // Include test files by default };</code></pre>
🟠 Error	`npm/src/agent/index.js:405-415`	The extract_code tool schema does not include allowed_folders parameter, preventing users from specifying access restrictions. This is inconsistent with the search_code tool which supports allowed_folders. 💡 Suggestion Add allowed_folders parameter to the extract_code tool schema to allow users to specify which directories can be accessed. 🔧 Suggested Fix { name: 'extract_code', description: "Extract full code blocks from files using tree-sitter AST parsing. Use this to get complete code content based on file paths and symbols returned by search_code. Each file path can include optional line numbers or symbol names to extract specific code blocks.", inputSchema: { type: 'object', properties: { path: { type: 'string', description: 'Absolute path to the project root directory (used as working directory for relative file paths).', }, files: { type: 'array', items: { type: 'string' }, description: 'Array of file paths to extract from. Formats: "file.js" (entire file), "file.js:42" (code block at line 42), "file.js:10-20" (lines 10-20), "file.js#funcName" (specific symbol). Line numbers and symbols are part of the path string, not separate parameters. Paths can be absolute or relative to the project directory.', }, allowed_folders: { type: 'array', items: { type: 'string' }, description: 'Optional list of allowed directories for file operations. Defaults to the path directory if not specified.', } }, required: ['path', 'files'], }, },
🟡 Warning	`npm/src/agent/index.js:451-452`	Debug logging exposes the full options object including file paths to stderr when DEBUG=1 is set. This could leak sensitive file paths or project structure in logs. 💡 Suggestion Sanitize file paths in debug output or make debug logging more selective. Consider redacting sensitive paths or only logging the count of files. 🔧 Suggested Fix `if (process.env.DEBUG === '1') { console.error('[DEBUG] Executing extract_code with options:', JSON.stringify({ path: options.path, fileCount: options.files.length, format: options.format, allowTests: options.allowTests }, null, 2)); }`
🟡 Warning	`npm/src/extract.js:71-73`	File paths are escaped using escapeString() but then passed to exec() which spawns a shell. While escapeString() uses proper POSIX shell escaping (single quotes with embedded quote handling), using exec() with shell escaping is inherently riskier than using execFile() with argument arrays. The grep.js module demonstrates the safer approach using execFile(). 💡 Suggestion Consider refactoring extract() to use execFile() instead of exec() for better security. The search.js and grep.js modules already use execFile() which avoids shell interpretation entirely. 🔧 Suggested Fix // Otherwise use execFile for secure command execution (no shell) const args = ['extract', ...cliArgs]; const { execFile } = await import('child_process'); const execFileAsync = promisify(execFile); try { const { stdout, stderr } = await execFileAsync(binaryPath, args, { cwd }); if (stderr) { console.error(`stderr: ${stderr}`); } return processExtractOutput(stdout, options);</code></pre>

Architecture Issues (5)

Severity	Location	Issue
🟢 Info	`npm/src/agent/index.js:432-466`	extract_code is a thin wrapper around the extract function with hardcoded defaults (format: 'xml', allowTests: true). This creates an inconsistent abstraction layer where some tools have complex logic (search_code with agent initialization) while others are simple pass-throughs. 💡 Suggestion Consider whether extract_code should expose more configuration options or if it should be generalized. The hardcoded defaults may not suit all use cases.
🟢 Info	`npm/src/agent/index.js:610`	Error logging was updated to use dynamic tool name (request.params.name), which is good. However, this change was made necessary by the addition of extract_code and reveals that the previous hardcoded 'search_code' was a special case that should have been generalized earlier. 💡 Suggestion This is a good fix that addresses a previous special case. No action needed, but note that this indicates the original design had unnecessary hardcoding.
🟡 Warning	`npm/src/agent/index.js:421-427`	Tool routing uses if-else chain that will require modification for each new tool. This pattern doesn't scale well as more tools are added. Consider using a tool registry or handler map for better extensibility. 💡 Suggestion Replace the if-else chain with a tool handler registry: const toolHandlers = { search_code: handleSearchCode, extract_code: handleExtractCode }; const handler = toolHandlers[request.params.name]; if (handler) return handler(args);
🟡 Warning	`npm/src/agent/index.js:432-466`	The extract_code tool handler duplicates the error handling pattern from search_code. Both tools wrap execution in try-catch with similar error logging. This violates DRY and creates maintenance burden. 💡 Suggestion Extract common error handling logic into a shared wrapper function: async function executeTool(toolName, handler, args) { try { return await handler(args); } catch (error) { console.error(`Error executing ${toolName}:`, error); return { content: [{ type: 'text', text: `Error: ${error.message}` }]}; } }
🟡 Warning	`npm/src/agent/index.js:432-466`	Hardcoded format: 'xml' and allowTests: true are special case assumptions. The low-level MCP server exposes these as configurable options, but this wrapper forces specific values, reducing flexibility. 💡 Suggestion Expose format and allowTests as optional parameters in the inputSchema with sensible defaults, allowing users to override when needed: properties: { format: { type: 'string', enum: ['xml', 'json', 'markdown'], default: 'xml' }, allowTests: { type: 'boolean', default: true } }

✅ Performance Check Passed

No performance issues found – changes LGTM.

Quality Issues (5)

Severity	Location	Issue
🟠 Error	`npm/src/agent/index.js:432`	No tests found for the new extract_code tool in the ProbeAgentMcpServer class. The PR adds a new tool without corresponding test coverage for validation, error handling, or integration scenarios. 💡 Suggestion Add test cases covering: (1) Valid extract_code calls with various file path formats, (2) Missing required parameter validation, (3) Empty files array validation, (4) Error handling when extract function fails, (5) Integration with the MCP server framework. Tests should verify the tool correctly validates inputs and calls the extract function with proper options.
🟢 Info	`npm/src/agent/index.js:399`	The extract_code tool description mentions 'tree-sitter AST parsing' but doesn't explain the difference between this and the low-level MCP server's extract tool, or when to use one versus the other. Users may be confused about which tool to use. 💡 Suggestion Clarify in the description that this is the high-level agent MCP server (probe agent --mcp) vs the low-level code search MCP server. Explain that extract_code is for use with the agent server when you want the AI to have access to code extraction capabilities, while the low-level extract tool is for direct programmatic access.
🟡 Warning	`npm/src/agent/index.js:432`	The extract_code handler lacks error handling for the extract function call. If extract() throws an error, it will propagate to the MCP server layer without proper error formatting or context. 💡 Suggestion Wrap the extract() call in a try-catch block and return a properly formatted error response consistent with MCP error handling patterns. Consider adding error context like the file paths being extracted and the working directory. 🔧 Suggested Fix try { const result = await extract(options); return { content: [{ type: 'text', text: result }], }; } catch (error) { console.error(`Error executing extract_code:`, error); return { content: [{ type: 'text', text: `Error extracting code: ${error.message}` }], isError: true }; }
🟡 Warning	`npm/src/agent/index.js:432`	Hardcoded default values (format: 'xml', allowTests: true) are embedded in the handler code. These should be configurable or documented constants to maintain consistency with the low-level MCP server implementation. 💡 Suggestion Extract these defaults to named constants at the top of the class or file, or reference the same constants used by the low-level MCP server to ensure consistency. Consider making these configurable via tool parameters if users might want different behavior. 🔧 Suggested Fix `// At class level or top of file: const EXTRACT_DEFAULTS = { FORMAT: 'xml', ALLOW_TESTS: true };` `// In handler: const options = { files: args.files, path: args.path, format: EXTRACT_DEFAULTS.FORMAT, allowTests: EXTRACT_DEFAULTS.ALLOW_TESTS };`
🟡 Warning	`npm/src/agent/index.js:432`	Input validation only checks for presence of 'path' and non-empty 'files' array, but doesn't validate that file paths are strings or that the path is a valid directory path. Invalid inputs could cause cryptic errors from the extract function. 💡 Suggestion Add more robust validation: (1) Verify args.files contains only strings, (2) Check that args.path is a non-empty string, (3) Optionally validate that the path exists or is a valid absolute path format. This provides clearer error messages to users. 🔧 Suggested Fix `if (!args.path \|\| typeof args.path !== 'string' \|\| args.path.trim().length === 0) { throw new Error("Path is required and must be a non-empty string"); } if (!args.files \|\| !Array.isArray(args.files) \|\| args.files.length === 0) { throw new Error("Files array is required and must not be empty"); } if (!args.files.every(f => typeof f === 'string' && f.trim().length > 0)) { throw new Error("All file paths must be non-empty strings"); }`

Powered by Visor from Probelabs

Last updated: 2026-01-18T20:22:58.454Z | Triggered by: pr_opened | Commit: 74c2df7

💡 TIP: You can chat with Visor using /visor ask <your question>

buger merged commit 0be9add into main Jan 19, 2026
16 of 18 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Add extract_code tool to Agent MCP server #338

feat: Add extract_code tool to Agent MCP server #338

Uh oh!

buger commented Jan 18, 2026

Uh oh!

probelabs bot commented Jan 18, 2026 •

edited

Loading

Uh oh!

probelabs bot commented Jan 18, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat: Add extract_code tool to Agent MCP server #338

feat: Add extract_code tool to Agent MCP server #338

Uh oh!

Conversation

buger commented Jan 18, 2026

Summary

Changes

Test plan

Uh oh!

probelabs bot commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Overview: Add extract_code Tool to Agent MCP Server

Summary

Files Changed

Architecture & Impact Assessment

What This PR Accomplishes

Key Technical Changes

Affected System Components

Integration Points

Scope Discovery & Context Expansion

Direct Impact

Related Files (Context)

Potential Follow-up Areas

Technical Details

Tool Schema

Default Options

Error Handling

Review Notes

Complexity: Low

Testing Recommendations

Code Quality

Uh oh!

probelabs bot commented Jan 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Security Issues (4)

Architecture Issues (5)

✅ Performance Check Passed

Quality Issues (5)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

probelabs bot commented Jan 18, 2026 •

edited

Loading

probelabs bot commented Jan 18, 2026 •

edited

Loading