Skip to content

Conversation

@buger
Copy link
Collaborator

@buger buger commented Jan 18, 2026

Summary

  • Adds extract_code tool to the probe-agent MCP server (probe agent --mcp)
  • Complements the existing search_code tool by allowing AI assistants to retrieve complete code blocks
  • Uses the same underlying extract function as the low-level MCP server with identical options (XML format, test files included)

Changes

  • Added import for extract function from main package
  • Added extract_code tool definition with schema for path and files parameters
  • Added handler that validates inputs and calls the extract function
  • Updated error logging to be dynamic based on tool name

Test plan

  • Verify probe agent --mcp starts successfully
  • Test extract_code tool with valid file paths
  • Test extract_code tool with symbol extraction (e.g., file.js#funcName)
  • Test extract_code tool with line ranges (e.g., file.js:10-20)
  • Verify error handling for missing required parameters

🤖 Generated with Claude Code

Add extract_code tool to the probe-agent MCP server to complement the
existing search_code tool. This allows AI assistants to retrieve complete
code blocks based on file paths and symbols returned by search_code.

The tool uses the same underlying extract function as the low-level MCP
server, with identical options (XML format, test files included).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@probelabs
Copy link
Contributor

probelabs bot commented Jan 18, 2026

PR Overview: Add extract_code Tool to Agent MCP Server

Summary

This PR adds the extract_code tool to the Probe Agent MCP server (probe agent --mcp), complementing the existing search_code tool. The new tool enables AI assistants to retrieve complete code blocks from files using tree-sitter AST parsing.

Files Changed

npm/src/agent/index.js (+58 lines, -2 lines)

  • Added import for extract function from main package
  • Added extract_code tool definition with schema
  • Implemented tool handler with validation
  • Updated error logging to be dynamic based on tool name

Architecture & Impact Assessment

What This PR Accomplishes

  1. New MCP Tool: Adds extract_code tool to the Agent MCP server, allowing AI assistants to extract complete code content from files
  2. Complementary Functionality: Works alongside search_code - search finds relevant code, extract retrieves full content
  3. Consistent Interface: Uses the same underlying extract function as the low-level MCP server with identical options

Key Technical Changes

  1. Tool Definition (lines 406-424):

    • Name: extract_code
    • Parameters: path (project root), files (array of file paths)
    • Supports multiple file formats: full files, line ranges, symbol extraction
    • Example: "file.js", "file.js:42", "file.js:10-20", "file.js#funcName"
  2. Handler Implementation (lines 432-467):

    • Validates required parameters (path, files array)
    • Builds options with smart defaults (XML format, include test files)
    • Calls the extract function from main package
    • Returns results in MCP text format
  3. Error Handling (line 613):

    • Dynamic error logging based on tool name
    • Maintains consistency with existing error handling pattern

Affected System Components

graph TD
    A[AI Assistant] -->|MCP Call| B[Probe Agent MCP Server]
    B -->|search_code| C[ProbeAgent.answer]
    B -->|extract_code| D[extract function]
    D -->|AST Parsing| E[probe binary]
    E -->|Code Blocks| D
    D -->|XML Format| B
    B -->|Text Response| A
    
    style D fill:#90EE90
    style B fill:#87CEEB
Loading

Component Flow:

  1. AI Assistant calls extract_code via MCP protocol
  2. ProbeAgentMcpServer validates and processes request
  3. Calls extract() function from main package
  4. Extract function invokes probe binary with AST parsing
  5. Returns formatted code blocks to AI assistant

Integration Points

  • MCP Server: ProbeAgentMcpServer class now handles 2 tools (search_code, extract_code)
  • Extract Module: Reuses existing extract function from npm/src/extract.js
  • Probe Binary: Leverages tree-sitter AST parsing via probe CLI

Scope Discovery & Context Expansion

Direct Impact

  • Agent MCP Server: Single file change (npm/src/agent/index.js)
  • Tool Registry: New tool added to MCP tools list
  • AI Agent Workflows: Enables two-phase workflow (search → extract)

Related Files (Context)

Based on code analysis, these files are related but unchanged:

  1. npm/src/extract.js: Core extract function implementation
  2. npm/src/agent/ProbeAgent.js: Main agent class with tool management
  3. npm/src/agent/mcp/built-in-server.js: Built-in MCP server with extract tool
  4. npm/src/mcp/index.ts: Low-level MCP server (already has extract_code)

Potential Follow-up Areas

  1. Testing: No test changes included - consider adding MCP server tests
  2. Documentation: Help text may need update to mention new tool
  3. TypeScript Types: ProbeAgent.d.ts may need updates if types are exported

Technical Details

Tool Schema

{
  name: 'extract_code',
  description: 'Extract full code blocks from files using tree-sitter AST parsing...',
  inputSchema: {
    type: 'object',
    properties: {
      path: { type: 'string', description: 'Absolute path to project root' },
      files: { 
        type: 'array', 
        items: { type: 'string' },
        description: 'File paths with optional line numbers or symbols'
      }
    },
    required: ['path', 'files']
  }
}

Default Options

  • format: 'xml' - Consistent with other probe tools
  • allowTests: true - Include test files by default
  • Supports absolute and relative file paths

Error Handling

  • Validates path parameter exists
  • Validates files array is non-empty
  • Returns error messages in MCP text format
  • Dynamic logging: Error executing ${toolName}:

Review Notes

Complexity: Low

  • Single file change with clear, focused implementation
  • Reuses existing, well-tested extract function
  • Follows established patterns from search_code tool

Testing Recommendations

  1. Verify MCP server starts successfully
  2. Test extract_code with valid file paths
  3. Test symbol extraction (e.g., file.js#funcName)
  4. Test line ranges (e.g., file.js:10-20)
  5. Verify error handling for missing parameters
  6. Test integration with AI assistant workflows

Code Quality

  • ✅ Follows existing code patterns
  • ✅ Proper parameter validation
  • ✅ Clear error messages
  • ✅ Debug logging support
  • ✅ Consistent with MCP protocol standards
Metadata
  • Review Effort: 2 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-01-18T20:22:55.785Z | Triggered by: pr_opened | Commit: 74c2df7

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs
Copy link
Contributor

probelabs bot commented Jan 18, 2026

Security Issues (4)

Severity Location Issue
🔴 Critical npm/src/agent/index.js:432-455
The extract_code tool lacks access control restrictions present in ProbeAgent. While the underlying extract() function validates the working directory path exists, it does NOT restrict which files can be accessed. The MCP server accepts any absolute path and any file paths without validating they are within allowed directories. This allows unauthorized access to arbitrary files on the system.
💡 SuggestionImplement allowedFolders validation similar to ProbeAgent (lines 1288-1320 in ProbeAgent.js). Before calling extract(), validate that all file paths in the files array are within the allowed directories. If no allowed_folders are specified, default to the current working directory.
🔧 Suggested Fix
        // Handle extract_code tool
        if (request.params.name === 'extract_code') {
          // Validate required parameters
          if (!args.path) {
            throw new Error("Path is required");
          }
          if (!args.files || !Array.isArray(args.files) || args.files.length === 0) {
            throw new Error("Files array is required and must not be empty");
          }
      // Security: Validate all file paths are within allowed directories
      const allowedDirs = args.allowed_folders &amp;&amp; args.allowed_folders.length &gt; 0 
        ? args.allowed_folders 
        : [args.path || process.cwd()];

      const { resolve, normalize, sep } = await import(&#39;path&#39;);
      for (const file of args.files) {
        // Resolve the file path
        const resolvedPath = file.startsWith(&#39;/&#39;) || file.match(/^[A-Za-z]:/)
          ? normalize(resolve(file))
          : normalize(resolve(args.path, file));

        // Check if path is within any allowed directory
        const isAllowed = allowedDirs.some(dir =&gt; {
          const normalizedDir = normalize(resolve(dir));
          return resolvedPath === normalizedDir || resolvedPath.startsWith(normalizedDir + sep);
        });

        if (!isAllowed) {
          throw new Error(`Path traversal attempt detected. Cannot access file outside allowed directories: ${file}`);
        }
      }

      // Build options with smart defaults
      const options = {
        files: args.files,
        path: args.path,
        format: &#39;xml&#39;,
        allowTests: true,  // Include test files by default
      };</code></pre>
🟠 Error npm/src/agent/index.js:405-415
The extract_code tool schema does not include allowed_folders parameter, preventing users from specifying access restrictions. This is inconsistent with the search_code tool which supports allowed_folders.
💡 SuggestionAdd allowed_folders parameter to the extract_code tool schema to allow users to specify which directories can be accessed.
🔧 Suggested Fix
        {
          name: 'extract_code',
          description: "Extract full code blocks from files using tree-sitter AST parsing. Use this to get complete code content based on file paths and symbols returned by search_code. Each file path can include optional line numbers or symbol names to extract specific code blocks.",
          inputSchema: {
            type: 'object',
            properties: {
              path: {
                type: 'string',
                description: 'Absolute path to the project root directory (used as working directory for relative file paths).',
              },
              files: {
                type: 'array',
                items: { type: 'string' },
                description: 'Array of file paths to extract from. Formats: "file.js" (entire file), "file.js:42" (code block at line 42), "file.js:10-20" (lines 10-20), "file.js#funcName" (specific symbol). Line numbers and symbols are part of the path string, not separate parameters. Paths can be absolute or relative to the project directory.',
              },
              allowed_folders: {
                type: 'array',
                items: { type: 'string' },
                description: 'Optional list of allowed directories for file operations. Defaults to the path directory if not specified.',
              }
            },
            required: ['path', 'files'],
          },
        },
🟡 Warning npm/src/agent/index.js:451-452
Debug logging exposes the full options object including file paths to stderr when DEBUG=1 is set. This could leak sensitive file paths or project structure in logs.
💡 SuggestionSanitize file paths in debug output or make debug logging more selective. Consider redacting sensitive paths or only logging the count of files.
🔧 Suggested Fix
          if (process.env.DEBUG === '1') {
            console.error('[DEBUG] Executing extract_code with options:', JSON.stringify({
              path: options.path,
              fileCount: options.files.length,
              format: options.format,
              allowTests: options.allowTests
            }, null, 2));
          }
🟡 Warning npm/src/extract.js:71-73
File paths are escaped using escapeString() but then passed to exec() which spawns a shell. While escapeString() uses proper POSIX shell escaping (single quotes with embedded quote handling), using exec() with shell escaping is inherently riskier than using execFile() with argument arrays. The grep.js module demonstrates the safer approach using execFile().
💡 SuggestionConsider refactoring extract() to use execFile() instead of exec() for better security. The search.js and grep.js modules already use execFile() which avoids shell interpretation entirely.
🔧 Suggested Fix
    // Otherwise use execFile for secure command execution (no shell)
    const args = ['extract', ...cliArgs];
    const { execFile } = await import('child_process');
    const execFileAsync = promisify(execFile);
try {
  const { stdout, stderr } = await execFileAsync(binaryPath, args, { cwd });

  if (stderr) {
    console.error(`stderr: ${stderr}`);
  }

  return processExtractOutput(stdout, options);</code></pre>

Architecture Issues (5)

Severity Location Issue
🟢 Info npm/src/agent/index.js:432-466
extract_code is a thin wrapper around the extract function with hardcoded defaults (format: 'xml', allowTests: true). This creates an inconsistent abstraction layer where some tools have complex logic (search_code with agent initialization) while others are simple pass-throughs.
💡 SuggestionConsider whether extract_code should expose more configuration options or if it should be generalized. The hardcoded defaults may not suit all use cases.
🟢 Info npm/src/agent/index.js:610
Error logging was updated to use dynamic tool name (request.params.name), which is good. However, this change was made necessary by the addition of extract_code and reveals that the previous hardcoded 'search_code' was a special case that should have been generalized earlier.
💡 SuggestionThis is a good fix that addresses a previous special case. No action needed, but note that this indicates the original design had unnecessary hardcoding.
🟡 Warning npm/src/agent/index.js:421-427
Tool routing uses if-else chain that will require modification for each new tool. This pattern doesn't scale well as more tools are added. Consider using a tool registry or handler map for better extensibility.
💡 SuggestionReplace the if-else chain with a tool handler registry: const toolHandlers = { search_code: handleSearchCode, extract_code: handleExtractCode }; const handler = toolHandlers[request.params.name]; if (handler) return handler(args);
🟡 Warning npm/src/agent/index.js:432-466
The extract_code tool handler duplicates the error handling pattern from search_code. Both tools wrap execution in try-catch with similar error logging. This violates DRY and creates maintenance burden.
💡 SuggestionExtract common error handling logic into a shared wrapper function: async function executeTool(toolName, handler, args) { try { return await handler(args); } catch (error) { console.error(`Error executing ${toolName}:`, error); return { content: [{ type: 'text', text: `Error: ${error.message}` }]}; } }
🟡 Warning npm/src/agent/index.js:432-466
Hardcoded format: 'xml' and allowTests: true are special case assumptions. The low-level MCP server exposes these as configurable options, but this wrapper forces specific values, reducing flexibility.
💡 SuggestionExpose format and allowTests as optional parameters in the inputSchema with sensible defaults, allowing users to override when needed: properties: { format: { type: 'string', enum: ['xml', 'json', 'markdown'], default: 'xml' }, allowTests: { type: 'boolean', default: true } }

✅ Performance Check Passed

No performance issues found – changes LGTM.

Quality Issues (5)

Severity Location Issue
🟠 Error npm/src/agent/index.js:432
No tests found for the new extract_code tool in the ProbeAgentMcpServer class. The PR adds a new tool without corresponding test coverage for validation, error handling, or integration scenarios.
💡 SuggestionAdd test cases covering: (1) Valid extract_code calls with various file path formats, (2) Missing required parameter validation, (3) Empty files array validation, (4) Error handling when extract function fails, (5) Integration with the MCP server framework. Tests should verify the tool correctly validates inputs and calls the extract function with proper options.
🟢 Info npm/src/agent/index.js:399
The extract_code tool description mentions 'tree-sitter AST parsing' but doesn't explain the difference between this and the low-level MCP server's extract tool, or when to use one versus the other. Users may be confused about which tool to use.
💡 SuggestionClarify in the description that this is the high-level agent MCP server (probe agent --mcp) vs the low-level code search MCP server. Explain that extract_code is for use with the agent server when you want the AI to have access to code extraction capabilities, while the low-level extract tool is for direct programmatic access.
🟡 Warning npm/src/agent/index.js:432
The extract_code handler lacks error handling for the extract function call. If extract() throws an error, it will propagate to the MCP server layer without proper error formatting or context.
💡 SuggestionWrap the extract() call in a try-catch block and return a properly formatted error response consistent with MCP error handling patterns. Consider adding error context like the file paths being extracted and the working directory.
🔧 Suggested Fix
try {
  const result = await extract(options);
  return {
    content: [{ type: 'text', text: result }],
  };
} catch (error) {
  console.error(`Error executing extract_code:`, error);
  return {
    content: [{
      type: 'text',
      text: `Error extracting code: ${error.message}`
    }],
    isError: true
  };
}
🟡 Warning npm/src/agent/index.js:432
Hardcoded default values (format: 'xml', allowTests: true) are embedded in the handler code. These should be configurable or documented constants to maintain consistency with the low-level MCP server implementation.
💡 SuggestionExtract these defaults to named constants at the top of the class or file, or reference the same constants used by the low-level MCP server to ensure consistency. Consider making these configurable via tool parameters if users might want different behavior.
🔧 Suggested Fix
// At class level or top of file:
const EXTRACT_DEFAULTS = {
  FORMAT: 'xml',
  ALLOW_TESTS: true
};

// In handler:
const options = {
files: args.files,
path: args.path,
format: EXTRACT_DEFAULTS.FORMAT,
allowTests: EXTRACT_DEFAULTS.ALLOW_TESTS
};

🟡 Warning npm/src/agent/index.js:432
Input validation only checks for presence of 'path' and non-empty 'files' array, but doesn't validate that file paths are strings or that the path is a valid directory path. Invalid inputs could cause cryptic errors from the extract function.
💡 SuggestionAdd more robust validation: (1) Verify args.files contains only strings, (2) Check that args.path is a non-empty string, (3) Optionally validate that the path exists or is a valid absolute path format. This provides clearer error messages to users.
🔧 Suggested Fix
if (!args.path || typeof args.path !== 'string' || args.path.trim().length === 0) {
  throw new Error("Path is required and must be a non-empty string");
}
if (!args.files || !Array.isArray(args.files) || args.files.length === 0) {
  throw new Error("Files array is required and must not be empty");
}
if (!args.files.every(f => typeof f === 'string' && f.trim().length > 0)) {
  throw new Error("All file paths must be non-empty strings");
}

Powered by Visor from Probelabs

Last updated: 2026-01-18T20:22:58.454Z | Triggered by: pr_opened | Commit: 74c2df7

💡 TIP: You can chat with Visor using /visor ask <your question>

@buger buger merged commit 0be9add into main Jan 19, 2026
16 of 18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants