Add disable_shell_tool option to CodexCoder for MCP-only evaluation #39

justaddcoffee · 2025-12-15T22:16:40Z

Summary

Adds a configurable disable_shell_tool option to CodexCoder that prevents filesystem access via bash commands, enabling fair MCP-only evaluations.

Note: This PR is based on the fix-gemini-cli-invocation branch which contains all previous Codex CLI fixes (MCP support, proper invocation, 401 fix, etc.). This should be merged after that branch is merged, or both can be merged together.

Changes

Added disable_shell_tool attribute to CodexCoder class
When enabled, Codex runs with --disable shell_tool flag instead of --dangerously-bypass-approvals-and-sandbox
Updated create_coder() and run_single_eval() to pass coder-specific options from YAML config

Usage

coders:
  codex:
    disable_shell_tool: true  # MCP-only mode

Motivation

During MCP literature evaluations, we discovered Codex was using shell commands (rg, sed, find) to access the filesystem and read expected answers from test case files. This made the evaluation results invalid.

With disable_shell_tool: true, Codex can only use MCP tools to retrieve information, ensuring a fair comparison with other agents.

🤖 Generated with Claude Code

When disable_shell_tool is set in the coder config, Codex runs with --disable shell_tool flag, preventing filesystem access via bash commands. This enables fair MCP-only evaluations where Codex can only use MCP tools. Usage in eval config: coders: codex: disable_shell_tool: true 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2025-12-15T22:16:51Z

Claude finished @justaddcoffee's task —— View job

Code Review Complete ✅

Reviewed PR #39: Add disable_shell_tool option to CodexCoder for MCP-only evaluation

Todo List:

Read and analyze modified files
Review code quality and best practices
Check for potential bugs or issues
Assess performance considerations
Evaluate security concerns
Review test coverage needs
Provide comprehensive feedback

Review Summary

This PR adds a configurable disable_shell_tool option to CodexCoder for MCP-only evaluation. The implementation is well-designed and addresses a legitimate need for fair evaluation comparisons by preventing filesystem access through shell commands.

Code Quality & Best Practices ✅

Strengths:

Clean, focused implementation with minimal surface area
Follows existing codebase patterns and conventions
Good documentation in class docstring (src/metacoder/coders/codex.py:45-50)
Proper logging with informative messages (src/metacoder/coders/codex.py:163)
Configuration through YAML is intuitive and consistent

Areas for Improvement:

Consider adding type hints for the disable_shell_tool attribute to be explicit:

disable_shell_tool: bool = Field(default=False, description="Disable shell/bash access, MCP-only mode")

Potential Bugs & Issues ⚠️

Minor Issues:

Flag validation: The code assumes the --disable shell_tool flag exists in the codex CLI but doesn't validate this. Consider adding a comment about minimum required codex CLI version.
Error handling: If the --disable shell_tool flag isn't supported by the installed codex version, the error won't be very informative. The current error handling in src/metacoder/coders/codex.py:190 would catch this but might be unclear.

Security Concerns ✅

Positive Security Impact:

The feature actually improves security by providing a way to restrict filesystem access
Using --disable shell_tool instead of --dangerously-bypass-approvals-and-sandbox is more secure
No new security vulnerabilities introduced

Performance Considerations ✅

Performance Impact:

Positive impact: Disabling shell tools should improve performance by preventing filesystem operations
No performance regressions introduced
Command building logic is efficient (src/metacoder/coders/codex.py:156-167)

Implementation Quality ✅

Well-structured changes:

CodexCoder class (src/metacoder/coders/codex.py:55):
- Clean attribute addition
- Proper documentation
- Follows existing patterns
Command building logic (src/metacoder/coders/codex.py:156-167):
- Clear conditional logic
- Good separation between MCP-only and default modes
- Informative logging
Configuration passing (src/metacoder/evals/runner.py:187-194):
- Robust option handling with hasattr check
- Proper warning for unknown options
- Good logging

Test Coverage 🔶

Missing test coverage (recommendation):

Unit tests for the disable_shell_tool flag functionality
Integration tests to verify MCP-only mode works as expected
Test cases for both enabled/disabled states

Suggested test additions:

def test_codex_disable_shell_tool_flag():
    """Test that disable_shell_tool affects command building."""
    coder = CodexCoder(workdir="test", disable_shell_tool=True)
    # Mock the command execution and verify --disable shell_tool is used

def test_codex_coder_options_integration():
    """Test coder options are passed correctly from runner."""
    # Test create_coder with coder_options containing disable_shell_tool

Architecture Alignment ✅

The changes align well with the existing architecture:

Follows the established pattern of coder-specific configuration
Integrates properly with the evaluation framework
Maintains backward compatibility

Minor Suggestions

Documentation: Consider adding an example in CLAUDE.md showing the disable_shell_tool option
Validation: Add validation to ensure disable_shell_tool is boolean in CodexCoder
Logging: Consider DEBUG level for the "Set coder option" message in runner.py:192

Overall Assessment

✅ This is a well-implemented feature that:

Addresses a legitimate evaluation fairness concern
Implements the solution cleanly with minimal code changes
Follows established patterns and conventions
Improves security by providing a restricted mode
Maintains backward compatibility

Recommendation: Approve with minor suggestions for test coverage and documentation improvements.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add disable_shell_tool option to CodexCoder for MCP-only evaluation #39

Add disable_shell_tool option to CodexCoder for MCP-only evaluation #39

Uh oh!

justaddcoffee commented Dec 15, 2025

Uh oh!

claude bot commented Dec 15, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add disable_shell_tool option to CodexCoder for MCP-only evaluation #39

Are you sure you want to change the base?

Add disable_shell_tool option to CodexCoder for MCP-only evaluation #39

Uh oh!

Conversation

justaddcoffee commented Dec 15, 2025

Summary

Changes

Usage

Motivation

Uh oh!

claude bot commented Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Complete ✅

Todo List:

Review Summary

Code Quality & Best Practices ✅

Potential Bugs & Issues ⚠️

Security Concerns ✅

Performance Considerations ✅

Implementation Quality ✅

Test Coverage 🔶

Architecture Alignment ✅

Minor Suggestions

Overall Assessment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Dec 15, 2025 •

edited

Loading