Add disable_shell_tool option to CodexCoder for MCP-only evaluation #39
Summary
Adds a configurable `disable_shell_tool` option to CodexCoder that prevents filesystem access via bash commands, enabling fair MCP-only evaluations.

Note: This PR is based on the `fix-gemini-cli-invocation` branch, which contains all previous Codex CLI fixes (MCP support, proper invocation, 401 fix, etc.). It should be merged after that branch is merged, or both can be merged together.

Changes
- Add `disable_shell_tool` attribute to CodexCoder class
- Use the `--disable shell_tool` flag instead of `--dangerously-bypass-approvals-and-sandbox` when the option is set
- Update `create_coder()` and `run_single_eval()` to pass coder-specific options from YAML config

Usage
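A minimal sketch of how the option might select the CLI flags and how coder-specific options could be forwarded from a parsed YAML config. Only the `disable_shell_tool` name and the two flags come from this PR; `CodexCoderSketch`, `sandbox_flags()`, `create_coder_sketch()`, and the shape of the config dict are hypothetical stand-ins, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CodexCoderSketch:
    """Hypothetical stand-in for CodexCoder (illustration only)."""

    disable_shell_tool: bool = False

    def sandbox_flags(self) -> List[str]:
        """Return the flag(s) described in this PR for the current setting."""
        if self.disable_shell_tool:
            # MCP-only mode: the shell tool is switched off.
            return ["--disable", "shell_tool"]
        # Previous behaviour: full shell access.
        return ["--dangerously-bypass-approvals-and-sandbox"]


def create_coder_sketch(coder_options: Dict[str, Any]) -> CodexCoderSketch:
    """Hypothetical analogue of create_coder(): forward known options only.

    `coder_options` is assumed to be the coder-specific mapping already
    parsed from the YAML config, e.g. {"disable_shell_tool": True}.
    """
    known = {"disable_shell_tool"}
    return CodexCoderSketch(**{k: v for k, v in coder_options.items() if k in known})


if __name__ == "__main__":
    coder = create_coder_sketch({"disable_shell_tool": True})
    print(coder.sandbox_flags())
```

Filtering to known keys is one possible design choice here: it keeps unrelated config entries from reaching the coder constructor, at the cost of silently ignoring misspelled options.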
Motivation
During MCP literature evaluations, we discovered that Codex was using shell commands (`rg`, `sed`, `find`) to access the filesystem and read expected answers from test case files. This made the evaluation results invalid.

With `disable_shell_tool: true`, Codex can only use MCP tools to retrieve information, ensuring a fair comparison with other agents.

🤖 Generated with Claude Code