Skip to content

Add built-in run_python default tool with sandboxed local runner#68

Open
weilixu wants to merge 1 commit intodevelopfrom
codex/add-run_python-built-in-tool
Open

Add built-in run_python default tool with sandboxed local runner#68
weilixu wants to merge 1 commit intodevelopfrom
codex/add-run_python-built-in-tool

Conversation

@weilixu
Copy link
Contributor

@weilixu weilixu commented Mar 11, 2026

Motivation

  • Provide a first-class built-in run_python default tool for calculations, parsing, table/chart creation, file transformations, and simulation preparation while preserving the existing default-tool registry and interfaces.
  • Implement a practical sandbox to prevent shell access and obviously dangerous imports/patterns while enabling common data-processing workflows.

Description

  • Added a new tool package automa_ai/tools/run_python/ containing config.py, policy.py, runner.py, tool.py, and __init__.py implementing a RunPythonTool compatible with BaseDefaultTool and a build_run_python_tool(...) builder for the registry.
  • Implemented a lightweight static policy (policy.py) that rejects blocked imports and dangerous call patterns and a LocalSubprocessRunner (runner.py) that runs code in a temporary workspace using a subprocess, enforces timeouts, truncates outputs, and collects artifacts.
  • Registered the new tool alongside web_search in automa_ai/tools/__init__.py without changing ToolSpec, ToolsConfig, BaseDefaultTool, or build_langchain_tools.
  • Added examples/run_python_demo.py modeled on the existing default-tool demo, added tests in tests/test_run_python_tool.py, and updated docs/tools.md and README.md to document config, contract, and sandbox limitations.

Testing

  • Compiled the new module tree with python -m compileall ... and it succeeded (modules compile cleanly).
  • Ran pytest tests/test_run_python_tool.py tests/test_tools_config.py tests/test_web_search_tool.py -q but collection failed due to ModuleNotFoundError: No module named 'pydantic' in the environment, causing the test run to fail.
  • Attempted to install pydantic via python -m pip install pydantic but the environment could not download packages due to proxy/network restrictions, so unit tests could not be executed to completion here.

Notes: the patch adds unit tests that cover registry/build integration, arithmetic execution, JSON/CSV parsing, artifact generation, and sandbox rejection; these should pass in a normal environment with dependencies installed and are expected to be green in CI where pydantic and test dependencies are available. There are no breaking changes to existing tool interfaces.


Codex Task

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e431b08652

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +39 to +45
if isinstance(node.func, ast.Name) and node.func.id in _BLOCKED_CALLS:
raise PolicyViolationError(f"Blocked function call: {node.func.id}")
if isinstance(node.func, ast.Attribute):
if (
isinstance(node.func.value, ast.Name)
and node.func.value.id in _BLOCKED_BASES
and node.func.attr in _BLOCKED_ATTRS

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block indirect dangerous call patterns in sandbox policy

The AST checks only reject __import__ when it appears as a direct ast.Name and only reject dangerous attributes when the receiver is literally named os/subprocess, so payloads like getattr(__builtins__, "__import__")("os").system("...") bypass policy and execute shell commands. In environments that rely on this tool for safe code execution, this defeats the advertised sandbox constraints and allows arbitrary host command execution.

Useful? React with 👍 / 👎.

Comment on lines +83 to +84
if not parsed.enabled:
raise ValueError("run_python tool is disabled by configuration.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Skip disabled run_python specs instead of raising on build

RunPythonToolConfig exposes an enabled flag, but the builder raises ValueError when it is false, which causes build_langchain_tools(...) to fail the entire tool setup rather than simply omitting run_python. Any deployment that toggles this tool off via config will crash agent initialization even if all other tools are valid.

Useful? React with 👍 / 👎.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new built-in run_python default tool to the existing tools registry, intended to execute user-provided Python code with basic static policy checks and a local subprocess runner, plus documentation/examples/tests.

Changes:

  • Introduces automa_ai/tools/run_python/ (config, policy, runner, tool + builder) implementing RunPythonTool.
  • Registers run_python in the built-in default-tools registry alongside web_search.
  • Adds demo script, docs/README updates, and unit tests covering execution, artifacts, and policy rejection.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
automa_ai/tools/run_python/config.py Adds pydantic config model for the tool (timeouts, imports policy, artifact limits, etc.).
automa_ai/tools/run_python/policy.py Implements AST-based static checks for blocked imports/call patterns.
automa_ai/tools/run_python/runner.py Runs code via asyncio.create_subprocess_exec in a temp workspace; truncates outputs and collects artifacts.
automa_ai/tools/run_python/tool.py Implements RunPythonTool and build_run_python_tool(...) for registry integration.
automa_ai/tools/run_python/__init__.py Exposes tool + builder for imports/registration.
automa_ai/tools/__init__.py Registers run_python into DEFAULT_TOOL_REGISTRY at import time.
tests/test_run_python_tool.py Adds tests for registry build, execution, artifacts, and policy rejection.
examples/run_python_demo.py Adds runnable demo similar to the existing web_search demo.
docs/tools.md Documents tool inputs/outputs/config and sandbox limitations.
README.md Adds run_python to the sample tools configuration and references the new demo.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +17 to +23
try:
DEFAULT_TOOL_REGISTRY.register("run_python", build_run_python_tool)
except ValueError as exc:
logging.getLogger(__name__).debug(
"Ignoring ValueError while registering 'run_python' tool: %s",
exc,
)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_python is only registered via this import-time side effect. If the application relies on load_tool_plugins() + entry points (see pyproject.toml currently listing only web_search), run_python won’t be discoverable unless automa_ai.tools is imported somewhere. Consider adding a project.entry-points."automa_ai.tools" entry for run_python (or ensure load_tool_plugins() also imports built-in tools) to make loading consistent.

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +55
env = {
"PYTHONNOUSERSITE": "1",
"MPLBACKEND": "Agg",
"PATH": os.environ.get("PATH", ""),
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env replaces the entire environment for the subprocess and only preserves PATH. On Windows this can break Python startup or subprocess behavior (commonly needs variables like SYSTEMROOT, TEMP/TMP, etc.), and on all platforms it may cause surprising locale/encoding behavior. Consider starting from a minimal-but-platform-safe baseline (e.g., copy required OS vars) while still stripping secrets, or document the supported platforms/requirements.

Suggested change
env = {
"PYTHONNOUSERSITE": "1",
"MPLBACKEND": "Agg",
"PATH": os.environ.get("PATH", ""),
}
env = os.environ.copy()
env.update(
{
"PYTHONNOUSERSITE": "1",
"MPLBACKEND": "Agg",
}
)

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +29
blocked_imports = set(config.blocked_imports)
if not config.allow_network:
blocked_imports.update({"http", "httpx", "aiohttp", "ftplib", "ssl", "websocket"})
allowed_imports = set(config.allowed_imports)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allow_network currently only expands the import blocklist; it does not actually prevent network access at runtime (e.g., code can still open sockets via already-imported stdlib modules or indirect APIs). This is fine as a best-effort policy, but it’s important to rename/reframe the flag (e.g., block_network_imports) or clearly document that it is not enforcement.

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +22
from automa_ai.tools.registry import build_langchain_tools
from automa_ai.tools.run_python.config import RunPythonToolConfig
from automa_ai.tools.run_python.tool import RunPythonTool


def test_registry_build_includes_run_python_and_web_search() -> None:
tools = build_langchain_tools(
[
ToolSpec(type="web_search", config={"provider": "opensource"}),
ToolSpec(type="run_python", config={}),
]
)
names = {tool.name for tool in tools}
assert "web_search" in names
assert "run_python" in names
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_langchain_tools is imported from automa_ai.tools.registry, which does not import automa_ai.tools and therefore won’t execute the side-effect registrations in automa_ai/tools/__init__.py. As a result, DEFAULT_TOOL_REGISTRY will be empty here and this test should fail with Unknown tool type 'web_search'/'run_python'. Import build_langchain_tools from automa_ai.tools (or explicitly import automa_ai.tools before building) so the registry is populated.

Copilot uses AI. Check for mistakes.
Comment on lines +36 to +37
"file transformations, and simulation preparation logic. No shell access."
)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool description claims “No shell access.” With the current lightweight AST policy + local subprocess execution, this is not a reliable security boundary (e.g., importing via importlib can bypass blocked-import checks, and indirect access to blocked builtins can bypass the call-name checks). Please reword the description to avoid promising hard sandboxing, and/or document it as best-effort / not for untrusted code.

Suggested change
"file transformations, and simulation preparation logic. No shell access."
)
"file transformations, and simulation preparation logic. Runs in a local "
"subprocess with policy-based restrictions; not a hardened sandbox and not "
"intended for untrusted code."

Copilot uses AI. Check for mistakes.
Comment on lines +114 to +125
if expected_outputs:
for rel in expected_outputs:
path = _resolve_temp_file(root, rel)
if path.exists() and path.is_file():
candidates.append(path)
else:
warnings.append(f"Expected output was not found: {rel}")
else:
for path in root.rglob("*"):
if path.is_file() and path.name != "__run_python__.py":
candidates.append(path)

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When expected_outputs is empty, _collect_artifacts returns all files in the temp workspace (including any copied input_files). That makes the artifact set depend on inputs and can be surprising if artifacts are meant to represent outputs only. Consider excluding the copied inputs from candidates or collecting only newly-created/modified files when expected_outputs is omitted.

Copilot uses AI. Check for mistakes.
Comment on lines +126 to +129
for path in candidates:
if len(results) >= max_artifacts:
warnings.append("Artifact limit reached; some files were not returned.")
break
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_artifacts allows 0 (config ge=0), but the current loop will immediately emit “Artifact limit reached…” for the first candidate even though returning 0 artifacts was explicitly configured. Consider treating max_artifacts == 0 as “artifact collection disabled” (return [] without warning) or adjust the warning condition.

Copilot uses AI. Check for mistakes.
Comment on lines +71 to +75
- Runs Python only with a local subprocess runner.
- Does not expose shell or bash execution.
- Rejects blocked imports and known dangerous call patterns before execution.
- Enforces timeout and output truncation.
- Executes code in a temporary working directory and only returns files from that directory.
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documented sandbox guarantees for run_python (e.g., "Does not expose shell or bash execution" and blocked imports/network via blocked_imports/allow_network) overstate the actual isolation provided by the implementation. Because user code runs in a normal Python subprocess with only lightweight AST checks, it can still dynamically import blocked modules and invoke shell or network operations (for example, via importlib.import_module("os").system("id")), leading to unexpected host access for callers who rely on these guarantees. Please either strengthen the underlying sandbox to enforce these constraints or explicitly document that this is a best-effort policy-only sandbox that does not prevent shell, network, or broader filesystem access, so integrators do not treat it as a safe isolation boundary.

Suggested change
- Runs Python only with a local subprocess runner.
- Does not expose shell or bash execution.
- Rejects blocked imports and known dangerous call patterns before execution.
- Enforces timeout and output truncation.
- Executes code in a temporary working directory and only returns files from that directory.
- Uses a standard local Python subprocess runner (no OS-level sandbox or container).
- Intended as a best-effort, policy-based sandbox only — **not** a security or isolation boundary.
- Performs lightweight AST checks to detect blocked imports and known dangerous call patterns before execution, but these checks can be bypassed (e.g., via dynamic imports or reflection).
- May still allow shell, network, and broader filesystem access depending on the underlying Python environment, even when `blocked_imports` or `allow_network: false` are configured.
- Enforces timeout and output truncation.
- Executes code in a temporary working directory and only returns files from that directory by default, but this does not prevent code from reading or writing outside that directory if the environment permits it.

Copilot uses AI. Check for mistakes.
config:
runner: local_subprocess
timeout_s: 20
workspace_root: .
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example configuration suggests that setting allow_network: false on the run_python tool will reliably disable network access for executed code, but the implementation only applies lightweight AST-based checks and does not provide a true network sandbox. User code can still reach networking primitives via dynamic imports or modules not covered by blocked_imports, so callers who rely on this flag for isolation may inadvertently expose internal services or data. Please either enforce network isolation at the OS/container level or clearly document that allow_network is best-effort only and not a security boundary.

Suggested change
workspace_root: .
workspace_root: .
# Best-effort network restriction only; NOT a security boundary.
# User code may still reach networking via dynamic imports or modules
# not covered by blocked_imports. For strong isolation, enforce
# network sandboxing at the OS/container level.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants