Add built-in run_python default tool with sandboxed local runner #68
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e431b08652
| if isinstance(node.func, ast.Name) and node.func.id in _BLOCKED_CALLS: | ||
| raise PolicyViolationError(f"Blocked function call: {node.func.id}") | ||
| if isinstance(node.func, ast.Attribute): | ||
| if ( | ||
| isinstance(node.func.value, ast.Name) | ||
| and node.func.value.id in _BLOCKED_BASES | ||
| and node.func.attr in _BLOCKED_ATTRS |
Block indirect dangerous call patterns in sandbox policy
The AST checks only reject __import__ when it appears as a direct ast.Name and only reject dangerous attributes when the receiver is literally named os/subprocess, so payloads like getattr(__builtins__, "__import__")("os").system("...") bypass policy and execute shell commands. In environments that rely on this tool for safe code execution, this defeats the advertised sandbox constraints and allows arbitrary host command execution.
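The bypass described above can be reproduced with a simplified stand-in for the policy check. This is a sketch, not the PR's `policy.py`: `BLOCKED_CALLS`, `REFLECTION_CALLS`, `naive_check`, and `stricter_check` are hypothetical names chosen for illustration. It shows that a check keyed on `ast.Name` misses `getattr`-based access, and one possible way to tighten it by also rejecting reflection helpers and dunder names passed as strings.

```python
import ast

# Hypothetical stand-in for the PR's blocked-call set.
BLOCKED_CALLS = {"__import__", "eval", "exec"}
# Assumed list of reflection helpers that enable indirect access.
REFLECTION_CALLS = {"getattr", "setattr", "vars", "globals", "locals"}


def naive_check(source: str) -> list[str]:
    """Flag only direct calls by name, mirroring the reviewed check."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BLOCKED_CALLS
        ):
            problems.append(f"Blocked function call: {node.func.id}")
    return problems


def stricter_check(source: str) -> list[str]:
    """Also flag reflection helpers and dunder names passed as strings."""
    problems = naive_check(source)
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in REFLECTION_CALLS
        ):
            problems.append(f"Blocked reflection call: {node.func.id}")
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, str)
            and node.value.startswith("__")
            and node.value.endswith("__")
        ):
            problems.append(f"Blocked dunder string: {node.value}")
    return problems


direct = '__import__("os")'
indirect = 'getattr(__builtins__, "__import__")("os").system("id")'
print(naive_check(direct))      # the direct call is flagged
print(naive_check(indirect))    # the indirect call is not flagged
print(stricter_check(indirect))
```

Even the stricter variant remains a static, best-effort filter: string concatenation or other reflection paths can still evade it, so it does not replace OS-level isolation.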
| if not parsed.enabled: | ||
| raise ValueError("run_python tool is disabled by configuration.") |
Skip disabled run_python specs instead of raising on build
RunPythonToolConfig exposes an enabled flag, but the builder raises ValueError when it is false, which causes build_langchain_tools(...) to fail the entire tool setup rather than simply omitting run_python. Any deployment that toggles this tool off via config will crash agent initialization even if all other tools are valid.
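One way to get the "omit instead of crash" behaviour is to let a builder signal "disabled" and have the build loop skip it. The sketch below uses simplified stand-ins, not the project's real `ToolSpec`, registry, or `build_langchain_tools`; the names `BUILDERS`, `build_run_python`, and `build_tools`, and the convention of returning `None` for a disabled tool, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToolSpec:
    """Simplified stand-in for the project's ToolSpec."""
    type: str
    config: dict[str, Any] = field(default_factory=dict)


def build_web_search(config: dict[str, Any]) -> Optional[str]:
    return "web_search"


def build_run_python(config: dict[str, Any]) -> Optional[str]:
    # Return None rather than raising when the tool is switched off.
    if not config.get("enabled", True):
        return None
    return "run_python"


# Hypothetical registry mapping tool types to builders.
BUILDERS: dict[str, Callable[[dict[str, Any]], Optional[str]]] = {
    "web_search": build_web_search,
    "run_python": build_run_python,
}


def build_tools(specs: list[ToolSpec]) -> list[str]:
    tools = []
    for spec in specs:
        builder = BUILDERS.get(spec.type)
        if builder is None:
            # Unknown types are still a configuration error.
            raise ValueError(f"Unknown tool type '{spec.type}'")
        tool = builder(spec.config)
        if tool is None:
            continue  # disabled by configuration: omit, do not fail
        tools.append(tool)
    return tools


specs = [ToolSpec("web_search"), ToolSpec("run_python", {"enabled": False})]
print(build_tools(specs))
```

Unknown tool types still raise, so a typo in the configuration is not silently swallowed along with intentionally disabled tools.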
Pull request overview
Adds a new built-in run_python default tool to the existing tools registry, intended to execute user-provided Python code with basic static policy checks and a local subprocess runner, plus documentation/examples/tests.
Changes:
- Introduces `automa_ai/tools/run_python/` (config, policy, runner, tool + builder) implementing `RunPythonTool`.
- Registers `run_python` in the built-in default-tools registry alongside `web_search`.
- Adds demo script, docs/README updates, and unit tests covering execution, artifacts, and policy rejection.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| `automa_ai/tools/run_python/config.py` | Adds pydantic config model for the tool (timeouts, imports policy, artifact limits, etc.). |
| `automa_ai/tools/run_python/policy.py` | Implements AST-based static checks for blocked imports/call patterns. |
| `automa_ai/tools/run_python/runner.py` | Runs code via `asyncio.create_subprocess_exec` in a temp workspace; truncates outputs and collects artifacts. |
| `automa_ai/tools/run_python/tool.py` | Implements `RunPythonTool` and `build_run_python_tool(...)` for registry integration. |
| `automa_ai/tools/run_python/__init__.py` | Exposes tool + builder for imports/registration. |
| `automa_ai/tools/__init__.py` | Registers `run_python` into `DEFAULT_TOOL_REGISTRY` at import time. |
| `tests/test_run_python_tool.py` | Adds tests for registry build, execution, artifacts, and policy rejection. |
| `examples/run_python_demo.py` | Adds runnable demo similar to the existing `web_search` demo. |
| `docs/tools.md` | Documents tool inputs/outputs/config and sandbox limitations. |
| `README.md` | Adds `run_python` to the sample tools configuration and references the new demo. |
| try: | ||
| DEFAULT_TOOL_REGISTRY.register("run_python", build_run_python_tool) | ||
| except ValueError as exc: | ||
| logging.getLogger(__name__).debug( | ||
| "Ignoring ValueError while registering 'run_python' tool: %s", | ||
| exc, | ||
| ) |
run_python is only registered via this import-time side effect. If the application relies on load_tool_plugins() + entry points (see pyproject.toml currently listing only web_search), run_python won’t be discoverable unless automa_ai.tools is imported somewhere. Consider adding a project.entry-points."automa_ai.tools" entry for run_python (or ensure load_tool_plugins() also imports built-in tools) to make loading consistent.
| env = { | ||
| "PYTHONNOUSERSITE": "1", | ||
| "MPLBACKEND": "Agg", | ||
| "PATH": os.environ.get("PATH", ""), | ||
| } |
env replaces the entire environment for the subprocess and only preserves PATH. On Windows this can break Python startup or subprocess behavior (commonly needs variables like SYSTEMROOT, TEMP/TMP, etc.), and on all platforms it may cause surprising locale/encoding behavior. Consider starting from a minimal-but-platform-safe baseline (e.g., copy required OS vars) while still stripping secrets, or document the supported platforms/requirements.
| env = { | |
| "PYTHONNOUSERSITE": "1", | |
| "MPLBACKEND": "Agg", | |
| "PATH": os.environ.get("PATH", ""), | |
| } | |
| env = os.environ.copy() | |
| env.update( | |
| { | |
| "PYTHONNOUSERSITE": "1", | |
| "MPLBACKEND": "Agg", | |
| } | |
| ) |
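Copying the whole parent environment restores the platform variables but also forwards any secrets it contains. A middle ground, in line with the "minimal-but-platform-safe baseline" mentioned in the comment, is an explicit allowlist. The sketch below is one possible shape, not the PR's runner: `build_child_env` and both passthrough tuples are hypothetical, and the exact variable lists are assumptions that would need checking on each supported platform.

```python
import os
import sys
from typing import Mapping, Optional

# Assumed allowlists; adjust to the platforms actually supported.
_PASSTHROUGH = ("PATH", "HOME", "LANG", "LC_ALL", "TMPDIR")
_WINDOWS_PASSTHROUGH = (
    "SYSTEMROOT", "TEMP", "TMP", "USERPROFILE", "PATHEXT", "COMSPEC",
)


def build_child_env(
    parent: Optional[Mapping[str, str]] = None,
    platform: str = sys.platform,
) -> dict[str, str]:
    """Forward only allowlisted variables, then apply the tool's overrides."""
    parent = os.environ if parent is None else parent
    names = _PASSTHROUGH
    if platform.startswith("win"):
        names = names + _WINDOWS_PASSTHROUGH
    # Anything not on the allowlist (API keys, tokens, ...) is dropped.
    env = {name: parent[name] for name in names if name in parent}
    env.update({"PYTHONNOUSERSITE": "1", "MPLBACKEND": "Agg"})
    return env


parent = {
    "PATH": "/usr/bin",
    "AWS_SECRET_ACCESS_KEY": "not-forwarded",
    "SYSTEMROOT": "C:\\Windows",
}
print(build_child_env(parent, platform="linux"))
print(build_child_env(parent, platform="win32"))
```

The resulting dictionary can be passed as the `env` argument of `asyncio.create_subprocess_exec`, which the runner already uses according to the file summary.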
| blocked_imports = set(config.blocked_imports) | ||
| if not config.allow_network: | ||
| blocked_imports.update({"http", "httpx", "aiohttp", "ftplib", "ssl", "websocket"}) | ||
| allowed_imports = set(config.allowed_imports) |
allow_network currently only expands the import blocklist; it does not actually prevent network access at runtime (e.g., code can still open sockets via already-imported stdlib modules or indirect APIs). This is fine as a best-effort policy, but it’s important to rename/reframe the flag (e.g., block_network_imports) or clearly document that it is not enforcement.
| from automa_ai.tools.registry import build_langchain_tools | ||
| from automa_ai.tools.run_python.config import RunPythonToolConfig | ||
| from automa_ai.tools.run_python.tool import RunPythonTool | ||
| def test_registry_build_includes_run_python_and_web_search() -> None: | ||
| tools = build_langchain_tools( | ||
| [ | ||
| ToolSpec(type="web_search", config={"provider": "opensource"}), | ||
| ToolSpec(type="run_python", config={}), | ||
| ] | ||
| ) | ||
| names = {tool.name for tool in tools} | ||
| assert "web_search" in names | ||
| assert "run_python" in names |
build_langchain_tools is imported from automa_ai.tools.registry, which does not import automa_ai.tools and therefore won’t execute the side-effect registrations in automa_ai/tools/__init__.py. As a result, DEFAULT_TOOL_REGISTRY will be empty here and this test should fail with Unknown tool type 'web_search'/'run_python'. Import build_langchain_tools from automa_ai.tools (or explicitly import automa_ai.tools before building) so the registry is populated.
| "file transformations, and simulation preparation logic. No shell access." | ||
| ) |
The tool description claims “No shell access.” With the current lightweight AST policy + local subprocess execution, this is not a reliable security boundary (e.g., importing via importlib can bypass blocked-import checks, and indirect access to blocked builtins can bypass the call-name checks). Please reword the description to avoid promising hard sandboxing, and/or document it as best-effort / not for untrusted code.
| "file transformations, and simulation preparation logic. No shell access." | |
| ) | |
| "file transformations, and simulation preparation logic. Runs in a local " | |
| "subprocess with policy-based restrictions; not a hardened sandbox and not " | |
| "intended for untrusted code." |
| if expected_outputs: | ||
| for rel in expected_outputs: | ||
| path = _resolve_temp_file(root, rel) | ||
| if path.exists() and path.is_file(): | ||
| candidates.append(path) | ||
| else: | ||
| warnings.append(f"Expected output was not found: {rel}") | ||
| else: | ||
| for path in root.rglob("*"): | ||
| if path.is_file() and path.name != "__run_python__.py": | ||
| candidates.append(path) | ||
|
|
When expected_outputs is empty, _collect_artifacts returns all files in the temp workspace (including any copied input_files). That makes the artifact set depend on inputs and can be surprising if artifacts are meant to represent outputs only. Consider excluding the copied inputs from candidates or collecting only newly-created/modified files when expected_outputs is omitted.
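The "newly-created/modified files" option can be sketched by snapshotting the workspace after inputs are copied and comparing after the run. This is an illustration only, not the PR's `_collect_artifacts`: `snapshot`, `changed_files`, and `demo` are hypothetical helpers, and the change signature (modification time plus size) is an assumed heuristic.

```python
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[Path, tuple[int, int]]:
    """Record (mtime_ns, size) for every file currently under root."""
    result = {}
    for path in root.rglob("*"):
        if path.is_file():
            stat = path.stat()
            result[path] = (stat.st_mtime_ns, stat.st_size)
    return result


def changed_files(root: Path, before: dict[Path, tuple[int, int]]) -> list[Path]:
    """Return files that are new or whose signature differs from `before`."""
    after = snapshot(root)
    return sorted(p for p, sig in after.items() if before.get(p) != sig)


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Simulate a copied input file, then take the baseline snapshot.
        (root / "input.csv").write_text("a,b\n1,2\n")
        before = snapshot(root)
        # Simulate user code producing an output.
        (root / "result.json").write_text("{}")
        return [p.name for p in changed_files(root, before)]


print(demo())
```

An input rewritten in place with the same size within the filesystem's timestamp granularity would be missed by this heuristic; tracking the set of copied input paths and excluding them explicitly is the simpler alternative if inputs are never meant to be returned.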
| for path in candidates: | ||
| if len(results) >= max_artifacts: | ||
| warnings.append("Artifact limit reached; some files were not returned.") | ||
| break |
max_artifacts allows 0 (config ge=0), but the current loop will immediately emit “Artifact limit reached…” for the first candidate even though returning 0 artifacts was explicitly configured. Consider treating max_artifacts == 0 as “artifact collection disabled” (return [] without warning) or adjust the warning condition.
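A minimal sketch of the first option, treating `max_artifacts == 0` as "collection disabled", is shown below. `limit_artifacts` is a hypothetical helper operating on plain lists, not the PR's collection loop; the warning text is copied from the reviewed snippet.

```python
from typing import Sequence


def limit_artifacts(
    candidates: Sequence[str], max_artifacts: int
) -> tuple[list[str], list[str]]:
    """Return (artifacts, warnings) while honouring an explicit limit of 0."""
    warnings: list[str] = []
    if max_artifacts == 0:
        # Collection was explicitly disabled: nothing returned, no warning.
        return [], warnings
    results = list(candidates[:max_artifacts])
    if len(candidates) > max_artifacts:
        warnings.append("Artifact limit reached; some files were not returned.")
    return results, warnings


print(limit_artifacts(["a.png", "b.csv"], 0))
print(limit_artifacts(["a.png", "b.csv", "c.json"], 2))
```

As in the reviewed loop, the warning is emitted only when at least one candidate is actually dropped.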
| - Runs Python only with a local subprocess runner. | ||
| - Does not expose shell or bash execution. | ||
| - Rejects blocked imports and known dangerous call patterns before execution. | ||
| - Enforces timeout and output truncation. | ||
| - Executes code in a temporary working directory and only returns files from that directory. |
The documented sandbox guarantees for run_python (e.g., "Does not expose shell or bash execution" and blocked imports/network via blocked_imports/allow_network) overstate the actual isolation provided by the implementation. Because user code runs in a normal Python subprocess with only lightweight AST checks, it can still dynamically import blocked modules and invoke shell or network operations (for example, via importlib.import_module("os").system("id")), leading to unexpected host access for callers who rely on these guarantees. Please either strengthen the underlying sandbox to enforce these constraints or explicitly document that this is a best-effort policy-only sandbox that does not prevent shell, network, or broader filesystem access, so integrators do not treat it as a safe isolation boundary.
| - Runs Python only with a local subprocess runner. | |
| - Does not expose shell or bash execution. | |
| - Rejects blocked imports and known dangerous call patterns before execution. | |
| - Enforces timeout and output truncation. | |
| - Executes code in a temporary working directory and only returns files from that directory. | |
| - Uses a standard local Python subprocess runner (no OS-level sandbox or container). | |
| - Intended as a best-effort, policy-based sandbox only — **not** a security or isolation boundary. | |
| - Performs lightweight AST checks to detect blocked imports and known dangerous call patterns before execution, but these checks can be bypassed (e.g., via dynamic imports or reflection). | |
| - May still allow shell, network, and broader filesystem access depending on the underlying Python environment, even when `blocked_imports` or `allow_network: false` are configured. | |
| - Enforces timeout and output truncation. | |
| - Executes code in a temporary working directory and only returns files from that directory by default, but this does not prevent code from reading or writing outside that directory if the environment permits it. |
| config: | ||
| runner: local_subprocess | ||
| timeout_s: 20 | ||
| workspace_root: . |
This example configuration suggests that setting allow_network: false on the run_python tool will reliably disable network access for executed code, but the implementation only applies lightweight AST-based checks and does not provide a true network sandbox. User code can still reach networking primitives via dynamic imports or modules not covered by blocked_imports, so callers who rely on this flag for isolation may inadvertently expose internal services or data. Please either enforce network isolation at the OS/container level or clearly document that allow_network is best-effort only and not a security boundary.
| workspace_root: . | |
| workspace_root: . | |
| # Best-effort network restriction only; NOT a security boundary. | |
| # User code may still reach networking via dynamic imports or modules | |
| # not covered by blocked_imports. For strong isolation, enforce | |
| # network sandboxing at the OS/container level. |
Motivation
- Add a built-in `run_python` default tool for calculations, parsing, table/chart creation, file transformations, and simulation preparation while preserving the existing default-tool registry and interfaces.

Description
- Added `automa_ai/tools/run_python/` containing `config.py`, `policy.py`, `runner.py`, `tool.py`, and `__init__.py`, implementing a `RunPythonTool` compatible with `BaseDefaultTool` and a `build_run_python_tool(...)` builder for the registry.
- Added an AST-based policy (`policy.py`) that rejects blocked imports and dangerous call patterns and a `LocalSubprocessRunner` (`runner.py`) that runs code in a temporary workspace using a subprocess, enforces timeouts, truncates outputs, and collects artifacts.
- Registered `run_python` alongside `web_search` in `automa_ai/tools/__init__.py` without changing `ToolSpec`, `ToolsConfig`, `BaseDefaultTool`, or `build_langchain_tools`.
- Added `examples/run_python_demo.py` modeled on the existing default-tool demo, added tests in `tests/test_run_python_tool.py`, and updated `docs/tools.md` and `README.md` to document config, contract, and sandbox limitations.

Testing
- Ran `python -m compileall ...` and it succeeded (modules compile cleanly).
- Ran `pytest tests/test_run_python_tool.py tests/test_tools_config.py tests/test_web_search_tool.py -q`, but collection failed due to `ModuleNotFoundError: No module named 'pydantic'` in the environment, causing the test run to fail.
- Attempted to install `pydantic` via `python -m pip install pydantic`, but the environment could not download packages due to proxy/network restrictions, so unit tests could not be executed to completion here.

Notes: the patch adds unit tests that cover registry/build integration, arithmetic execution, JSON/CSV parsing, artifact generation, and sandbox rejection; these should pass in a normal environment with dependencies installed and are expected to be green in CI where `pydantic` and test dependencies are available. There are no breaking changes to existing tool interfaces.