Add built-in run_python default tool with sandboxed local runner by weilixu · Pull Request #68 · pnnl/BEM-AI

weilixu · 2026-03-11T04:39:36Z

Motivation

Provide a first-class built-in run_python default tool for calculations, parsing, table/chart creation, file transformations, and simulation preparation while preserving the existing default-tool registry and interfaces.
Implement a practical sandbox to prevent shell access and obviously dangerous imports/patterns while enabling common data-processing workflows.

Description

Added a new tool package automa_ai/tools/run_python/ containing config.py, policy.py, runner.py, tool.py, and __init__.py implementing a RunPythonTool compatible with BaseDefaultTool and a build_run_python_tool(...) builder for the registry.
Implemented a lightweight static policy (policy.py) that rejects blocked imports and dangerous call patterns and a LocalSubprocessRunner (runner.py) that runs code in a temporary workspace using a subprocess, enforces timeouts, truncates outputs, and collects artifacts.
Registered the new tool alongside web_search in automa_ai/tools/__init__.py without changing ToolSpec, ToolsConfig, BaseDefaultTool, or build_langchain_tools.
Added examples/run_python_demo.py modeled on the existing default-tool demo, added tests in tests/test_run_python_tool.py, and updated docs/tools.md and README.md to document config, contract, and sandbox limitations.

Testing

Compiled the new module tree with python -m compileall ... and it succeeded (modules compile cleanly).
Ran pytest tests/test_run_python_tool.py tests/test_tools_config.py tests/test_web_search_tool.py -q but collection failed due to ModuleNotFoundError: No module named 'pydantic' in the environment, causing the test run to fail.
Attempted to install pydantic via python -m pip install pydantic but the environment could not download packages due to proxy/network restrictions, so unit tests could not be executed to completion here.

Notes: the patch adds unit tests that cover registry/build integration, arithmetic execution, JSON/CSV parsing, artifact generation, and sandbox rejection; these should pass in a normal environment with dependencies installed and are expected to be green in CI where pydantic and test dependencies are available. There are no breaking changes to existing tool interfaces.

Codex Task

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e431b08652

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-11T04:44:03Z

automa_ai/tools/run_python/policy.py

+            if isinstance(node.func, ast.Name) and node.func.id in _BLOCKED_CALLS:
+                raise PolicyViolationError(f"Blocked function call: {node.func.id}")
+            if isinstance(node.func, ast.Attribute):
+                if (
+                    isinstance(node.func.value, ast.Name)
+                    and node.func.value.id in _BLOCKED_BASES
+                    and node.func.attr in _BLOCKED_ATTRS


Block indirect dangerous call patterns in sandbox policy

The AST checks only reject __import__ when it appears as a direct ast.Name and only reject dangerous attributes when the receiver is literally named os/subprocess, so payloads like getattr(__builtins__, "__import__")("os").system("...") bypass policy and execute shell commands. In environments that rely on this tool for safe code execution, this defeats the advertised sandbox constraints and allows arbitrary host command execution.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-11T04:44:03Z

automa_ai/tools/run_python/tool.py

+    if not parsed.enabled:
+        raise ValueError("run_python tool is disabled by configuration.")


Skip disabled run_python specs instead of raising on build

RunPythonToolConfig exposes an enabled flag, but the builder raises ValueError when it is false, which causes build_langchain_tools(...) to fail the entire tool setup rather than simply omitting run_python. Any deployment that toggles this tool off via config will crash agent initialization even if all other tools are valid.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

Adds a new built-in run_python default tool to the existing tools registry, intended to execute user-provided Python code with basic static policy checks and a local subprocess runner, plus documentation/examples/tests.

Changes:

Introduces automa_ai/tools/run_python/ (config, policy, runner, tool + builder) implementing RunPythonTool.
Registers run_python in the built-in default-tools registry alongside web_search.
Adds demo script, docs/README updates, and unit tests covering execution, artifacts, and policy rejection.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`automa_ai/tools/run_python/config.py`	Adds pydantic config model for the tool (timeouts, imports policy, artifact limits, etc.).
`automa_ai/tools/run_python/policy.py`	Implements AST-based static checks for blocked imports/call patterns.
`automa_ai/tools/run_python/runner.py`	Runs code via `asyncio.create_subprocess_exec` in a temp workspace; truncates outputs and collects artifacts.
`automa_ai/tools/run_python/tool.py`	Implements `RunPythonTool` and `build_run_python_tool(...)` for registry integration.
`automa_ai/tools/run_python/__init__.py`	Exposes tool + builder for imports/registration.
`automa_ai/tools/__init__.py`	Registers `run_python` into `DEFAULT_TOOL_REGISTRY` at import time.
`tests/test_run_python_tool.py`	Adds tests for registry build, execution, artifacts, and policy rejection.
`examples/run_python_demo.py`	Adds runnable demo similar to the existing web_search demo.
`docs/tools.md`	Documents tool inputs/outputs/config and sandbox limitations.
`README.md`	Adds `run_python` to the sample tools configuration and references the new demo.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-11T04:48:12Z

automa_ai/tools/__init__.py

+try:
+    DEFAULT_TOOL_REGISTRY.register("run_python", build_run_python_tool)
+except ValueError as exc:
+    logging.getLogger(__name__).debug(
+        "Ignoring ValueError while registering 'run_python' tool: %s",
+        exc,
+    )


run_python is only registered via this import-time side effect. If the application relies on load_tool_plugins() + entry points (see pyproject.toml currently listing only web_search), run_python won’t be discoverable unless automa_ai.tools is imported somewhere. Consider adding a project.entry-points."automa_ai.tools" entry for run_python (or ensure load_tool_plugins() also imports built-in tools) to make loading consistent.

Copilot · 2026-03-11T04:48:12Z

automa_ai/tools/run_python/runner.py

+            env = {
+                "PYTHONNOUSERSITE": "1",
+                "MPLBACKEND": "Agg",
+                "PATH": os.environ.get("PATH", ""),
+            }


env replaces the entire environment for the subprocess and only preserves PATH. On Windows this can break Python startup or subprocess behavior (commonly needs variables like SYSTEMROOT, TEMP/TMP, etc.), and on all platforms it may cause surprising locale/encoding behavior. Consider starting from a minimal-but-platform-safe baseline (e.g., copy required OS vars) while still stripping secrets, or document the supported platforms/requirements.

Suggested change

env = {

"PYTHONNOUSERSITE": "1",

"MPLBACKEND": "Agg",

"PATH": os.environ.get("PATH", ""),

}

env = os.environ.copy()

env.update(

{

"PYTHONNOUSERSITE": "1",

"MPLBACKEND": "Agg",

}

)

Copilot · 2026-03-11T04:48:13Z

automa_ai/tools/run_python/policy.py

+    blocked_imports = set(config.blocked_imports)
+    if not config.allow_network:
+        blocked_imports.update({"http", "httpx", "aiohttp", "ftplib", "ssl", "websocket"})
+    allowed_imports = set(config.allowed_imports)


allow_network currently only expands the import blocklist; it does not actually prevent network access at runtime (e.g., code can still open sockets via already-imported stdlib modules or indirect APIs). This is fine as a best-effort policy, but it’s important to rename/reframe the flag (e.g., block_network_imports) or clearly document that it is not enforcement.

Copilot · 2026-03-11T04:48:13Z

tests/test_run_python_tool.py

+from automa_ai.tools.registry import build_langchain_tools
+from automa_ai.tools.run_python.config import RunPythonToolConfig
+from automa_ai.tools.run_python.tool import RunPythonTool
+
+
+def test_registry_build_includes_run_python_and_web_search() -> None:
+    tools = build_langchain_tools(
+        [
+            ToolSpec(type="web_search", config={"provider": "opensource"}),
+            ToolSpec(type="run_python", config={}),
+        ]
+    )
+    names = {tool.name for tool in tools}
+    assert "web_search" in names
+    assert "run_python" in names


build_langchain_tools is imported from automa_ai.tools.registry, which does not import automa_ai.tools and therefore won’t execute the side-effect registrations in automa_ai/tools/__init__.py. As a result, DEFAULT_TOOL_REGISTRY will be empty here and this test should fail with Unknown tool type 'web_search'/'run_python'. Import build_langchain_tools from automa_ai.tools (or explicitly import automa_ai.tools before building) so the registry is populated.

Copilot · 2026-03-11T04:48:13Z

automa_ai/tools/run_python/tool.py

+            "file transformations, and simulation preparation logic. No shell access."
+        )


The tool description claims “No shell access.” With the current lightweight AST policy + local subprocess execution, this is not a reliable security boundary (e.g., importing via importlib can bypass blocked-import checks, and indirect access to blocked builtins can bypass the call-name checks). Please reword the description to avoid promising hard sandboxing, and/or document it as best-effort / not for untrusted code.

Suggested change

"file transformations, and simulation preparation logic. No shell access."

)

"file transformations, and simulation preparation logic. Runs in a local "

"subprocess with policy-based restrictions; not a hardened sandbox and not "

"intended for untrusted code."

Copilot · 2026-03-11T04:48:13Z

automa_ai/tools/run_python/runner.py

+    if expected_outputs:
+        for rel in expected_outputs:
+            path = _resolve_temp_file(root, rel)
+            if path.exists() and path.is_file():
+                candidates.append(path)
+            else:
+                warnings.append(f"Expected output was not found: {rel}")
+    else:
+        for path in root.rglob("*"):
+            if path.is_file() and path.name != "__run_python__.py":
+                candidates.append(path)
+


When expected_outputs is empty, _collect_artifacts returns all files in the temp workspace (including any copied input_files). That makes the artifact set depend on inputs and can be surprising if artifacts are meant to represent outputs only. Consider excluding the copied inputs from candidates or collecting only newly-created/modified files when expected_outputs is omitted.

Copilot · 2026-03-11T04:48:13Z

automa_ai/tools/run_python/runner.py

+    for path in candidates:
+        if len(results) >= max_artifacts:
+            warnings.append("Artifact limit reached; some files were not returned.")
+            break


max_artifacts allows 0 (config ge=0), but the current loop will immediately emit “Artifact limit reached…” for the first candidate even though returning 0 artifacts was explicitly configured. Consider treating max_artifacts == 0 as “artifact collection disabled” (return [] without warning) or adjust the warning condition.

Copilot · 2026-03-11T04:48:14Z

docs/tools.md

+- Runs Python only with a local subprocess runner.
+- Does not expose shell or bash execution.
+- Rejects blocked imports and known dangerous call patterns before execution.
+- Enforces timeout and output truncation.
+- Executes code in a temporary working directory and only returns files from that directory.


The documented sandbox guarantees for run_python (e.g., "Does not expose shell or bash execution" and blocked imports/network via blocked_imports/allow_network) overstate the actual isolation provided by the implementation. Because user code runs in a normal Python subprocess with only lightweight AST checks, it can still dynamically import blocked modules and invoke shell or network operations (for example, via importlib.import_module("os").system("id")), leading to unexpected host access for callers who rely on these guarantees. Please either strengthen the underlying sandbox to enforce these constraints or explicitly document that this is a best-effort policy-only sandbox that does not prevent shell, network, or broader filesystem access, so integrators do not treat it as a safe isolation boundary.

Suggested change

- Runs Python only with a local subprocess runner.

- Does not expose shell or bash execution.

- Rejects blocked imports and known dangerous call patterns before execution.

- Enforces timeout and output truncation.

- Executes code in a temporary working directory and only returns files from that directory.

- Uses a standard local Python subprocess runner (no OS-level sandbox or container).

- Intended as a best-effort, policy-based sandbox only — **not** a security or isolation boundary.

- Performs lightweight AST checks to detect blocked imports and known dangerous call patterns before execution, but these checks can be bypassed (e.g., via dynamic imports or reflection).

- May still allow shell, network, and broader filesystem access depending on the underlying Python environment, even when `blocked_imports` or `allow_network: false` are configured.

- Enforces timeout and output truncation.

- Executes code in a temporary working directory and only returns files from that directory by default, but this does not prevent code from reading or writing outside that directory if the environment permits it.

Copilot · 2026-03-11T04:48:14Z

README.md

+    config:
+      runner: local_subprocess
+      timeout_s: 20
+      workspace_root: .


This example configuration suggests that setting allow_network: false on the run_python tool will reliably disable network access for executed code, but the implementation only applies lightweight AST-based checks and does not provide a true network sandbox. User code can still reach networking primitives via dynamic imports or modules not covered by blocked_imports, so callers who rely on this flag for isolation may inadvertently expose internal services or data. Please either enforce network isolation at the OS/container level or clearly document that allow_network is best-effort only and not a security boundary.

Suggested change

workspace_root: .

workspace_root: .

# Best-effort network restriction only; NOT a security boundary.

# User code may still reach networking via dynamic imports or modules

# not covered by blocked_imports. For strong isolation, enforce

# network sandboxing at the OS/container level.

Add built-in run_python default tool

e431b08

weilixu added the codex label Mar 11, 2026 — with ChatGPT Codex Connector

weilixu requested a review from Copilot March 11, 2026 04:39

Copilot started reviewing on behalf of weilixu March 11, 2026 04:40 View session

chatgpt-codex-connector bot reviewed Mar 11, 2026

View reviewed changes

Copilot AI reviewed Mar 11, 2026

View reviewed changes

		if not parsed.enabled:
		raise ValueError("run_python tool is disabled by configuration.")

		"file transformations, and simulation preparation logic. No shell access."
		)

-            "file transformations, and simulation preparation logic. No shell access."
-        )
+            "file transformations, and simulation preparation logic. Runs in a local "
+            "subprocess with policy-based restrictions; not a hardened sandbox and not "
+            "intended for untrusted code."

-      workspace_root: .
+      workspace_root: .
+      # Best-effort network restriction only; NOT a security boundary.
+      # User code may still reach networking via dynamic imports or modules
+      # not covered by blocked_imports. For strong isolation, enforce
+      # network sandboxing at the OS/container level.

Conversation

weilixu commented Mar 11, 2026

Motivation

Description

Testing

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants