Skip to content

Does this actually make the agent safe to run on a host machine? #7

@bartlettroscoe

Description

@bartlettroscoe

@disler, this looks like an interesting effort.

There is a lot of work going on trying to make coding agents safer to run on a host machine where arbitrary bash commands can do massive damage (e.g., either by editing or deleting data or exfiltrating sensitive data). So a project like this helps to avoid that damage. Great.

However, if we are letting the agent write arbitrary code and run that code (e.g., while executing tests in the agentic loop), these filters for tool calls don't catch any of that, correct? So unless a human carefully reviews every iteration of code and tests before the agent runs the tests, does this not mean that an agent can still run arbitrary commands using the user's permissions on the host machine? If we require a human to carefully review every intermediate version of the code and tests before the tests get run, does this not destroy the utility of a coding agent like this? (I guess we could use a 2nd LLM to review intermediate versions of the code or tests to look for dangers before we allow the agent to run the tests, but we would not trust those review LLMs and it would increase the cost and produce a lot of false positives that require human intervention anyway.)

Yes, such filtering of direct tool calls like in this project will reduce the risks of running a coding agent on a host machine, but it does not make it safe to use coding agents on any given host machine. And a properly prompted LLM (e.g., using a prompt injection attack) can do some crazy things, and smarter models will be able to write sneaky code that runs as part of tests that will do great damage. I think we need to assume every coding agent will do the maximum possible damage the driving user could do if you use a coding agent for long enough.

Why are people not just running these agents in isolated containers? If those containers are setup and locked down correctly, the risks of using a coding agent are massively reduced.

Just curious what your view on this problem?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions