-
Notifications
You must be signed in to change notification settings - Fork 70
Description
@disler, this looks like an interesting effort.
There is a lot of work going on trying to make coding agents safer to run on a host machine where arbitrary bash commands can do massive damage (e.g., either by editing or deleting data or exfiltrating sensitive data). So a project like this helps to avoid that damage. Great.
However, if we are letting the agent write arbitrary code and run that code (e.g., while executing tests in the agentic loop), these filters for tool calls don't catch any of that, correct? So unless a human carefully reviews every iteration of code and tests before the agent runs the tests, does this not mean that an agent can still run arbitrary commands using the user's permissions on the host machine? If we require a human to carefully review every intermediate version of the code and tests before the tests get run, does this not destroy the utility of a coding agent like this? (I guess we could use a 2nd LLM to review intermediate versions of the code or tests to look for dangers before we allow the agent to run the tests, but we would not trust those review LLMs and it would increase the cost and produce a lot of false positives that require human intervention anyway.)
Yes, such filtering of direct tool calls like in this project will reduce the risks of running a coding agent on a host machine, but it does not make it safe to use coding agents on any given host machine. And a properly prompted LLM (e.g., using a prompt injection attack) can do some crazy things, and smarter models will be able to write sneaky code that runs as part of tests that will do great damage. I think we need to assume every coding agent will do the maximum possible damage the driving user could do if you use a coding agent for long enough.
Why are people not just running these agents in isolated containers? If those containers are setup and locked down correctly, the risks of using a coding agent are massively reduced.
Just curious what your view on this problem?