An AI coding agent CLI tool that can autonomously read, write, and execute Python files within a sandboxed working directory. Built with Python and Google's Gemini API.
Agent Smith is an AI agent that uses function calling to interact with a filesystem. It can iteratively work through complex multi-step tasks by:
- Listing and exploring files in a directory
- Reading file contents
- Executing Python files with arguments
- Writing or modifying files
- Building context across multiple iterations to solve problems
The agent operates within a sandboxed working directory (./calculator) to ensure security.
- Agent Loop: Iteratively calls the LLM up to 20 times, maintaining conversation history to work through complex tasks
- Function Calling: Four core functions for filesystem and code execution operations
- Conversation History: Full context maintained across iterations for intelligent decision-making
- Sandboxed Execution: All operations confined to a specified working directory
- Verbose Mode: Optional detailed logging of token usage and function calls
- Python 3.10 or higher
- uv package manager
- Google Gemini API key
-
Clone the repository:
git clone <repository-url> cd agent-smith
-
Install dependencies:
uv sync
-
Set up environment variables: Create a
.envfile in the project root:GEMINI_API_KEY=your_api_key_here
uv run python main.py "your prompt here"Get detailed output including token counts and function call details:
uv run python main.py "your prompt here" --verbose# Ask questions about the code
uv run python main.py "How does the calculator render results to the console?"
# List and explore files
uv run python main.py "List all Python files in the working directory"
# Fix bugs
uv run python main.py "Fix the calculator and run the tests"
# Read and analyze code
uv run python main.py "Read the calculator code and explain how it works"
# Execute code
uv run python main.py "Run the calculator tests and tell me if they pass"agent-smith/
├── main.py # Entry point and agent loop
├── call_function.py # Function dispatcher
├── prompts.py # System prompt
├── config.py # Configuration constants
├── .env # Environment variables (API key)
├── tests.py # Test suite for agent
├── functions/ # Function implementations
│ ├── get_files_info.py
│ ├── get_file_content.py
│ ├── run_python_file.py
│ └── write_file.py
└── calculator/ # Example project (working directory)
├── main.py # Calculator CLI
├── tests.py # Calculator tests
└── pkg/
├── calculator.py # Calculator logic
└── render.py # Output formatting
main.py - Entry point:
- Parses CLI arguments (prompt and optional --verbose flag)
- Initializes Gemini client with API key
- Implements the agent loop (up to 20 iterations)
- Maintains conversation history across iterations
- Prints final response or error message
call_function.py - Function dispatcher:
- Defines
available_functionstool with 4 function declarations - Routes function calls to appropriate handlers in
functions/directory - Injects
WORKING_DIRfrom config for security - Returns function results in Gemini's expected format
prompts.py - System prompt:
- Instructs the AI on available operations
- Defines security model (relative paths only)
- Guides the agent's behavior
config.py - Configuration:
MAX_CHARS = 10000- File read character limitWORKING_DIR = "./calculator"- Sandboxed working directory
The agent loop in generate_content() works as follows:
- Call Gemini API with current conversation history
- Add response candidates to conversation history
- Check for function calls:
- If none: Print final response and exit
- If present: Execute all function calls
- Add function results to conversation history as user messages
- Repeat until final response or max iterations (20) reached
This allows the agent to iteratively build context and work through complex multi-step tasks.
All functions enforce security by validating paths stay within working_directory:
get_files_info(directory=None)
- Lists files in a directory with sizes and is_dir flags
- Default: lists working directory contents
- Returns file metadata in a formatted string
get_file_content(file_path)
- Reads first 10,000 characters of a file
- Appends truncation notice if file is larger
- Only accepts relative paths within working directory
run_python_file(file_path, arguments=None)
- Executes Python files with optional arguments
- 30 second timeout
- Returns STDOUT, STDERR, and exit code
- Only executes .py files
write_file(file_path, content)
- Creates or overwrites files
- Auto-creates parent directories if needed
- Validates path is within working directory
All function calls require paths relative to the working directory. The working_directory parameter is automatically injected by call_function() and validated by each function using os.path.abspath() to prevent directory traversal attacks.
GEMINI_API_KEY (required)
- Your Google Gemini API key
- Loaded from
.envfile via python-dotenv
The agent loop is limited to 20 iterations to prevent infinite loops and excessive token usage. This can be adjusted in main.py:
for _ in range(20): # Change this number to adjust the limitThe sandboxed working directory is set in config.py:
WORKING_DIR = "./calculator"Change this to point to a different directory if needed, but be cautious about filesystem access.
The repository includes a sample calculator project in ./calculator/ that the agent can work with:
calculator/main.py - CLI calculator:
- Takes mathematical expressions as arguments
- Uses
Calculatorclass for evaluation - Outputs JSON-formatted results
calculator/pkg/calculator.py - Core logic:
- Supports +, -, *, / operators
- Implements operator precedence
- Two-stack algorithm for expression evaluation
calculator/tests.py - Test suite for the calculator
$ uv run python main.py "How do I fix the calculator?"
- Calling function: get_files_info
- Calling function: get_file_content
- Calling function: run_python_file
- Calling function: get_file_content
- Calling function: write_file
- Calling function: run_python_file
I found a bug in the calculator where it wasn't handling operator precedence correctly.
I've fixed the issue in pkg/calculator.py and verified the fix by running the tests.
All tests now pass!IMPORTANT: This tool gives an LLM access to your filesystem and Python interpreter. Use with caution:
- Always work in a sandboxed directory
- Commit your changes before running the agent on important codebases
- Review any file modifications the agent suggests
- Don't give the agent access to sensitive directories
- Be aware of API rate limits and costs
- Monitor the agent's actions, especially in verbose mode
- If you used the Gemini API on the paid tier, be sure to delete your API key when you're all finished to avoid unexpected charges
You've completed the required steps, but have some fun with it! (Carefully, though... be very cautious about giving an LLM access to your filesystem and Python interpreter.) See if you can get it to:
- Fix harder and more complex bugs
- Refactor sections of code
- Add entirely new features
You can also try:
- Other LLM providers (OpenAI, Anthropic, etc.)
- Other Gemini models (gemini-2.0-pro, etc.)
- Giving it more functions to call (install packages, run git commands, etc.)
- Other codebases (commit your changes before running the agent on a codebase, so you can always revert)
Remember: What we've built is a toy version of something like Cursor/Zed's Agentic Mode, or Claude Code. Even their tools aren't perfectly secure, so be careful what you give them access to. And don't encourage anyone to use this toy agent as-is!
Rate Limit Errors:
- Gemini free tier has a limit of 5 requests per minute
- Wait ~30 seconds between tests if you hit the limit
- Consider upgrading to a paid tier for higher limits
Module Not Found:
- Ensure you're running with
uv run python main.pyto use the virtual environment - Run
uv syncto install dependencies
API Key Issues:
- Verify your
.envfile exists and containsGEMINI_API_KEY=... - Check your API key is valid at https://ai.google.dev/
This project was created as part of a Boot.dev course exercise.