Skip to content

An AI coding agent CLI tool that can autonomously read, write, and execute Python files within a sandboxed working directory. Built with Python and Google's Gemini API.

Notifications You must be signed in to change notification settings

jasonbland/agent-smith

Repository files navigation

Agent Smith

An AI coding agent CLI tool that can autonomously read, write, and execute Python files within a sandboxed working directory. Built with Python and Google's Gemini API.

Overview

Agent Smith is an AI agent that uses function calling to interact with a filesystem. It can iteratively work through complex multi-step tasks by:

  • Listing and exploring files in a directory
  • Reading file contents
  • Executing Python files with arguments
  • Writing or modifying files
  • Building context across multiple iterations to solve problems

The agent operates within a sandboxed working directory (./calculator) to ensure security.

Features

  • Agent Loop: Iteratively calls the LLM up to 20 times, maintaining conversation history to work through complex tasks
  • Function Calling: Four core functions for filesystem and code execution operations
  • Conversation History: Full context maintained across iterations for intelligent decision-making
  • Sandboxed Execution: All operations confined to a specified working directory
  • Verbose Mode: Optional detailed logging of token usage and function calls

Prerequisites

  • Python 3.10 or higher
  • uv package manager
  • Google Gemini API key

Setup

  1. Clone the repository:

    git clone <repository-url>
    cd agent-smith
  2. Install dependencies:

    uv sync
  3. Set up environment variables: Create a .env file in the project root:

    GEMINI_API_KEY=your_api_key_here

Usage

Basic Usage

uv run python main.py "your prompt here"

Verbose Mode

Get detailed output including token counts and function call details:

uv run python main.py "your prompt here" --verbose

Example Prompts

# Ask questions about the code
uv run python main.py "How does the calculator render results to the console?"

# List and explore files
uv run python main.py "List all Python files in the working directory"

# Fix bugs
uv run python main.py "Fix the calculator and run the tests"

# Read and analyze code
uv run python main.py "Read the calculator code and explain how it works"

# Execute code
uv run python main.py "Run the calculator tests and tell me if they pass"

Architecture

Directory Structure

agent-smith/
├── main.py                 # Entry point and agent loop
├── call_function.py        # Function dispatcher
├── prompts.py             # System prompt
├── config.py              # Configuration constants
├── .env                   # Environment variables (API key)
├── tests.py               # Test suite for agent
├── functions/             # Function implementations
│   ├── get_files_info.py
│   ├── get_file_content.py
│   ├── run_python_file.py
│   └── write_file.py
└── calculator/            # Example project (working directory)
    ├── main.py           # Calculator CLI
    ├── tests.py          # Calculator tests
    └── pkg/
        ├── calculator.py # Calculator logic
        └── render.py     # Output formatting

Core Components

main.py - Entry point:

  • Parses CLI arguments (prompt and optional --verbose flag)
  • Initializes Gemini client with API key
  • Implements the agent loop (up to 20 iterations)
  • Maintains conversation history across iterations
  • Prints final response or error message

call_function.py - Function dispatcher:

  • Defines available_functions tool with 4 function declarations
  • Routes function calls to appropriate handlers in functions/ directory
  • Injects WORKING_DIR from config for security
  • Returns function results in Gemini's expected format

prompts.py - System prompt:

  • Instructs the AI on available operations
  • Defines security model (relative paths only)
  • Guides the agent's behavior

config.py - Configuration:

  • MAX_CHARS = 10000 - File read character limit
  • WORKING_DIR = "./calculator" - Sandboxed working directory

Agent Loop

The agent loop in generate_content() works as follows:

  1. Call Gemini API with current conversation history
  2. Add response candidates to conversation history
  3. Check for function calls:
    • If none: Print final response and exit
    • If present: Execute all function calls
  4. Add function results to conversation history as user messages
  5. Repeat until final response or max iterations (20) reached

This allows the agent to iteratively build context and work through complex multi-step tasks.

Available Functions

All functions enforce security by validating paths stay within working_directory:

get_files_info(directory=None)

  • Lists files in a directory with sizes and is_dir flags
  • Default: lists working directory contents
  • Returns file metadata in a formatted string

get_file_content(file_path)

  • Reads first 10,000 characters of a file
  • Appends truncation notice if file is larger
  • Only accepts relative paths within working directory

run_python_file(file_path, arguments=None)

  • Executes Python files with optional arguments
  • 30 second timeout
  • Returns STDOUT, STDERR, and exit code
  • Only executes .py files

write_file(file_path, content)

  • Creates or overwrites files
  • Auto-creates parent directories if needed
  • Validates path is within working directory

Security Model

All function calls require paths relative to the working directory. The working_directory parameter is automatically injected by call_function() and validated by each function using os.path.abspath() to prevent directory traversal attacks.

Configuration

Environment Variables

GEMINI_API_KEY (required)

  • Your Google Gemini API key
  • Loaded from .env file via python-dotenv

Iteration Limit

The agent loop is limited to 20 iterations to prevent infinite loops and excessive token usage. This can be adjusted in main.py:

for _ in range(20):  # Change this number to adjust the limit

Working Directory

The sandboxed working directory is set in config.py:

WORKING_DIR = "./calculator"

Change this to point to a different directory if needed, but be cautious about filesystem access.

Example: Calculator Project

The repository includes a sample calculator project in ./calculator/ that the agent can work with:

calculator/main.py - CLI calculator:

  • Takes mathematical expressions as arguments
  • Uses Calculator class for evaluation
  • Outputs JSON-formatted results

calculator/pkg/calculator.py - Core logic:

  • Supports +, -, *, / operators
  • Implements operator precedence
  • Two-stack algorithm for expression evaluation

calculator/tests.py - Test suite for the calculator

Example Session

$ uv run python main.py "How do I fix the calculator?"
 - Calling function: get_files_info
 - Calling function: get_file_content
 - Calling function: run_python_file
 - Calling function: get_file_content
 - Calling function: write_file
 - Calling function: run_python_file

I found a bug in the calculator where it wasn't handling operator precedence correctly.
I've fixed the issue in pkg/calculator.py and verified the fix by running the tests.
All tests now pass!

Safety Considerations

IMPORTANT: This tool gives an LLM access to your filesystem and Python interpreter. Use with caution:

  • Always work in a sandboxed directory
  • Commit your changes before running the agent on important codebases
  • Review any file modifications the agent suggests
  • Don't give the agent access to sensitive directories
  • Be aware of API rate limits and costs
  • Monitor the agent's actions, especially in verbose mode
  • If you used the Gemini API on the paid tier, be sure to delete your API key when you're all finished to avoid unexpected charges

Extending the Project

You've completed the required steps, but have some fun with it! (Carefully, though... be very cautious about giving an LLM access to your filesystem and Python interpreter.) See if you can get it to:

  • Fix harder and more complex bugs
  • Refactor sections of code
  • Add entirely new features

You can also try:

  • Other LLM providers (OpenAI, Anthropic, etc.)
  • Other Gemini models (gemini-2.0-pro, etc.)
  • Giving it more functions to call (install packages, run git commands, etc.)
  • Other codebases (commit your changes before running the agent on a codebase, so you can always revert)

Remember: What we've built is a toy version of something like Cursor/Zed's Agentic Mode, or Claude Code. Even their tools aren't perfectly secure, so be careful what you give them access to. And don't encourage anyone to use this toy agent as-is!

Troubleshooting

Rate Limit Errors:

  • Gemini free tier has a limit of 5 requests per minute
  • Wait ~30 seconds between tests if you hit the limit
  • Consider upgrading to a paid tier for higher limits

Module Not Found:

  • Ensure you're running with uv run python main.py to use the virtual environment
  • Run uv sync to install dependencies

API Key Issues:

  • Verify your .env file exists and contains GEMINI_API_KEY=...
  • Check your API key is valid at https://ai.google.dev/

License

This project was created as part of a Boot.dev course exercise.

About

An AI coding agent CLI tool that can autonomously read, write, and execute Python files within a sandboxed working directory. Built with Python and Google's Gemini API.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages