214 changes: 60 additions & 154 deletions README.md
@@ -1,169 +1,75 @@
# ai-agent

Simple developer documentation for the current state of the project.

---

## Ownership & project vision

**I own the implementation** of a **custom AI Agent CLI** with the following design:

- **Accepts coding tasks** — You give the agent a task (e.g. “find all Python files in this folder” or “summarize the layout of the project”).
- **Uses predefined functions** — The agent chooses from a fixed set of tools, such as:
  - Scanning/listing files in a directory (e.g. `get_files_info`),
  - Reading file content with truncation (e.g. `get_file_content`),
  - and other functions as they are added.
- **Runs until done or stopped** — The agent calls the Gemini API, selects functions, runs them, and repeats until the task is complete or the run is interrupted/fails.
- **Powered by Gemini API** — All reasoning and tool choice go through Google’s Gemini model.

The code in this repository is my implementation of that agent and its tools. The **calculator** directory is an **example code repo** that runs on this CLI (code the agent can work with, run, or reason about). The file-scanning utility and other predefined functions are tools the agent uses; both are documented below.

---
# coding_ai_agent

## Overview

This repo currently contains:

- **Agent CLI** (`main.py`) — Entrypoint that talks to the Gemini API. Right now it sends a single user prompt and prints the model response; the full agent loop (task → choose function → execute → repeat) is the intended evolution.
- **Example code repo** — **Calculator** (`calculator/`) — An example codebase that runs on this CLI. The agent can run it, inspect it, or perform coding tasks against it (e.g. run tests, modify code). It provides a small infix calculator CLI and JSON output.
- **Predefined tools** — Implemented for the agent to call:
  - **Directory listing** (`functions/get_files_info.py`) — `get_files_info`: lists directory contents (names, sizes, is_dir) with path-safety so the agent cannot escape the allowed working directory.
  - **File content** (`functions/get_files_content.py`) — `get_file_content`: reads file content up to a character limit (`CHARACTER_LIMIT` in `config.py`), with a truncation message when the file exceeds it.
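
The path-safety boundary mentioned for both tools can be illustrated with a minimal sketch. The helper name `resolve_inside` and the use of `ValueError` are assumptions made for illustration, not the code in `functions/`; the check also does not resolve symlinks, which a stricter version might do with `os.path.realpath`.

```python
import os


def resolve_inside(working_directory: str, relative_path: str) -> str:
    """Return the absolute target path, or raise if it leaves working_directory.

    Hypothetical helper: the real tools may structure this check differently.
    """
    root = os.path.abspath(working_directory)
    # os.path.join discards root when relative_path is absolute (e.g. "/bin"),
    # and normpath collapses ".." segments, so both escapes become visible.
    target = os.path.normpath(os.path.join(root, relative_path))
    # commonpath equals root only when target is root itself or nested under it.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f'"{relative_path}" is outside the permitted working directory')
    return target
```

With this shape, `resolve_inside("calculator", "pkg")` returns a path under `calculator/`, while `"../"` and `"/bin"` both raise.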

Python 3.13+, managed with **uv** (see `pyproject.toml` and `uv.lock`).

---

## Project layout

```
ai-agent/
├── main.py                   # AI Agent CLI entrypoint (Gemini API)
├── pyproject.toml            # Project metadata and dependencies
├── pyrightconfig.json        # Type-checker extra paths
├── .env                      # GEMINI_API_KEY (not committed)
├── .gitignore
├── calculator/               # Example code repo (runs on this CLI)
│   ├── main.py               # CLI: expression from argv → JSON result
│   ├── tests.py              # unittest for Calculator
│   └── pkg/
│       ├── calculator.py     # Infix expression evaluator (+, -, *, /)
│       └── render.py         # JSON formatting of expression + result
└── functions/
    ├── get_files_info.py     # List dir contents; respects working dir boundary
    └── get_files_content.py  # Read file content with truncation; path-safe
```
A CLI-based AI coding agent powered by Google Gemini (`gemini-2.5-flash`). It accepts a natural-language task, calls predefined tools against a target codebase in an iterative loop, and returns a final response when done.

- Automates coding tasks (inspect, read, run, write) through LLM-driven tool use.
- Implements a ReAct-style agent loop: prompt → model → tool calls → results → repeat.
- Delegates all reasoning and tool selection to the Gemini API; there is no hand-written planning logic.

## Author / Ownership

Designed and implemented by the repository owner.

- **Architectural decisions**: a fixed tool set is exposed to the model via Gemini function declarations; the dispatcher injects the working directory, so the model only ever supplies relative paths.
- **System design ownership**: the agent loop, tool dispatch (`call_functions.py`), path-safety boundaries in each tool, and system prompt strategy are all custom-built.
- **Experimentation**: `temperature=0` keeps tool selection as repeatable as possible; the `MAX_ITERS` cap prevents infinite loops; `CHARACTER_LIMIT` keeps large files from overflowing the context window.

## Core Idea

- **Agent loop** (`main.py`): iterates up to `MAX_ITERS` (20) rounds. Each round: send messages → get model response → if function calls present, execute them and append results → otherwise print final text and exit.
- **Prompt strategy** (`prompts.py`): system prompt instructs the model to plan tool calls for any coding task; all paths must be relative (working directory auto-injected).
- **Tool integration** (`call_functions.py`): four tools registered as Gemini `FunctionDeclaration`s:
  - `get_files_info` — list directory contents (name, size, is_dir); path-safe.
  - `get_file_content` — read file up to `CHARACTER_LIMIT` (10,000 chars) with truncation notice.
  - `run_python_file` — execute a Python script with optional CLI args; returns stdout/stderr.
  - `write_file` — write or overwrite a file within the working directory.
- **Working directory isolation**: every tool call has `working_directory="./calculator"` injected by the dispatcher; tools raise on any path that escapes the boundary.
- **State / memory**: single-session message history list (`messages`); no persistent memory across runs.
- **Example target codebase** (`calculator/`): small infix calculator CLI used as the agent's default working directory during development and testing.

## Architecture

```mermaid
flowchart TD
User["User\npython main.py <prompt>"] --> Init["Build initial message\n+ load system prompt"]
Init --> Loop["Agent Loop\n(max 20 iters)"]
Loop --> Gemini["Gemini API\ngemini-2.5-flash\ntemp=0 + tool declarations"]
Gemini -->|function_calls present| Dispatch["call_function dispatcher\ninject working_directory"]
Dispatch --> T1["get_files_info\nlist dir (path-safe)"]
Dispatch --> T2["get_file_content\nread file ≤10k chars"]
Dispatch --> T3["run_python_file\nexecute script + args"]
Dispatch --> T4["write_file\nwrite/overwrite file"]
T1 & T2 & T3 & T4 --> Results["Append tool results\nto message history"]
Results --> Loop
Gemini -->|no function_calls| Output["Print final response\nExit"]
```

Root-level tests (run from repo root):

- `test_get_files_info.py` — tests for `get_files_info`.
- `test_get_file_content.py` — tests for `get_file_content`.

---

## Setup

1. **Environment**

   - Create a venv (e.g. `uv venv` → `.venv`) and install:
     `uv sync`
   - Copy or create `.env` with:
     - `GEMINI_API_KEY` — required for `main.py` (Gemini client).

2. **Dependencies** (from `pyproject.toml`)

   - `google-genai` — Gemini API client
   - `python-dotenv` — load `.env` for `main.py`
```bash
# install deps (requires uv)
uv sync

Optional: use the project’s `.venv` (ignored by git).

---

## Running

- **Gemini CLI** (from repo root):
```bash
python main.py "Your prompt here"
python main.py "Your prompt" --verbose # show token usage
```
# set API key
echo "GEMINI_API_KEY=your_key_here" > .env
```

- **Calculator** — Example code repo that runs on this CLI (from repo root):
```bash
python calculator/main.py "3 + 5"
python calculator/main.py "2 * 3 - 8 / 2 + 5"
```
Output is JSON: `{"expression": "...", "result": <number>}`. The agent can execute this code or work on it as part of a coding task.
Dependencies: `google-genai`, `python-dotenv`. Python 3.13+.

- **Directory listing** (programmatic):
  - `get_files_info(working_directory, directory=".")`
  - Returns a string of lines: `name : file_size=<bytes>, is_dir=<bool>`.
  - Restricts listing to paths under `working_directory`; raises if `directory` is outside it or not a directory.
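
As a rough illustration of the line format above, the formatting step alone might look like the sketch below. `format_entries` is a hypothetical helper that assumes the directory has already passed the working-directory check; sorting the entries is also an assumption.

```python
import os


def format_entries(directory: str) -> str:
    """Format each entry of an already-validated directory as one line."""
    lines = []
    # os.listdir never returns "." or "..", so they need no special handling.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        size = os.path.getsize(path)
        lines.append(f"{name} : file_size={size}, is_dir={os.path.isdir(path)}")
    return "\n".join(lines)
```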
## Usage

- **File content** (programmatic):
  - `get_file_content(working_directory, file_path)`
  - Returns file content up to `CHARACTER_LIMIT` characters; appends a truncation message if the file is longer.
  - Path must be under `working_directory`; raises if outside, not found, or not a regular file.
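
The truncation behavior can be sketched as follows. `read_truncated` is a hypothetical helper covering only the read step (path validation is assumed to have happened already), and the wording of the truncation message is invented for illustration.

```python
CHARACTER_LIMIT = 10_000  # the README gives 10,000 characters for config.py


def read_truncated(path: str, limit: int = CHARACTER_LIMIT) -> str:
    """Read at most `limit` characters and flag the cut if more content remains."""
    with open(path, "r", encoding="utf-8") as f:
        content = f.read(limit)
        # Probe one extra character so a file of exactly `limit` characters
        # is not reported as truncated.
        if f.read(1):
            content += f'[...File "{path}" truncated at {limit} characters]'
    return content
```

Reading with `f.read(limit)` avoids loading a very large file into memory just to discard most of it.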
```bash
# run the agent with a task
python main.py "explain the structure of the calculator project"
python main.py "add a modulo operator to the calculator" --verbose
```

---
`--verbose` prints token usage and raw function response payloads per iteration.
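
The command-line shape shown above (one positional prompt plus a `--verbose` flag) could be parsed with `argparse` along these lines. This is a sketch under the assumption that `main.py` uses something similar; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: python main.py "<prompt>" [--verbose]."""
    parser = argparse.ArgumentParser(description="Run the coding agent on a task")
    parser.add_argument("prompt", help="natural-language task for the agent")
    parser.add_argument(
        "--verbose",
        action="store_true",
        help="print token usage and function responses per iteration",
    )
    return parser


# Passing an explicit list keeps the example independent of the real argv.
args = build_parser().parse_args(["explain the structure of the calculator project", "--verbose"])
print(args.prompt, args.verbose)
```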

## Testing

- **Calculator** (from repo root):
```bash
python -m pytest calculator/tests.py -v
# or
python calculator/tests.py
```
Covers: addition, subtraction, multiplication, division, operator precedence, empty expression, invalid operator, and not enough operands.

- **get_files_info** (from repo root):
```bash
python -m pytest test_get_files_info.py -v
# or
python test_get_files_info.py
```
Covers: current dir, subdir (`pkg`), and rejection of `/bin` and `../` (outside permitted working directory); nonexistent path and file-as-path raise.

- **get_file_content** (from repo root):
```bash
python -m pytest test_get_file_content.py -v
# or
python test_get_file_content.py
```
Covers: truncation when file exceeds limit, short file and nested path return content below limit, path outside working directory raises, file not found raises.

---

## Implementation notes

- **Calculator** (example code repo)
  - Tokenizes on spaces; supports `+`, `-`, `*`, `/` with standard precedence (`*`, `/` over `+`, `-`).
  - Uses a stack-based infix evaluator; raises `ValueError` on invalid tokens or malformed expressions.
  - Lives under `calculator/` as the example codebase that runs on the CLI.

- **get_files_info**
  - Takes absolute/normalized paths and ensures the target directory is under `working_directory` (no escaping outside the allowed root).
  - Returns a string of entry lines; errors are raised as exceptions. Skips `.` and `..` in the listing.

- **get_file_content**
  - Resolves `file_path` under `working_directory`, reads up to `CHARACTER_LIMIT` characters, and appends a truncation message if the file is longer.
  - Raises if the path is outside the working directory, the file is not found, or it is not a regular file.

- **Type checking**
  - `pyrightconfig.json` sets `extraPaths: ["."]` so imports like `calculator.pkg`, `functions.get_files_info`, and `functions.get_files_content` resolve from the project root.

---

## Current state summary

| Area | Status |
|-----------------|--------|
| Agent vision | Custom AI Agent CLI: accept coding task → choose predefined functions (e.g. scan files) → run until complete or interrupted; Gemini API. |
| Agent entrypoint| `main.py` — Gemini client working; full loop (tool choice + execution + repeat) in progress. |
| Predefined tools | `get_files_info` (directory listing) and `get_file_content` (file reading with truncation) implemented and tested; path safety in place. |
| Example code repo | `calculator/` — Example codebase that runs on the CLI; agent can run/inspect it. |
| Packaging | `pyproject.toml` + `uv.lock`; no `scripts` entrypoints. |

No web UI or long-running services; everything is CLI or library-style.
```bash
python -m pytest test_get_files_info.py test_get_file_content.py test_run_python_file.py test_write_file.py calculator/tests.py -v
```