007-Coding-Agent

An autonomous software engineering agent for solving programming tasks from the SWE-bench dataset

Overview

007 is an autonomous software engineering agent designed to solve programming tasks from the SWE-bench dataset. It leverages Large Language Models (LLMs) to understand bug reports, analyze codebases, implement fixes, and verify solutions through automated testing.

The agent operates through a multi-turn conversation paradigm where it:

  • Analyzes bug reports and problem statements
  • Explores codebases to understand structure and identify issues
  • Implements targeted fixes using file editing tools
  • Validates solutions through automated testing
  • Generates git patches for submission
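
The sketch below illustrates this loop in miniature. All names in it (Reply, call_llm, run_tool, solve) are hypothetical stand-ins for illustration, not the agent's actual API:

# Minimal sketch of a multi-turn agent loop. Names are hypothetical
# illustrations, not the actual 007 implementation.
from dataclasses import dataclass

@dataclass
class Reply:
    content: str
    tool_call: dict | None  # e.g. {"tool": "edit_file", "args": {...}}

def call_llm(messages: list[dict]) -> Reply:
    # Stub: the real agent would route this through LiteLLM.
    return Reply(content="done", tool_call=None)

def run_tool(tool_call: dict) -> str:
    # Stub: dispatch to a file-editing, search, or testing tool.
    return "tool output"

def solve(problem_statement: str, max_turns: int = 30) -> list[dict]:
    messages = [{"role": "user", "content": problem_statement}]
    for _ in range(max_turns):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply.content})
        if reply.tool_call is None:  # no further action requested: done
            break
        messages.append({"role": "tool", "content": run_tool(reply.tool_call)})
    return messages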

Key Features

  • Multi-LLM Support: Works with OpenAI, Anthropic, Google, Azure, AWS, and local models via LiteLLM
  • Parallel Processing: Supports up to 5 concurrent workers for batch processing
  • Intelligent Tool System: Modular architecture with specialized tools for file editing, testing, and evaluation
  • Robust Error Handling: Comprehensive retry logic and graceful degradation
  • Session Management: Organized logging and result tracking across multiple runs
  • SWE-bench Integration: Full compatibility with SWE-bench evaluation harness

Requirements

  • Python 3.13+
  • Poetry for dependency management
  • Docker for SWE-bench test execution
  • Git for repository operations
  • ripgrep (rg) for fast file content searching
  • SWE-bench CLI for test execution

Installation

System Dependencies (macOS)

brew install ripgrep poetry docker git

Project Setup

  1. Clone the repository:
git clone <repository-url>
cd 007-CodingAgent
  2. Install Python dependencies:
poetry install
  3. Set up environment variables:
# Create .env file with your API keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
# ... other provider keys as needed
  4. Run the project:
# Quick test run
LOG_LEVEL=DEBUG poetry run python main.py
# or
source scripts/aliases.sh
debug

Usage

Example Task Execution

To test the agent with a specific SWE-bench task, run either:

# Using Poetry (recommended)
poetry run python main.py -i astropy__astropy-12907

# Using Python directly
python main.py -i astropy__astropy-12907

# Add LOG_LEVEL=DEBUG for more verbose output

This will run the autonomous coding agent on the astropy__astropy-12907 task and create a session directory with results and logs for evaluation.

Using Project Aliases

The project includes convenient aliases in scripts/aliases.sh that provide the easiest way to run the agent:

# First, source the aliases
source scripts/aliases.sh

# Basic operations
run-all          # Run all tasks with 5 workers 
run-task         # Run single task by ID
list-tasks       # List all available tasks
debug            # Run with debug logging

# Continuation and retry
run-continue     # Continue from latest session
run-retry        # Retry failed tasks from a session
debug-retry      # Retry with debug logging

Manual Command Line Interface

You can also run the agent manually using the full command syntax:

# Run a single task
python main.py --id astropy__astropy-12907

# Run all tasks with parallel processing
python main.py --all --workers 5

# Continue from a previous session
python main.py --all --continue --workers 5

# Retry failed tasks from a specific session
python main.py --retry path/to/all_results.json --workers 5

# List all available tasks
python main.py --list

# Enable debug logging
LOG_LEVEL=DEBUG python main.py --id astropy__astropy-12907
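
To illustrate what --workers implies conceptually, a batch run can fan tasks out over a small worker pool. This is only a sketch under assumed names (run_task, run_all), not the agent's actual scheduler; since each task spends most of its time waiting on LLM calls, threads are a reasonable model here:

# Sketch of parallel task execution with a worker pool. run_task is a
# placeholder, not 007's actual code.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(task_id: str) -> dict:
    # Placeholder: would run the agent end-to-end on one SWE-bench task.
    return {"task_id": task_id, "resolved": False}

def run_all(task_ids: list[str], workers: int = 5) -> list[dict]:
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_task, tid): tid for tid in task_ids}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results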

Model Configuration

Default model: gemini/gemini-2.5-pro-preview-06-05

The set of supported models is determined by LiteLLM: any provider/model combination that LiteLLM supports can be configured.
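
For reference, a provider-agnostic call through LiteLLM looks like the following (litellm.completion is LiteLLM's standard entry point; the prompt here is just an example):

# Example of a provider-agnostic call through LiteLLM. Swapping the
# model string switches providers, given the matching API key.
import litellm

response = litellm.completion(
    model="gemini/gemini-2.5-pro-preview-06-05",  # the default model above
    messages=[{"role": "user", "content": "Summarize this bug report."}],
)
print(response.choices[0].message.content)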

Environment Variables

# LLM Provider Keys
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key

# Logging
LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING, ERROR

# LiteLLM Configuration
LITELLM_LOG=ERROR
LITELLM_DEBUG=False
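
To sanity-check that the keys in .env are visible to the process, a quick probe with the python-dotenv package might look like this (assuming .env sits in the working directory; the agent's own loading code may differ):

# Quick check that .env keys are visible to the process.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(key, "set" if os.getenv(key) else "missing")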

Session Management

The agent organizes runs into timestamped sessions:

Session Types

  • single: Single task execution
  • full: Complete dataset run
  • full_mp: Parallel full run
  • retry: Failed task retry
  • retry_mp: Parallel retry
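
Purely as an illustration, a timestamped session directory could be derived from the session type like this; the agent's actual directory layout and naming may differ:

# Illustrative timestamped session directory (the agent's real
# naming convention may differ).
from datetime import datetime
from pathlib import Path

def new_session_dir(session_type: str, root: str = "sessions") -> Path:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(root) / f"{stamp}_{session_type}"
    path.mkdir(parents=True, exist_ok=True)
    return path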

Running Tests

# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_file_editor.py

Key Test Categories

  • FileEditor Tests: File operations and editing logic
  • Error Handling: Comprehensive known failure testing
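
As a flavor of what such tests look like, here is a minimal pytest case using the built-in tmp_path fixture; FileEditor's real API is not shown here, so the edit is simulated with plain file I/O:

# Minimal pytest case for file-editing logic. The edit itself is
# simulated; a real test would call into FileEditor instead.
def test_replace_line(tmp_path):
    target = tmp_path / "example.py"
    target.write_text("x = 1\n")
    target.write_text(target.read_text().replace("x = 1", "x = 2"))
    assert target.read_text() == "x = 2\n"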

Evaluation

Debug Commands

# Full debug output
LOG_LEVEL=DEBUG poetry run python main.py --id task_id

# Debug specific session
debug-retry path/to/session/all_results.json

Results

A list of successfully solved tasks can be found under /run_results

License

This project is licensed under the CC BY-NC 4.0 License - see the LICENSE file for details.

Acknowledgments

  • SWE-bench: For providing the evaluation dataset and harness
  • LiteLLM: For unified LLM provider interface
  • HuggingFace: For dataset hosting and management
  • OpenAI, Anthropic, Google: For LLM API access

Note: This is a research project developed for a bachelor's thesis. The agent is designed for educational and research purposes only.