007-Coding-Agent

An autonomous software engineering agent for solving programming tasks from the SWE-bench dataset

Overview

007 is an autonomous software engineering agent designed to solve programming tasks from the SWE-bench dataset. It leverages Large Language Models (LLMs) to understand bug reports, analyze codebases, implement fixes, and verify solutions through automated testing.

The agent operates through a multi-turn conversation paradigm where it:

  • Analyzes bug reports and problem statements
  • Explores codebases to understand structure and identify issues
  • Implements targeted fixes using file editing tools
  • Validates solutions through automated testing
  • Generates git patches for submission
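
The sketch below illustrates this loop in miniature. All names in it (Reply, call_llm, run_tool, solve) are hypothetical stand-ins for illustration, not the agent's actual API:

# Minimal sketch of a multi-turn agent loop. Names are hypothetical
# illustrations, not the actual 007 implementation.
from dataclasses import dataclass

@dataclass
class Reply:
    content: str
    tool_call: dict | None  # e.g. {"tool": "edit_file", "args": {...}}

def call_llm(messages: list[dict]) -> Reply:
    # Stub: the real agent would route this through LiteLLM.
    return Reply(content="done", tool_call=None)

def run_tool(tool_call: dict) -> str:
    # Stub: dispatch to a file-editing, search, or testing tool.
    return "tool output"

def solve(problem_statement: str, max_turns: int = 30) -> list[dict]:
    messages = [{"role": "user", "content": problem_statement}]
    for _ in range(max_turns):
        reply = call_llm(messages)
        messages.append({"role": "assistant", "content": reply.content})
        if reply.tool_call is None:  # no further action requested: done
            break
        messages.append({"role": "tool", "content": run_tool(reply.tool_call)})
    return messages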

Key Features

  • Multi-LLM Support: Works with OpenAI, Anthropic, Google, Azure, AWS, and local models via LiteLLM
  • Parallel Processing: Supports up to 5 concurrent workers for batch processing
  • Intelligent Tool System: Modular architecture with specialized tools for file editing, testing, and evaluation
  • Robust Error Handling: Comprehensive retry logic and graceful degradation
  • Session Management: Organized logging and result tracking across multiple runs
  • SWE-bench Integration: Full compatibility with SWE-bench evaluation harness

Requirements

  • Python 3.13+
  • Poetry for dependency management
  • Docker for SWE-bench test execution
  • Git for repository operations
  • ripgrep (rg) for fast file content searching
  • SWE-bench CLI for test execution

Installation

System Dependencies (macOS)

brew install ripgrep poetry docker git

Project Setup

  1. Clone the repository:
git clone <repository-url>
cd 007-CodingAgent
  2. Install Python dependencies:
poetry install
  3. Set up environment variables:
# Create .env file with your API keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
# ... other provider keys as needed
  4. Run the project:
# Quick test run
LOG_LEVEL=DEBUG poetry run python main.py
# or
source scripts/aliases.sh
debug

Usage

Example Task Execution

To test the agent with a specific SWE-bench task, run either:

# Using Poetry (recommended)
poetry run python main.py -i astropy__astropy-12907

# Using Python directly
python main.py -i astropy__astropy-12907

# Add LOG_LEVEL=DEBUG for more verbose output

This will run the autonomous coding agent on the astropy__astropy-12907 task and create a session directory with results and logs for evaluation.

Using Project Aliases

The project includes convenient aliases in scripts/aliases.sh that provide the easiest way to run the agent:

# First, source the aliases
source scripts/aliases.sh

# Basic operations
run-all          # Run all tasks with 5 workers 
run-task         # Run single task by ID
list-tasks       # List all available tasks
debug            # Run with debug logging

# Continuation and retry
run-continue     # Continue from latest session
run-retry        # Retry failed tasks from a session
debug-retry      # Retry with debug logging

Manual Command Line Interface

You can also run the agent manually using the full command syntax:

# Run a single task
python main.py --id astropy__astropy-12907

# Run all tasks with parallel processing
python main.py --all --workers 5

# Continue from a previous session
python main.py --all --continue --workers 5

# Retry failed tasks from a specific session
python main.py --retry path/to/all_results.json --workers 5

# List all available tasks
python main.py --list

# Enable debug logging
LOG_LEVEL=DEBUG python main.py --id astropy__astropy-12907
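
To illustrate what --workers implies conceptually, a batch run can fan tasks out over a small worker pool. This is only a sketch under assumed names (run_task, run_all), not the agent's actual scheduler; since each task spends most of its time waiting on LLM calls, threads are a reasonable model here:

# Sketch of parallel task execution with a worker pool. run_task is a
# placeholder, not 007's actual code.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_task(task_id: str) -> dict:
    # Placeholder: would run the agent end-to-end on one SWE-bench task.
    return {"task_id": task_id, "resolved": False}

def run_all(task_ids: list[str], workers: int = 5) -> list[dict]:
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_task, tid): tid for tid in task_ids}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results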

Model Configuration

Default model: gemini/gemini-2.5-pro-preview-06-05

The set of supported models is determined by LiteLLM: any provider/model combination that LiteLLM supports can be configured.
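
For reference, a provider-agnostic call through LiteLLM looks like the following (litellm.completion is LiteLLM's standard entry point; the prompt here is just an example):

# Example of a provider-agnostic call through LiteLLM. Swapping the
# model string switches providers, given the matching API key.
import litellm

response = litellm.completion(
    model="gemini/gemini-2.5-pro-preview-06-05",  # the default model above
    messages=[{"role": "user", "content": "Summarize this bug report."}],
)
print(response.choices[0].message.content)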

Environment Variables

# LLM Provider Keys
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key

# Logging
LOG_LEVEL=DEBUG  # DEBUG, INFO, WARNING, ERROR

# LiteLLM Configuration
LITELLM_LOG=ERROR
LITELLM_DEBUG=False
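
To sanity-check that the keys in .env are visible to the process, a quick probe with the python-dotenv package might look like this (assuming .env sits in the working directory; the agent's own loading code may differ):

# Quick check that .env keys are visible to the process.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    print(key, "set" if os.getenv(key) else "missing")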

Session Management

The agent organizes runs into timestamped sessions:

Session Types

  • single: Single task execution
  • full: Complete dataset run
  • full_mp: Parallel full run
  • retry: Failed task retry
  • retry_mp: Parallel retry
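
Purely as an illustration, a timestamped session directory could be derived from the session type like this; the agent's actual directory layout and naming may differ:

# Illustrative timestamped session directory (the agent's real
# naming convention may differ).
from datetime import datetime
from pathlib import Path

def new_session_dir(session_type: str, root: str = "sessions") -> Path:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    path = Path(root) / f"{stamp}_{session_type}"
    path.mkdir(parents=True, exist_ok=True)
    return path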

Running Tests

# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_file_editor.py

Key Test Categories

  • FileEditor Tests: File operations and editing logic
  • Error Handling: Comprehensive known failure testing
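
As a flavor of what such tests look like, here is a minimal pytest case using the built-in tmp_path fixture; FileEditor's real API is not shown here, so the edit is simulated with plain file I/O:

# Minimal pytest case for file-editing logic. The edit itself is
# simulated; a real test would call into FileEditor instead.
def test_replace_line(tmp_path):
    target = tmp_path / "example.py"
    target.write_text("x = 1\n")
    target.write_text(target.read_text().replace("x = 1", "x = 2"))
    assert target.read_text() == "x = 2\n"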

Evaluation

Debug Commands

# Full debug output
LOG_LEVEL=DEBUG poetry run python main.py --id task_id

# Debug specific session
debug-retry path/to/session/all_results.json

Results

A list of successfully solved tasks can be found under /run_results

License

This project is licensed under the CC BY-NC 4.0 License - see the LICENSE file for details.

Acknowledgments

  • SWE-bench: For providing the evaluation dataset and harness
  • LiteLLM: For unified LLM provider interface
  • HuggingFace: For dataset hosting and management
  • OpenAI, Anthropic, Google: For LLM API access

Note: This is a research project developed for a bachelor's thesis. The agent is designed for educational and research purposes only.