An autonomous software engineering agent for solving programming tasks from the SWE-bench dataset
007 is an autonomous software engineering agent designed to solve programming tasks from the SWE-bench dataset. It leverages Large Language Models (LLMs) to understand bug reports, analyze codebases, implement fixes, and verify solutions through automated testing.
The agent operates through a multi-turn conversation paradigm where it:
- Analyzes bug reports and problem statements
- Explores codebases to understand structure and identify issues
- Implements targeted fixes using file editing tools
- Validates solutions through automated testing
- Generates git patches for submission
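The multi-turn loop above could look roughly like the following sketch (all names are hypothetical and the LLM is stubbed out; this is not the actual 007 implementation):

```python
# Minimal sketch of a multi-turn agent loop. The LLM is replaced by a
# canned script of tool calls; a real agent would query a model instead.
def fake_llm(messages):
    # Pick the next action based on how many turns have happened.
    turn = len([m for m in messages if m["role"] == "assistant"])
    script = [
        {"tool": "explore", "args": {"query": "traceback"}},
        {"tool": "edit", "args": {"file": "module.py"}},
        {"tool": "test", "args": {}},
        {"tool": "submit", "args": {}},
    ]
    return script[turn]

def run_agent(problem_statement, max_turns=10):
    messages = [{"role": "user", "content": problem_statement}]
    for _ in range(max_turns):
        action = fake_llm(messages)
        messages.append({"role": "assistant", "content": str(action)})
        if action["tool"] == "submit":
            return "patch generated"
        # Execute the tool and feed the observation back to the model.
        messages.append({"role": "tool", "content": f"ran {action['tool']}"})
    return "gave up"

print(run_agent("Bug: wrong output in module.py"))  # → patch generated
```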
- Multi-LLM Support: Works with OpenAI, Anthropic, Google, Azure, AWS, and local models via LiteLLM
- Parallel Processing: Supports up to 5 concurrent workers for batch processing
- Intelligent Tool System: Modular architecture with specialized tools for file editing, testing, and evaluation
- Robust Error Handling: Comprehensive retry logic and graceful degradation
- Session Management: Organized logging and result tracking across multiple runs
- SWE-bench Integration: Full compatibility with SWE-bench evaluation harness
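As an illustration of the parallel batch processing above, here is a minimal sketch that fans tasks out across 5 workers (function names are hypothetical, not the project's actual API):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def solve_task(task_id):
    # Placeholder for running the agent on one SWE-bench instance.
    return task_id, "completed"

def run_all(task_ids, workers=5):
    # Submit every task to the pool and collect results as they finish.
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(solve_task, t): t for t in task_ids}
        for fut in as_completed(futures):
            task_id, status = fut.result()
            results[task_id] = status
    return results

print(run_all(["astropy__astropy-12907", "django__django-11001"]))
```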
- Python 3.13+
- Poetry for dependency management
- Docker for SWE-bench test execution
- Git for repository operations
- ripgrep (rg) for fast file content searching
- SWE-bench CLI for test execution
```bash
brew install ripgrep poetry docker git
```
- Clone the repository:
```bash
git clone <repository-url>
cd 007-CodingAgent
```
- Install Python dependencies:
```bash
poetry install
```
- Set up environment variables:
```bash
# Create .env file with your API keys
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
GOOGLE_API_KEY=your_google_key
# ... other provider keys as needed
```
- Run the project:
```bash
# Quick test run
LOG_LEVEL=DEBUG poetry run python main.py

# or
source scripts/aliases.sh
debug
```
To test the agent with a specific SWE-bench task, run either:
```bash
# Using Poetry (recommended)
poetry run python main.py -i astropy__astropy-12907

# Using Python directly
python main.py -i astropy__astropy-12907

# Add LOG_LEVEL=DEBUG for more verbose output
```
This will run the autonomous coding agent on the astropy__astropy-12907 task and create a session directory with results and logs for evaluation.
The project includes convenient aliases in scripts/aliases.sh that provide the easiest way to use the agent:
```bash
# First, source the aliases
source scripts/aliases.sh

# Basic operations
run-all      # Run all tasks with 5 workers
run-task     # Run a single task by ID
list-tasks   # List all available tasks
debug        # Run with debug logging

# Continuation and retry
run-continue # Continue from the latest session
run-retry    # Retry failed tasks from a session
debug-retry  # Retry with debug logging
```
You can also run the agent manually using the full command syntax:
```bash
# Run a single task
python main.py --id astropy__astropy-12907

# Run all tasks with parallel processing
python main.py --all --workers 5

# Continue from a previous session
python main.py --all --continue --workers 5

# Retry failed tasks from a specific session
python main.py --retry path/to/all_results.json --workers 5

# List all available tasks
python main.py --list

# Enable debug logging
LOG_LEVEL=DEBUG python main.py --id astropy__astropy-12907
```
Default model: gemini/gemini-2.5-pro-preview-06-05
Any model supported by LiteLLM can be configured.
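LiteLLM routes each request to the matching provider based on the model-name prefix. A hedged sketch of building such a request (no network call is made; the helper function is illustrative, not part of the project):

```python
# Sketch of provider routing via model-name prefixes, as used by LiteLLM
# (e.g. "gemini/...", "anthropic/..."). A real call would be
# litellm.completion(model=..., messages=...), which requires API keys.
messages = [{"role": "user", "content": "Summarize this bug report."}]

def build_request(model="gemini/gemini-2.5-pro-preview-06-05"):
    # Returns the keyword arguments that would be passed to the LLM call.
    return {"model": model, "messages": messages}

req = build_request("anthropic/claude-3-5-sonnet-20240620")
print(req["model"])  # → anthropic/claude-3-5-sonnet-20240620
```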
```bash
# LLM Provider Keys
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
GOOGLE_API_KEY=your_key

# Logging
LOG_LEVEL=DEBUG # DEBUG, INFO, WARNING, ERROR

# LiteLLM Configuration
LITELLM_LOG=ERROR
LITELLM_DEBUG=False
```
The agent organizes runs into timestamped sessions:
- single: Single task execution
- full: Complete dataset run
- full_mp: Parallel full run
- retry: Failed task retry
- retry_mp: Parallel retry
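The session types above might map to directory names as in this minimal sketch (the naming scheme shown here is an assumption, not the project's actual layout):

```python
from datetime import datetime

# Hypothetical session-directory naming: a timestamp plus the run mode
# (single, full, full_mp, retry, retry_mp).
def session_dir(mode):
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"sessions/{stamp}_{mode}"

print(session_dir("full_mp"))
```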
```bash
# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_file_editor.py
```
- FileEditor Tests: File operations and editing logic
- Error Handling: Comprehensive known failure testing
```bash
# Full debug output
LOG_LEVEL=DEBUG poetry run python main.py --id task_id

# Debug a specific session
debug-retry path/to/session/all_results.json
```
A list of successfully resolved issues can be found under /run_results.
This project is licensed under the CC BY-NC 4.0 License - see the LICENSE file for details.
- SWE-bench: For providing the evaluation dataset and harness
- LiteLLM: For unified LLM provider interface
- HuggingFace: For dataset hosting and management
- OpenAI, Anthropic, Google: For LLM API access
Note: This is a research project for a bachelor's thesis. The agent is designed for educational and research purposes only.