SwiftSolve: Multi-Agent Code Generation with Efficiency Optimization

Python 3.12+ License: MIT

SwiftSolve is a multi-agent framework that synthesizes functionally correct and computationally efficient C++ code from natural language problem statements. Unlike traditional code generation systems that focus solely on correctness, SwiftSolve co-optimizes for both functional correctness and Big-O efficiency through an iterative feedback loop between planning, coding, profiling, and analysis agents.

Quick Start

Prerequisites

  • Python 3.12+ (required for modern type hints and performance)
  • API Keys: OpenAI GPT-4 and Anthropic Claude API access

1. Clone and Setup

# Clone the repository (replace <repository-url> with the project's Git URL)
git clone <repository-url>
cd swiftsolve

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Environment Configuration

# Set required environment variables
export OPENAI_API_KEY="your_openai_api_key_here"
export ANTHROPIC_API_KEY="your_anthropic_api_key_here"

# Set Python path for imports
export PYTHONPATH="${PWD}/src"
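Before launching anything, a quick preflight check can confirm the required variables are exported. The helper below is a hypothetical convenience script, not part of SwiftSolve:

```python
import os

REQUIRED_VARS = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "PYTHONPATH"]

def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Report anything that still needs to be exported
missing = missing_vars()
print("Missing:", ", ".join(missing) if missing else "none")
```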

3. Verify Installation

# Test basic functionality with a simple problem
python src/swiftsolve/main.py --task_json src/swiftsolve/test.json

Architecture Overview

SwiftSolve employs a sophisticated multi-agent pipeline that iteratively refines code solutions:

graph TD
    A[Natural Language Problem] --> B[Planner Agent<br/>Claude 4 Opus]
    B --> C[Static Pruner<br/>Efficiency Heuristics]
    C --> D[Coder Agent<br/>GPT-4.1]
    D --> E[Profiler<br/>Sandbox]
    E --> F[Analyst Agent<br/>Complexity Analysis]
    F --> G{Solution<br/>Acceptable?}
    G -->|No| H[Feedback Loop]
    H --> D
    G -->|Yes| I[Final Solution<br/>+ Performance Profile]

Core Components

  1. Planner Agent (Claude 4 Opus): Converts natural language into structured algorithmic plans
  2. Static Pruner: Filters out obviously inefficient approaches using regex and AST analysis
  3. Coder Agent (GPT-4.1): Generates C++ code from approved plans
  4. Profiler: Compiles and benchmarks code with real performance measurements
  5. Analyst Agent: Evaluates efficiency using heuristics and LLM fallback for complex cases
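The plan → code → profile → analyze feedback loop described above can be sketched as a plain Python function. This is a hypothetical interface, not the actual SwiftSolve API: `plan_fn`, `code_fn`, `profile_fn`, and `analyze_fn` stand in for the LLM-backed agents, and plan revision is reduced to appending the analyst's feedback:

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    """Analyst output: is the solution acceptable, and if not, why."""
    acceptable: bool
    feedback: str = ""

def solve(problem, plan_fn, code_fn, profile_fn, analyze_fn, max_replans=3):
    """Sketch of the iterative refinement loop (hypothetical interface)."""
    plan = plan_fn(problem)                 # Planner agent
    for attempt in range(1, max_replans + 1):
        code = code_fn(plan)                # Coder agent
        profile = profile_fn(code)          # Profiler: compile + benchmark
        verdict = analyze_fn(profile)       # Analyst agent
        if verdict.acceptable:
            return code, attempt
        plan = plan + " | " + verdict.feedback  # feed analysis back into the plan
    return None, max_replans
```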

Usage Modes

1. API Server Mode (Recommended for Research)

Start the FastAPI server for programmatic access:

# Start the server
PYTHONPATH=src uvicorn swiftsolve.main:app --host 127.0.0.1 --port 8000 --reload

# Server will be available at http://localhost:8000
# API documentation at http://localhost:8000/docs

Solve Individual Problems

curl -X POST "http://localhost:8000/solve" \
  -H "Content-Type: application/json" \
  -d '{
    "task_id": "example_1",
    "prompt": "Find the maximum element in an array",
    "constraints": {"runtime_limit": 2000, "memory_limit": 512},
    "unit_tests": [
      {"input": "5\n3 1 4 1 5", "output": "5"},
      {"input": "3\n10 20 30", "output": "30"}
    ]
  }'
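If you prefer Python over curl, the same request can be built with the standard library. This is a sketch: it assumes the server from the previous step is running on localhost:8000, so the actual call is left commented out:

```python
import json
import urllib.request

# Same payload as the curl example above
payload = {
    "task_id": "example_1",
    "prompt": "Find the maximum element in an array",
    "constraints": {"runtime_limit": 2000, "memory_limit": 512},
    "unit_tests": [
        {"input": "5\n3 1 4 1 5", "output": "5"},
        {"input": "3\n10 20 30", "output": "30"},
    ],
}

req = urllib.request.Request(
    "http://localhost:8000/solve",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is up:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response))
```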

Research Evaluation (Full Benchmark)

# Full evaluation: 225 iterations (25 tasks × 3 seeds × 3 replans)
curl -X POST "http://localhost:8000/research/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": [42, 123, 456],
    "max_workers": 4,
    "output_dir": "research_results"
  }'

Mini Evaluation (Quick Testing)

# Mini evaluation: 18 iterations (2 tasks × 3 seeds × 3 replans)
curl -X POST "http://localhost:8000/mini-research/evaluate" \
  -H "Content-Type: application/json" \
  -d '{
    "seeds": [42, 123, 456],
    "max_workers": 2,
    "output_dir": "mini_results"
  }'

2. CLI Mode

For direct command-line usage:

# Solve a single problem from JSON file
python src/swiftsolve/main.py --task_json datasets/bigobench/task_bigobench_1718.json

# Run batch evaluation
python -m src.swiftsolve.evaluation.batch_runner --benchmark --seeds 42 123 456

3. Testing and Development

# Test mini-run functionality
python test_mini_run.py

# Run built-in test suite
python dry_run_batch.py --tasks 5

# Run unit tests
python -m pytest src/swiftsolve/tests/

Evaluation and Benchmarking

Available Datasets

  • BigO(Bench): 16 algorithmic tasks with known complexity requirements
  • Codeforces: 10 competitive programming problems with strict constraints

Evaluation Metrics

  • pass@k: Functional correctness across k attempts
  • eff@k_runtime: Efficiency-optimized success rate for runtime constraints
  • eff@k_memory: Efficiency-optimized success rate for memory constraints
  • TLE/MLE rates: Time/Memory limit exceeded frequencies
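SwiftSolve's exact estimator is not reproduced here, but pass@k is conventionally computed with the unbiased estimator 1 − C(n−c, k)/C(n, k) over n samples with c correct; eff@k applies the same formula after additionally requiring each sample to meet the runtime or memory constraint. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn from n total (c of them correct) passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```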

Running Evaluations

| Mode | Tasks | Iterations | Duration | Use Case |
|------|-------|------------|----------|----------|
| Mini-Run | 2 tasks | 18 iterations | ~5 minutes | Setup testing, development |
| Full Run | 25 tasks | 225 iterations | ~60 minutes | Research benchmarking |

# Quick setup verification
python test_mini_run.py

# Full research evaluation
curl -X POST "http://localhost:8000/research/evaluate" \
  -H "Content-Type: application/json" \
  -d '{"max_workers": 4}'

Advanced Configuration

Environment Variables

| Variable | Purpose | Required |
|----------|---------|----------|
| OPENAI_API_KEY | GPT-4.1 access | Yes |
| ANTHROPIC_API_KEY | Claude 4 Opus access | Yes |
| LOG_LEVEL | Logging verbosity (DEBUG/INFO/WARNING/ERROR) | No |
| PYTHONPATH | Python import path | Yes |

Code Execution Environment

SwiftSolve compiles and executes C++ code directly on your system with resource limits:

# Verify C++ compiler is available
g++ --version
# or on macOS with Xcode
clang++ --version
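On Unix systems, per-process limits of the kind the profiler enforces can be imposed with the standard `resource` module. This is an illustrative sketch, not the actual sandbox implementation; the 2 s / 512 MB defaults simply mirror the constraint fields used elsewhere in this README:

```python
import resource
import subprocess

def run_limited(cmd, cpu_seconds=2, memory_mb=512):
    """Run `cmd` under CPU-time and address-space limits (Unix only)."""
    def set_limits():
        # Applied in the child process before exec
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        mem = memory_mb * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (mem, mem))

    return subprocess.run(
        cmd, capture_output=True, text=True, preexec_fn=set_limits
    )

# A well-behaved process finishes normally under the limits
result = run_limited(["echo", "hello"])
```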

Performance Tuning

# Adjust worker count based on your system
export MAX_WORKERS=4  # Default: 4

# Enable debug logging for troubleshooting
export LOG_LEVEL=DEBUG

Project Structure

swiftsolve/
├── src/swiftsolve/           # Core framework
│   ├── agents/              # Multi-agent components
│   │   ├── planner.py       # Claude 4 Opus planning
│   │   ├── coder.py         # GPT-4.1 code generation
│   │   ├── profiler.py      # Performance profiling
│   │   └── analyst.py       # Efficiency analysis
│   ├── api/                 # FastAPI endpoints
│   ├── controller/          # Pipeline orchestration
│   ├── evaluation/          # Benchmarking infrastructure
│   ├── sandbox/             # Code execution environment
│   └── schemas/             # Data models and validation
├── datasets/                # Evaluation datasets
│   ├── bigobench/          # BigO(Bench) tasks
│   └── codeforces/         # Codeforces problems
├── baseline_test_api/       # Baseline evaluation results
├── test_*.py               # Test scripts
└── requirements.txt        # Dependencies

Testing Your Setup

1. Basic Functionality Test

# Test single problem solving
python src/swiftsolve/main.py --task_json src/swiftsolve/test.json

2. API Server Test

# Start server
PYTHONPATH=src uvicorn swiftsolve.main:app --host 127.0.0.1 --port 8000 &

# Test health endpoint
curl http://localhost:8000/healthz

# Test solve endpoint
curl -X POST "http://localhost:8000/solve" \
  -H "Content-Type: application/json" \
  -d '{"task_id": "test", "prompt": "Add two numbers", "constraints": {"runtime_limit": 1000}, "unit_tests": [{"input": "2 3", "output": "5"}]}'

3. Mini Evaluation Test

# Run mini evaluation (18 iterations, ~5 minutes)
python test_mini_run.py

Troubleshooting

Common Issues

1. Import Errors

# Ensure PYTHONPATH is set correctly
export PYTHONPATH="${PWD}/src"

2. C++ Compiler Issues

# On macOS, install Xcode command line tools
xcode-select --install

# On Ubuntu/Debian
sudo apt-get install build-essential

# On CentOS/RHEL
sudo yum groupinstall "Development Tools"

3. API Key Issues

# Verify API keys are set
echo $OPENAI_API_KEY
echo $ANTHROPIC_API_KEY

4. Port Already in Use

# Use different port
PYTHONPATH=src uvicorn swiftsolve.main:app --host 127.0.0.1 --port 8001

5. Memory Issues During Evaluation

# Reduce worker count
curl -X POST "http://localhost:8000/research/evaluate" \
  -H "Content-Type: application/json" \
  -d '{"max_workers": 2}'

Debug Mode

Enable detailed logging for troubleshooting:

export LOG_LEVEL=DEBUG
PYTHONPATH=src uvicorn swiftsolve.main:app --host 127.0.0.1 --port 8000

Performance Expectations

Typical Performance Metrics

  • Mini-Run (18 iterations): ~5 minutes
  • Full Evaluation (225 iterations): ~60 minutes
  • Single Problem: ~15-30 seconds
  • Success Rate: 60-80% (varies by dataset and constraints)

Resource Requirements

  • CPU: 4+ cores recommended for parallel evaluation
  • RAM: 8GB+ recommended
  • Storage: 2GB+ for datasets and results
  • Network: Stable internet for API calls

Research Usage

Reproducing Results

  1. Setup Environment: Follow Quick Start guide
  2. Run Mini Evaluation: Verify setup with python test_mini_run.py
  3. Full Evaluation: Run complete benchmark with research endpoint
  4. Analyze Results: Check generated JSON files in output directory

Custom Evaluations

# Custom task evaluation
python -m src.swiftsolve.evaluation.batch_runner \
  --custom-tasks datasets/my_tasks/ \
  --seeds 42 123 456 789 \
  --output-dir custom_results

Baseline Comparisons

# Run baseline evaluation
curl -X POST "http://localhost:8000/baseline/evaluate" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "max_workers": 4}'

Citation

If you use SwiftSolve in your research, please cite:

@inproceedings{swiftsolve2025,
  title={SwiftSolve: Multi-Agent Code Generation with Efficiency Optimization},
  author={Your Name and Collaborators},
  booktitle={NeurIPS},
  year={2025}
}

Contributing

We welcome contributions! Please see our contributing guidelines and submit pull requests for any improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For questions, issues, or support:

  1. Check the troubleshooting section above
  2. Review existing issues on GitHub
  3. Create a new issue with detailed information about your setup and problem

Note: This framework requires API access to OpenAI GPT-4 and Anthropic Claude. Ensure you have valid API keys and sufficient credits before running evaluations.