CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

ShinkaEvolve is a framework combining Large Language Models (LLMs) with evolutionary algorithms for automated scientific code discovery. It maintains populations of programs that evolve over generations, using LLMs as intelligent mutation operators. The system supports parallel evaluation locally or on Slurm clusters and maintains archives of successful solutions for knowledge transfer between evolutionary islands.

Development Commands

Installation

# Using uv (recommended - faster)
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .

# Using conda/pip
conda create -n shinka python=3.11
conda activate shinka
pip install -e .

Running Evolution Experiments

# Launch with pre-configured variant
shinka_launch variant=circle_packing_example

# Launch with custom parameters
shinka_launch \
    task=circle_packing \
    database=island_large \
    evolution=small_budget \
    cluster=local \
    evo_config.num_generations=20

Visualization

# Launch interactive WebUI for monitoring evolution
shinka_visualize --port 8888 --open

Testing

# Run tests
pytest tests/

Architecture

Core Components

Runner (shinka/core/runner.py)

  • EvolutionRunner: Main orchestrator for evolutionary experiments
  • Manages LLM interactions, job scheduling, and evolution loop
  • Handles patch generation, application, and evaluation queue

Database (shinka/database/)

  • ProgramDatabase: SQLite-backed storage for program population
  • islands.py: Island-based evolution topology for diversity
  • parents.py: Parent selection strategies (power-law, weighted, beam search)
  • inspirations.py: Archive-based inspiration sampling
  • complexity.py: Code complexity metrics
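For intuition, power-law parent selection weights programs by fitness rank. A minimal sketch (the function name and data layout are hypothetical, not the actual parents.py API):

```python
import random

def sample_parent_power_law(population, alpha=1.0):
    """Sample a parent with probability proportional to rank**(-alpha).

    `population` is assumed to be a list of (program, fitness) pairs,
    where higher fitness is better; `alpha` sets the selection pressure.
    Illustrative sketch, not the parents.py implementation.
    """
    ranked = sorted(population, key=lambda p: p[1], reverse=True)
    # rank 0 = best; weight ~ (rank + 1)^-alpha
    weights = [(rank + 1) ** -alpha for rank in range(len(ranked))]
    program, _ = random.choices(ranked, weights=weights, k=1)[0]
    return program
```

Larger `alpha` concentrates selection on top-ranked programs; `alpha=0` degenerates to uniform sampling.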

Launch (shinka/launch/)

  • JobScheduler: Abstract interface for job execution
  • local.py: Local execution with optional conda environment
  • slurm.py: Slurm cluster execution with Docker/Conda support

Edit (shinka/edit/)

  • apply_diff.py: Apply diff-based code patches
  • apply_full.py: Full code replacement patches
  • async_apply.py: Asynchronous patch application with LLM retry logic

LLM (shinka/llm/)

  • LLMClient: Unified interface for OpenAI, Anthropic, Azure models
  • EmbeddingClient: Code embedding generation for similarity
  • Dynamic model selection with bandit algorithms

Core Utilities (shinka/core/)

  • sampler.py: Prompt sampling and patch generation
  • summarizer.py: Meta-recommendations across generations
  • novelty_judge.py: Novelty assessment for open-ended exploration
  • wrap_eval.py: run_shinka_eval() evaluation wrapper

Configuration System

Uses Hydra for composable, hierarchical configs:

configs/
├── config.yaml           # Main config with defaults
├── cluster/              # Execution environments (local, gcp, remote)
├── database/             # Evolution settings (island_small, island_medium, island_large)
├── evolution/            # Computational budgets (small, medium, large)
├── task/                 # Problem definitions (circle_packing, agent_design, etc.)
└── variant/              # Pre-configured combinations

Override parameters via CLI:

shinka_launch task=my_task evo_config.num_generations=50 db_config.num_islands=8

Evaluation Architecture

Evolution experiments require two key files:

initial.py - Starting solution with evolution markers:

# EVOLVE-BLOCK-START
def advanced_algo():
    # This section will be evolved by LLMs
    return solution
# EVOLVE-BLOCK-END

def run_experiment(**kwargs):
    """Entry point called by evaluator"""
    result = advanced_algo()
    return result
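The markers delimit the only region the LLM is allowed to rewrite; everything outside them stays fixed. A minimal sketch of how such a block could be located (`extract_evolve_block` is a hypothetical helper, not part of the shinka API):

```python
import re

# Matches the region between the evolve markers, including newlines.
EVOLVE_RE = re.compile(
    r"# EVOLVE-BLOCK-START\n(.*?)# EVOLVE-BLOCK-END",
    re.DOTALL,
)

def extract_evolve_block(source: str) -> str:
    """Return the code between the evolve markers (hypothetical helper)."""
    match = EVOLVE_RE.search(source)
    if match is None:
        raise ValueError("no EVOLVE-BLOCK markers found")
    return match.group(1)
```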

evaluate.py - Evaluation script using run_shinka_eval():

from typing import Optional, Tuple

from shinka.core import run_shinka_eval

def main(program_path: str, results_dir: str):
    metrics, correct, err = run_shinka_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_kwargs_fn,
        validate_fn=validate_fn,
        aggregate_metrics_fn=aggregate_fn,
    )

def validate_fn(run_output) -> Tuple[bool, Optional[str]]:
    # Returns (is_valid, error_message)
    if constraint_violated:
        return False, "Error description"
    return True, None

def aggregate_fn(results: list) -> dict:
    return {
        "combined_score": float(score),  # PRIMARY FITNESS (higher=better)
        "public": {...},     # Visible in WebUI/logs
        "private": {...},    # Internal analysis only
        "extra_data": {...}, # Stored as pickle
        "text_feedback": "", # Optional text feedback for LLM
    }
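The `get_kwargs_fn` callback above is not shown. A plausible sketch, assuming it receives the zero-based run index and returns the kwargs passed to `run_experiment` (a common pattern for seeding independent repetitions):

```python
def get_kwargs_fn(run_idx: int) -> dict:
    """Return the kwargs passed to run_experiment for one run.

    Assumed signature: run index in, kwargs dict out. Here each run
    gets a distinct seed so the repetitions are independent.
    """
    return {"seed": run_idx}
```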

Island Evolution Model

  • Multiple independent islands evolve populations in parallel
  • Periodic migration exchanges solutions between islands
  • Each island maintains its own archive of elite solutions
  • Parent selection strategies: power-law, weighted, beam search
  • Exploitation vs exploration controlled by exploitation_ratio and exploitation_alpha
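The migration step can be sketched as a ring topology, where each island periodically sends its elites to a neighbour (illustrative only; the actual topology and data structures may differ):

```python
def migrate(islands, k=1):
    """Copy each island's top-k programs to its ring neighbour.

    `islands` is assumed to be a list of lists of dicts with a
    "fitness" key (illustrative; the real database schema differs).
    """
    n = len(islands)
    # Snapshot elites first so migrants don't cascade around the ring.
    elites = [
        sorted(island, key=lambda p: p["fitness"], reverse=True)[:k]
        for island in islands
    ]
    for i in range(n):
        islands[(i + 1) % n].extend(elites[i])
    return islands
```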

Patch Types

  • diff: Generate and apply unified diffs to evolve code sections
  • full: Replace entire EVOLVE-BLOCK with new implementation
  • cross: Crossover between multiple parent solutions (future work)

Patch type probabilities are configurable via patch_type_probs.
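The operator choice then amounts to weighted sampling over patch_types (an illustrative sketch, not the actual runner code):

```python
import random

def sample_patch_type(patch_types, patch_type_probs):
    """Draw one mutation operator according to its configured probability."""
    return random.choices(patch_types, weights=patch_type_probs, k=1)[0]

# Example: favour diffs, occasionally do full rewrites or crossover.
patch = sample_patch_type(["diff", "full", "cross"], [0.6, 0.3, 0.1])
```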

Key Configuration Parameters

Evolution Control:

  • num_generations: Number of evolution cycles
  • max_parallel_jobs: Concurrent evaluation jobs
  • max_patch_attempts: Max LLM retries for valid patch
  • llm_models: List of LLM models for mutations
  • patch_types / patch_type_probs: Mutation operator distribution

Database/Islands:

  • num_islands: Number of independent evolution islands
  • archive_size: Elite solution archive per island
  • migration_interval: Generations between island migrations
  • exploitation_ratio: Balance exploitation vs exploration
  • parent_selection_strategy: "power_law", "weighted", "beam_search"

Novelty (for open-ended tasks):

  • code_embed_sim_threshold: Embedding similarity threshold
  • max_novelty_attempts: Max attempts for novel solutions
  • novelty_llm_models: LLMs for novelty judgment

Examples Directory

Each example contains:

  • initial.py: Starting solution with EVOLVE-BLOCK markers
  • evaluate.py: Evaluation script
  • run_evo.py: Direct Python API runner (optional)

Available Examples:

  • circle_packing: Optimize 26 circles in unit square
  • adas_aime: Agent scaffold design for math tasks
  • ale_bench: Code optimization for ALE-Bench tasks
  • novelty_generator: Creative output generation (ASCII art, etc.)

Python API Usage

from shinka.core import EvolutionRunner, EvolutionConfig
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig

job_config = LocalJobConfig(
    eval_program_path="examples/my_task/evaluate.py",
    conda_env="my_env",  # Optional: use specific conda environment
)

db_config = DatabaseConfig(
    num_islands=4,
    archive_size=100,
    migration_interval=10,
)

evo_config = EvolutionConfig(
    num_generations=20,
    max_parallel_jobs=2,
    llm_models=["azure-gpt-4.1"],
    init_program_path="examples/my_task/initial.py",
    task_sys_msg="You are optimizing [specific task]...",
)

runner = EvolutionRunner(
    evo_config=evo_config,
    job_config=job_config,
    db_config=db_config,
)
runner.run()

Project Structure Notes

  • Python 3.10+ required (3.11 recommended)
  • All package management via pyproject.toml
  • Uses uv for fast dependency management (optional but recommended)
  • Logging via rich for terminal output
  • Database: SQLite for program storage
  • LLM providers: Anthropic, Azure OpenAI (DeepSeek, Gemini also supported)
  • Credentials via .env file (see Azure OpenAI Configuration below)

Azure OpenAI Configuration

IMPORTANT: This installation uses Azure OpenAI exclusively for all OpenAI models. Direct OpenAI API calls have been disabled. All model requests route through Azure OpenAI.

Authentication Methods

Method 1: OAuth2 with Azure AD (Recommended for Production)

Configure service principal authentication in your .env file:

# ============================================
# AZURE AD OAUTH2 CREDENTIALS
# ============================================
AZURE_TENANT_ID=12345678-1234-1234-1234-123456789abc
AZURE_CLIENT_ID=87654321-4321-4321-4321-cba987654321
AZURE_CLIENT_SECRET=your-client-secret-value-here
AZURE_SCOPE=https://cognitiveservices.azure.com/.default

# ============================================
# AZURE OPENAI ENDPOINT CONFIGURATION
# ============================================
AZURE_API_VERSION=2024-02-01
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/

# ============================================
# MODEL DEPLOYMENT MAPPINGS (Optional)
# ============================================
# Map OpenAI model names to your Azure deployment names
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1": "prod-gpt4-v2", "gpt-4.1-mini": "prod-gpt4mini", "gpt-4.1-nano": "prod-gpt4nano", "o3-mini": "reasoning-mini-prod", "gpt-4o": "gpt4o-production"}

# Map embedding models to Azure deployments
AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "embedding-small-prod", "text-embedding-3-large": "embedding-large-prod"}

# ============================================
# CUSTOM HTTP HEADERS (Optional)
# ============================================
# Add custom headers for API management, routing, etc.
AZURE_CUSTOM_HEADERS={"X-API-Key": "your-value", "X-Custom-Header": "another-value"}

OAuth2 Authentication Flow:

  • Uses Azure AD client credentials flow (service principal)
  • Tokens are automatically acquired and refreshed
  • Requires service principal with "Cognitive Services OpenAI User" role
  • Most secure method for production environments
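Under the hood, the client-credentials grant is a single POST to the tenant's token endpoint. A minimal stdlib sketch of that flow (the framework's actual client, not shown here, also caches tokens and refreshes them before expiry):

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"

def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://cognitiveservices.azure.com/.default"):
    """Build the token endpoint URL and form-encoded body for the
    client-credentials grant."""
    url = TOKEN_URL.format(tenant=tenant_id)
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode()
    return url, body

def fetch_token(tenant_id, client_id, client_secret):
    """POST the request and return the bearer access token."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]
```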

Method 2: API Key Authentication (Development/Simple Setup)

If you don't configure OAuth2 credentials, the system automatically falls back to API key authentication:

# ============================================
# AZURE OPENAI API KEY (Fallback)
# ============================================
AZURE_OPENAI_API_KEY=your-azure-openai-api-key
AZURE_API_VERSION=2024-02-01
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/

# Optional: Model deployment mappings (same as OAuth2)
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1": "my-gpt4-deployment"}
AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "my-embedding"}

Model Deployment Mapping

Azure OpenAI uses deployment names that may differ from OpenAI model names. Configure the mapping to use standard model names in your code:

Without Mapping (Default):

# If not configured, model name = deployment name
# You must create deployments with exact OpenAI model names

With Custom Mapping (Recommended):

# Map OpenAI model names to your custom Azure deployment names
# Map OpenAI model names to your custom Azure deployment names
# (keep the JSON value on a single line in the .env file)
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1": "production-gpt4-v2", "gpt-4.1-mini": "staging-gpt4mini", "o3-mini": "reasoning-o3-prod"}

AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "embed-small-v1", "text-embedding-3-large": "embed-large-v1"}

Then use standard OpenAI model names in your code:

evo_config = EvolutionConfig(
    llm_models=["gpt-4.1", "gpt-4.1-mini"],  # Maps to Azure deployments automatically
    embedding_model="text-embedding-3-small",  # Maps to "embed-small-v1"
)
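The lookup behaviour can be sketched as follows (`resolve_deployment` is a hypothetical helper, illustrating the fall-back-to-model-name default described above):

```python
import json
import os

def resolve_deployment(model_name: str) -> str:
    """Map an OpenAI model name to its Azure deployment name.

    Reads the AZURE_MODEL_DEPLOYMENTS env var (a JSON object); if the
    model is unmapped or the var is unset, the model name itself is
    used as the deployment name.
    """
    mapping = json.loads(os.environ.get("AZURE_MODEL_DEPLOYMENTS", "{}"))
    return mapping.get(model_name, model_name)
```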

Usage Examples

All OpenAI model names automatically route to Azure:

from shinka.core import EvolutionRunner, EvolutionConfig
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig

# Standard OpenAI model names - all use Azure OpenAI
evo_config = EvolutionConfig(
    num_generations=20,
    llm_models=["gpt-4.1", "gpt-4.1-mini", "o3-mini"],  # Azure only
    embedding_model="text-embedding-3-small",  # Azure only
)

job_config = LocalJobConfig(
    eval_program_path="examples/my_task/evaluate.py",
)

runner = EvolutionRunner(
    evo_config=evo_config,
    job_config=job_config,
    db_config=DatabaseConfig(),
)
runner.run()

Backward compatible with azure- prefix:

# These still work (azure- prefix is stripped automatically)
evo_config = EvolutionConfig(
    llm_models=["azure-gpt-4.1", "azure-gpt-4.1-mini"],  # Same as without prefix
)

Custom HTTP Headers

For API management, routing, or compliance requirements:

# Keep the JSON value on a single line in the .env file
AZURE_CUSTOM_HEADERS={"Ocp-Apim-Subscription-Key": "your-apim-key", "X-Routing-Preference": "regional", "X-Compliance-Tag": "production"}

Headers are automatically included in all Azure OpenAI requests.

Troubleshooting

OAuth2 Authentication Issues:

# Verify service principal has correct role assignment
az role assignment list --assignee <client-id> --scope /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<openai-resource>

# Check credentials are correct
echo $AZURE_TENANT_ID
echo $AZURE_CLIENT_ID
# Don't echo secret in production!

# Verify scope is correct (must end with /.default)
echo $AZURE_SCOPE
# Should output: https://cognitiveservices.azure.com/.default

Deployment Mapping Issues:

# Validate JSON syntax
python -c "import json; print(json.loads('$AZURE_MODEL_DEPLOYMENTS'))"

# List your actual Azure deployments
az cognitiveservices account deployment list \
  --name <openai-resource> \
  --resource-group <rg> \
  --query "[].name"

Fallback to API Key:

  • System automatically uses API key if OAuth2 vars not configured
  • Check logs for: "OAuth2 authentication unavailable"
  • Verify AZURE_OPENAI_API_KEY is set correctly

Model Not Found Errors:

  • Verify deployment exists in Azure OpenAI Studio
  • Check deployment name matches mapping or model name
  • Ensure deployment is in "Succeeded" state

Other LLM Providers

The following providers are unchanged and work as before:

# Anthropic (Claude)
ANTHROPIC_API_KEY=your-anthropic-key

# DeepSeek
DEEPSEEK_API_KEY=your-deepseek-key

# Google Gemini
GEMINI_API_KEY=your-gemini-key

# AWS Bedrock (Claude via Bedrock)
AWS_ACCESS_KEY_ID=your-aws-key
AWS_SECRET_ACCESS_KEY=your-aws-secret
AWS_REGION_NAME=us-east-1