CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Overview

ShinkaEvolve is a framework combining Large Language Models (LLMs) with evolutionary algorithms for automated scientific code discovery. It maintains populations of programs that evolve over generations, using LLMs as intelligent mutation operators. The system supports parallel evaluation locally or on Slurm clusters and maintains archives of successful solutions for knowledge transfer between evolutionary islands.

Development Commands

Installation

# Using uv (recommended - faster)
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .

# Using conda/pip
conda create -n shinka python=3.11
conda activate shinka
pip install -e .

Running Evolution Experiments

# Launch with pre-configured variant
shinka_launch variant=circle_packing_example

# Launch with custom parameters
shinka_launch \
    task=circle_packing \
    database=island_large \
    evolution=small_budget \
    cluster=local \
    evo_config.num_generations=20

Visualization

# Launch interactive WebUI for monitoring evolution
shinka_visualize --port 8888 --open

Testing

# Run tests
pytest tests/

Architecture

Core Components

Runner (shinka/core/runner.py)

  • EvolutionRunner: Main orchestrator for evolutionary experiments
  • Manages LLM interactions, job scheduling, and evolution loop
  • Handles patch generation, application, and evaluation queue

Database (shinka/database/)

  • ProgramDatabase: SQLite-backed storage for program population
  • islands.py: Island-based evolution topology for diversity
  • parents.py: Parent selection strategies (power-law, weighted, beam search)
  • inspirations.py: Archive-based inspiration sampling
  • complexity.py: Code complexity metrics
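For intuition, power-law parent selection weights programs by fitness rank. A minimal sketch (the function name and data layout are hypothetical, not the actual parents.py API):

```python
import random

def sample_parent_power_law(population, alpha=1.0):
    """Sample a parent with probability proportional to rank**(-alpha).

    `population` is assumed to be a list of (program, fitness) pairs,
    where higher fitness is better; `alpha` sets the selection pressure.
    Illustrative sketch, not the parents.py implementation.
    """
    ranked = sorted(population, key=lambda p: p[1], reverse=True)
    # rank 0 = best; weight ~ (rank + 1)^-alpha
    weights = [(rank + 1) ** -alpha for rank in range(len(ranked))]
    program, _ = random.choices(ranked, weights=weights, k=1)[0]
    return program
```

Larger `alpha` concentrates selection on top-ranked programs; `alpha=0` degenerates to uniform sampling.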

Launch (shinka/launch/)

  • JobScheduler: Abstract interface for job execution
  • local.py: Local execution with optional conda environment
  • slurm.py: Slurm cluster execution with Docker/Conda support

Edit (shinka/edit/)

  • apply_diff.py: Apply diff-based code patches
  • apply_full.py: Full code replacement patches
  • async_apply.py: Asynchronous patch application with LLM retry logic

LLM (shinka/llm/)

  • LLMClient: Unified interface for OpenAI, Anthropic, Azure models
  • EmbeddingClient: Code embedding generation for similarity
  • Dynamic model selection with bandit algorithms

Core Utilities (shinka/core/)

  • sampler.py: Prompt sampling and patch generation
  • summarizer.py: Meta-recommendations across generations
  • novelty_judge.py: Novelty assessment for open-ended exploration
  • wrap_eval.py: run_shinka_eval() evaluation wrapper

Configuration System

Uses Hydra for composable, hierarchical configs:

configs/
├── config.yaml           # Main config with defaults
├── cluster/              # Execution environments (local, gcp, remote)
├── database/             # Evolution settings (island_small, island_medium, island_large)
├── evolution/            # Computational budgets (small, medium, large)
├── task/                 # Problem definitions (circle_packing, agent_design, etc.)
└── variant/              # Pre-configured combinations

Override parameters via CLI:

shinka_launch task=my_task evo_config.num_generations=50 db_config.num_islands=8

Evaluation Architecture

Evolution experiments require two key files:

initial.py - Starting solution with evolution markers:

# EVOLVE-BLOCK-START
def advanced_algo():
    # This section will be evolved by LLMs
    return solution
# EVOLVE-BLOCK-END

def run_experiment(**kwargs):
    """Entry point called by evaluator"""
    result = advanced_algo()
    return result
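The markers delimit the only region the LLM is allowed to rewrite; everything outside them stays fixed. A minimal sketch of how such a block could be located (`extract_evolve_block` is a hypothetical helper, not part of the shinka API):

```python
import re

# Matches the region between the evolve markers, including newlines.
EVOLVE_RE = re.compile(
    r"# EVOLVE-BLOCK-START\n(.*?)# EVOLVE-BLOCK-END",
    re.DOTALL,
)

def extract_evolve_block(source: str) -> str:
    """Return the code between the evolve markers (hypothetical helper)."""
    match = EVOLVE_RE.search(source)
    if match is None:
        raise ValueError("no EVOLVE-BLOCK markers found")
    return match.group(1)
```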

evaluate.py - Evaluation script using run_shinka_eval():

from typing import Optional, Tuple

from shinka.core import run_shinka_eval

def main(program_path: str, results_dir: str):
    metrics, correct, err = run_shinka_eval(
        program_path=program_path,
        results_dir=results_dir,
        experiment_fn_name="run_experiment",
        num_runs=3,
        get_experiment_kwargs=get_kwargs_fn,
        validate_fn=validate_fn,
        aggregate_metrics_fn=aggregate_fn,
    )

def validate_fn(run_output) -> Tuple[bool, Optional[str]]:
    # Returns (is_valid, error_message)
    if constraint_violated:
        return False, "Error description"
    return True, None

def aggregate_fn(results: list) -> dict:
    return {
        "combined_score": float(score),  # PRIMARY FITNESS (higher=better)
        "public": {...},     # Visible in WebUI/logs
        "private": {...},    # Internal analysis only
        "extra_data": {...}, # Stored as pickle
        "text_feedback": "", # Optional text feedback for LLM
    }
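The `get_kwargs_fn` callback above is not shown. A plausible sketch, assuming it receives the zero-based run index and returns the kwargs passed to `run_experiment` (a common pattern for seeding independent repetitions):

```python
def get_kwargs_fn(run_idx: int) -> dict:
    """Return the kwargs passed to run_experiment for one run.

    Assumed signature: run index in, kwargs dict out. Here each run
    gets a distinct seed so the repetitions are independent.
    """
    return {"seed": run_idx}
```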

Island Evolution Model

  • Multiple independent islands evolve populations in parallel
  • Periodic migration exchanges solutions between islands
  • Each island maintains its own archive of elite solutions
  • Parent selection strategies: power-law, weighted, beam search
  • Exploitation vs exploration controlled by exploitation_ratio and exploitation_alpha
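The migration step can be sketched as a ring topology, where each island periodically sends its elites to a neighbour (illustrative only; the actual topology and data structures may differ):

```python
def migrate(islands, k=1):
    """Copy each island's top-k programs to its ring neighbour.

    `islands` is assumed to be a list of lists of dicts with a
    "fitness" key (illustrative; the real database schema differs).
    """
    n = len(islands)
    # Snapshot elites first so migrants don't cascade around the ring.
    elites = [
        sorted(island, key=lambda p: p["fitness"], reverse=True)[:k]
        for island in islands
    ]
    for i in range(n):
        islands[(i + 1) % n].extend(elites[i])
    return islands
```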

Patch Types

  • diff: Generate and apply unified diffs to evolve code sections
  • full: Replace entire EVOLVE-BLOCK with new implementation
  • cross: Crossover between multiple parent solutions (future work)

Patch type probabilities are configurable via patch_type_probs.
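The operator choice then amounts to weighted sampling over patch_types (an illustrative sketch, not the actual runner code):

```python
import random

def sample_patch_type(patch_types, patch_type_probs):
    """Draw one mutation operator according to its configured probability."""
    return random.choices(patch_types, weights=patch_type_probs, k=1)[0]

# Example: favour diffs, occasionally do full rewrites or crossover.
patch = sample_patch_type(["diff", "full", "cross"], [0.6, 0.3, 0.1])
```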

Key Configuration Parameters

Evolution Control:

  • num_generations: Number of evolution cycles
  • max_parallel_jobs: Concurrent evaluation jobs
  • max_patch_attempts: Max LLM retries for valid patch
  • llm_models: List of LLM models for mutations
  • patch_types / patch_type_probs: Mutation operator distribution

Database/Islands:

  • num_islands: Number of independent evolution islands
  • archive_size: Elite solution archive per island
  • migration_interval: Generations between island migrations
  • exploitation_ratio: Balance exploitation vs exploration
  • parent_selection_strategy: "power_law", "weighted", "beam_search"

Novelty (for open-ended tasks):

  • code_embed_sim_threshold: Embedding similarity threshold
  • max_novelty_attempts: Max attempts for novel solutions
  • novelty_llm_models: LLMs for novelty judgment

Examples Directory

Each example contains:

  • initial.py: Starting solution with EVOLVE-BLOCK markers
  • evaluate.py: Evaluation script
  • run_evo.py: Direct Python API runner (optional)

Available Examples:

  • circle_packing: Optimize 26 circles in unit square
  • adas_aime: Agent scaffold design for math tasks
  • ale_bench: Code optimization for ALE-Bench tasks
  • novelty_generator: Creative output generation (ASCII art, etc.)

Python API Usage

from shinka.core import EvolutionRunner, EvolutionConfig
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig

job_config = LocalJobConfig(
    eval_program_path="examples/my_task/evaluate.py",
    conda_env="my_env",  # Optional: use specific conda environment
)

db_config = DatabaseConfig(
    num_islands=4,
    archive_size=100,
    migration_interval=10,
)

evo_config = EvolutionConfig(
    num_generations=20,
    max_parallel_jobs=2,
    llm_models=["azure-gpt-4.1"],
    init_program_path="examples/my_task/initial.py",
    task_sys_msg="You are optimizing [specific task]...",
)

runner = EvolutionRunner(
    evo_config=evo_config,
    job_config=job_config,
    db_config=db_config,
)
runner.run()

Project Structure Notes

  • Python 3.10+ required (3.11 recommended)
  • All package management via pyproject.toml
  • Uses uv for fast dependency management (optional but recommended)
  • Logging via rich for terminal output
  • Database: SQLite for program storage
  • LLM providers: Anthropic, Azure OpenAI (DeepSeek, Gemini also supported)
  • Credentials via .env file (see Azure OpenAI Configuration below)

Azure OpenAI Configuration

IMPORTANT: This installation uses Azure OpenAI exclusively for all OpenAI models. Direct OpenAI API calls have been disabled. All model requests route through Azure OpenAI.

Authentication Methods

Method 1: OAuth2 with Azure AD (Recommended for Production)

Configure service principal authentication in your .env file:

# ============================================
# AZURE AD OAUTH2 CREDENTIALS
# ============================================
AZURE_TENANT_ID=12345678-1234-1234-1234-123456789abc
AZURE_CLIENT_ID=87654321-4321-4321-4321-cba987654321
AZURE_CLIENT_SECRET=your-client-secret-value-here
AZURE_SCOPE=https://cognitiveservices.azure.com/.default

# ============================================
# AZURE OPENAI ENDPOINT CONFIGURATION
# ============================================
AZURE_API_VERSION=2024-02-01
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/

# ============================================
# MODEL DEPLOYMENT MAPPINGS (Optional)
# ============================================
# Map OpenAI model names to your Azure deployment names
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1": "prod-gpt4-v2", "gpt-4.1-mini": "prod-gpt4mini", "gpt-4.1-nano": "prod-gpt4nano", "o3-mini": "reasoning-mini-prod", "gpt-4o": "gpt4o-production"}

# Map embedding models to Azure deployments
AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "embedding-small-prod", "text-embedding-3-large": "embedding-large-prod"}

# ============================================
# CUSTOM HTTP HEADERS (Optional)
# ============================================
# Add custom headers for API management, routing, etc.
AZURE_CUSTOM_HEADERS={"X-API-Key": "your-value", "X-Custom-Header": "another-value"}

OAuth2 Authentication Flow:

  • Uses Azure AD client credentials flow (service principal)
  • Tokens are automatically acquired and refreshed
  • Requires service principal with "Cognitive Services OpenAI User" role
  • Most secure method for production environments
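Under the hood, the client-credentials grant is a single POST to the tenant's token endpoint. A minimal stdlib sketch of that flow (the framework's actual client, not shown here, also caches tokens and refreshes them before expiry):

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"

def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://cognitiveservices.azure.com/.default"):
    """Build the token endpoint URL and form-encoded body for the
    client-credentials grant."""
    url = TOKEN_URL.format(tenant=tenant_id)
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode()
    return url, body

def fetch_token(tenant_id, client_id, client_secret):
    """POST the request and return the bearer access token."""
    url, body = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(urllib.request.Request(url, data=body)) as resp:
        return json.load(resp)["access_token"]
```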

Method 2: API Key Authentication (Development/Simple Setup)

If you don't configure OAuth2 credentials, the system automatically falls back to API key authentication:

# ============================================
# AZURE OPENAI API KEY (Fallback)
# ============================================
AZURE_OPENAI_API_KEY=your-azure-openai-api-key
AZURE_API_VERSION=2024-02-01
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/

# Optional: Model deployment mappings (same as OAuth2)
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1": "my-gpt4-deployment"}
AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "my-embedding"}

Model Deployment Mapping

Azure OpenAI uses deployment names that may differ from OpenAI model names. Configure the mapping to use standard model names in your code:

Without Mapping (Default):

# If not configured, model name = deployment name
# You must create deployments with exact OpenAI model names

With Custom Mapping (Recommended):

# Map OpenAI model names to your custom Azure deployment names
# Map OpenAI model names to your custom Azure deployment names
# (keep the JSON value on a single line in the .env file)
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1": "production-gpt4-v2", "gpt-4.1-mini": "staging-gpt4mini", "o3-mini": "reasoning-o3-prod"}

AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "embed-small-v1", "text-embedding-3-large": "embed-large-v1"}

Then use standard OpenAI model names in your code:

evo_config = EvolutionConfig(
    llm_models=["gpt-4.1", "gpt-4.1-mini"],  # Maps to Azure deployments automatically
    embedding_model="text-embedding-3-small",  # Maps to "embed-small-v1"
)
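The lookup behaviour can be sketched as follows (`resolve_deployment` is a hypothetical helper, illustrating the fall-back-to-model-name default described above):

```python
import json
import os

def resolve_deployment(model_name: str) -> str:
    """Map an OpenAI model name to its Azure deployment name.

    Reads the AZURE_MODEL_DEPLOYMENTS env var (a JSON object); if the
    model is unmapped or the var is unset, the model name itself is
    used as the deployment name.
    """
    mapping = json.loads(os.environ.get("AZURE_MODEL_DEPLOYMENTS", "{}"))
    return mapping.get(model_name, model_name)
```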

Usage Examples

All OpenAI model names automatically route to Azure:

from shinka.core import EvolutionRunner, EvolutionConfig
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig

# Standard OpenAI model names - all use Azure OpenAI
evo_config = EvolutionConfig(
    num_generations=20,
    llm_models=["gpt-4.1", "gpt-4.1-mini", "o3-mini"],  # Azure only
    embedding_model="text-embedding-3-small",  # Azure only
)

job_config = LocalJobConfig(
    eval_program_path="examples/my_task/evaluate.py",
)

runner = EvolutionRunner(
    evo_config=evo_config,
    job_config=job_config,
    db_config=DatabaseConfig(),
)
runner.run()

Backward compatible with azure- prefix:

# These still work (azure- prefix is stripped automatically)
evo_config = EvolutionConfig(
    llm_models=["azure-gpt-4.1", "azure-gpt-4.1-mini"],  # Same as without prefix
)

Custom HTTP Headers

For API management, routing, or compliance requirements:

# Keep the JSON value on a single line in the .env file
AZURE_CUSTOM_HEADERS={"Ocp-Apim-Subscription-Key": "your-apim-key", "X-Routing-Preference": "regional", "X-Compliance-Tag": "production"}

Headers are automatically included in all Azure OpenAI requests.

Troubleshooting

OAuth2 Authentication Issues:

# Verify service principal has correct role assignment
az role assignment list --assignee <client-id> --scope /subscriptions/<subscription-id>/resourceGroups/<rg>/providers/Microsoft.CognitiveServices/accounts/<openai-resource>

# Check credentials are correct
echo $AZURE_TENANT_ID
echo $AZURE_CLIENT_ID
# Don't echo secret in production!

# Verify scope is correct (must end with /.default)
echo $AZURE_SCOPE
# Should output: https://cognitiveservices.azure.com/.default

Deployment Mapping Issues:

# Validate JSON syntax
python -c "import json; print(json.loads('$AZURE_MODEL_DEPLOYMENTS'))"

# List your actual Azure deployments
az cognitiveservices account deployment list \
  --name <openai-resource> \
  --resource-group <rg> \
  --query "[].name"

Fallback to API Key:

  • System automatically uses API key if OAuth2 vars not configured
  • Check logs for: "OAuth2 authentication unavailable"
  • Verify AZURE_OPENAI_API_KEY is set correctly

Model Not Found Errors:

  • Verify deployment exists in Azure OpenAI Studio
  • Check deployment name matches mapping or model name
  • Ensure deployment is in "Succeeded" state

Other LLM Providers

The following providers are unchanged and work as before:

# Anthropic (Claude)
ANTHROPIC_API_KEY=your-anthropic-key

# DeepSeek
DEEPSEEK_API_KEY=your-deepseek-key

# Google Gemini
GEMINI_API_KEY=your-gemini-key

# AWS Bedrock (Claude via Bedrock)
AWS_ACCESS_KEY_ID=your-aws-key
AWS_SECRET_ACCESS_KEY=your-aws-secret
AWS_REGION_NAME=us-east-1