This guide will take you from zero to hero with the ShinkaEvolve framework, starting with simple Azure OpenAI integration tests, progressing through examples, and ending with creating your own evolution experiments.
- Prerequisites
- Phase 1: Environment Setup & Azure OpenAI Verification
- Phase 2: Running Example 1 - Circle Packing
- Phase 3: Running Example 2 - Novelty Generator
- Phase 4: Running Example 3 - Agent Design (ADAS AIME)
- Phase 5: Creating Your Own Evolution Experiment
- SOP: Standard Operating Procedure for New Use Cases
- Troubleshooting
- Python 3.11+ installed
- `uv` package manager (recommended) or `pip`
- Git
- Azure OpenAI resource with deployments configured
- Azure AD service principal (tenant ID, client ID, client secret) OR
- Azure OpenAI API key
- Azure OpenAI endpoint URL
# Navigate to project directory
cd /Users/samcc/Documents/CodexProject/ShinkaEvolve
# Install using uv (recommended)
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .
# OR using pip
pip install -e .

Edit the `.env` file in the project root:
# Minimum required configuration (choose ONE authentication method)
# Option 1: OAuth2 (Recommended for Production)
AZURE_TENANT_ID=your-tenant-id-here
AZURE_CLIENT_ID=your-client-id-here
AZURE_CLIENT_SECRET=your-client-secret-here
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-02-01
# Option 2: API Key (Simple/Development)
AZURE_OPENAI_API_KEY=your-api-key-here
AZURE_API_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_API_VERSION=2024-02-01
# Model Deployment Mappings (IMPORTANT!)
# Map OpenAI model names to your Azure deployment names
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1-mini": "your-gpt4-mini-deployment-name"}
AZURE_EMBEDDING_DEPLOYMENTS={"text-embedding-3-small": "your-embedding-deployment-name"}

Finding Your Deployment Names:
- Go to Azure Portal → Azure OpenAI Studio
- Click Deployments in left sidebar
- Copy the Deployment name column values
- Paste into the JSON mapping above
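Both mapping variables must be valid JSON strings; a stray quote or trailing comma is a common failure mode. A quick standalone sanity check you can run yourself, independent of Shinka (the deployment name here is the placeholder from the `.env` example above):

```python
import json
import os

# Placeholder value from the .env example above; replace with your own
os.environ.setdefault(
    "AZURE_MODEL_DEPLOYMENTS",
    '{"gpt-4.1-mini": "your-gpt4-mini-deployment-name"}',
)
# json.loads raises ValueError here if the .env value is malformed JSON
mapping = json.loads(os.environ["AZURE_MODEL_DEPLOYMENTS"])
print(mapping["gpt-4.1-mini"])  # → your-gpt4-mini-deployment-name
```

If `json.loads` raises, fix the `.env` value before running anything else.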
Create a test file `test_azure.py`:
#!/usr/bin/env python3
"""Simple test to verify Azure OpenAI integration works."""
from shinka.llm import LLMClient
def test_llm_client():
"""Test basic LLM client functionality."""
print("🧪 Testing Azure OpenAI LLM Client...")
# Create client with your model
client = LLMClient(
model_names=["gpt-4.1-mini"], # Will map to your Azure deployment
temperatures=0.7,
max_tokens=100,
)
# Test query
result = client.query(
msg="Say 'Hello from Azure OpenAI!' and nothing else.",
system_msg="You are a helpful assistant.",
)
if result and result.content:
print(f"✅ SUCCESS! Response: {result.content}")
print(f"💰 Cost: ${result.cost:.4f}")
print(f"📊 Tokens: {result.input_tokens} in, {result.output_tokens} out")
return True
else:
print("❌ FAILED! No response received.")
return False
def test_embedding_client():
"""Test embedding client functionality."""
print("\n🧪 Testing Azure OpenAI Embedding Client...")
from shinka.llm import EmbeddingClient
client = EmbeddingClient(
model_name="text-embedding-3-small", # Will map to your deployment
)
# Test embedding
embedding, cost = client.get_embedding("Hello, Azure OpenAI!")
if embedding and len(embedding) > 0:
print(f"✅ SUCCESS! Embedding dimension: {len(embedding)}")
print(f"💰 Cost: ${cost:.6f}")
return True
else:
print("❌ FAILED! No embedding received.")
return False
if __name__ == "__main__":
print("=" * 60)
print("AZURE OPENAI INTEGRATION TEST")
print("=" * 60)
llm_success = test_llm_client()
embed_success = test_embedding_client()
print("\n" + "=" * 60)
if llm_success and embed_success:
print("🎉 ALL TESTS PASSED! Azure OpenAI is configured correctly.")
print("You can now proceed to running examples.")
else:
print("⚠️ Some tests failed. Check your .env configuration.")
print("See CLAUDE.md for detailed troubleshooting steps.")
print("=" * 60)

Run the test:
python test_azure.py

Expected Output:
============================================================
AZURE OPENAI INTEGRATION TEST
============================================================
🧪 Testing Azure OpenAI LLM Client...
✅ SUCCESS! Response: Hello from Azure OpenAI!
💰 Cost: $0.0012
📊 Tokens: 45 in, 8 out
🧪 Testing Azure OpenAI Embedding Client...
✅ SUCCESS! Embedding dimension: 1536
💰 Cost: $0.000002
============================================================
🎉 ALL TESTS PASSED! Azure OpenAI is configured correctly.
You can now proceed to running examples.
============================================================
If tests fail: See Troubleshooting section.
Objective: Evolve code to pack 26 circles in a unit square, maximizing the sum of their radii.
Difficulty: ⭐⭐ (Beginner)
What you'll learn:
- Basic evolution loop
- How EVOLVE-BLOCK markers work
- Understanding evaluation scripts
- Monitoring evolution progress
- Using the WebUI for visualization
The goal is to find the best arrangement of 26 circles in a 1x1 square such that:
- All circles are inside the square
- No circles overlap
- The sum of radii is maximized (best known: ~2.635)
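As a sketch of what these constraints mean in code (an illustrative re-implementation, not Shinka's actual validator), the two checks look like this:

```python
import math

def is_valid_packing(centers, radii, tol=1e-9):
    """Illustrative validity check: circles inside the unit square, no overlaps."""
    for (x, y), r in zip(centers, radii):
        # Every circle must lie fully inside the 1x1 square
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            return False
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            dx = centers[i][0] - centers[j][0]
            dy = centers[i][1] - centers[j][1]
            # Centers must be at least the sum of radii apart
            if math.hypot(dx, dy) < radii[i] + radii[j] - tol:
                return False
    return True

# Two quarter-radius circles in opposite corners: valid
print(is_valid_packing([(0.25, 0.25), (0.75, 0.75)], [0.25, 0.25]))  # → True
```

The evolved code only proposes arrangements; the evaluator enforces these rules.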
Open examples/circle_packing/initial.py and look for:
# EVOLVE-BLOCK-START
def construct_packing():
"""Construct arrangement of 26 circles"""
# This code will be evolved by LLMs
# ...
# EVOLVE-BLOCK-END

Key Points:
- Code between `EVOLVE-BLOCK-START`/`END` will be modified by LLMs
- Code outside these markers stays unchanged
- Initial solution is intentionally simple (LLMs will improve it)
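To see how the markers delimit the editable region, here is a hypothetical extraction with a regex (Shinka's own parser may differ in details — this just illustrates the convention):

```python
import re

# Hypothetical miniature source file using the marker convention
SOURCE = """\
# EVOLVE-BLOCK-START
def my_algorithm(x):
    return x
# EVOLVE-BLOCK-END
def helper():
    pass
"""

# Everything between the markers is what the LLM is allowed to rewrite
match = re.search(
    r"# EVOLVE-BLOCK-START\n(.*?)# EVOLVE-BLOCK-END", SOURCE, re.DOTALL
)
print(match.group(1))
```

Note that `helper()` sits outside the markers, so it would never be touched.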
Open examples/circle_packing/evaluate.py:
def adapted_validate_packing(run_output):
"""Check if circles are valid (no overlap, inside square)"""
centers, radii, reported_sum = run_output
# Returns (is_valid: bool, error_message: str or None)
# ...
def aggregate_metrics(results, results_dir):
"""Compute fitness score"""
return {
"combined_score": float(reported_sum), # Higher = better
"public": {...}, # Visible in logs/WebUI
"private": {...}, # Internal use only
}

Key Points:
- `validate_fn`: Ensures solutions are correct (no overlaps, in bounds)
- `aggregate_fn`: Computes the fitness score (`combined_score`)
- Higher `combined_score` = better solution (evolution maximizes this)
Edit examples/circle_packing/run_evo.py:
# Minimal configuration for testing
db_config = DatabaseConfig(
num_islands=2, # Start with 2 islands
archive_size=20, # Keep top 20 solutions
)
evo_config = EvolutionConfig(
num_generations=5, # Start small (5 generations)
max_parallel_jobs=1, # 1 job at a time
llm_models=["gpt-4.1-mini"], # Your Azure deployment
init_program_path="initial.py",
)

cd examples/circle_packing
python run_evo.py

Expected Output:
[SHINKA LOGO]
==> GENERATION 1/5
==> SAMPLING: gpt-4.1-mini
==> PATCH APPLIED: gen_1_patch_0
==> EVALUATING: gen_1_patch_0
==> RESULT: combined_score=1.234, valid=True
...
==> GENERATION 5/5 COMPLETE
Best solution: combined_score=2.145 (improved from 1.023!)
In another terminal:
cd /Users/samcc/Documents/CodexProject/ShinkaEvolve
shinka_visualize --port 8888 --open

This opens a browser showing:
- Evolution progress in real-time
- Genealogy tree (which solutions evolved from which)
- Metrics over generations
- Code diffs between generations
After evolution completes, check the results directory:
ls -lah results_*
# Contains:
# - evolution_db.sqlite (database with all solutions)
# - gen_1/, gen_2/, ... (code for each generation)
# - best_solution.py (best evolved code)

Open the best solution:
cat results_*/best_solution.py
# See how the LLM improved the code!

What to Look For:
- Did the `combined_score` improve over generations?
- What strategies did the LLM try? (check different generation folders)
- How does the best solution differ from the initial one?
Objective: Generate creative, novel ASCII art or text patterns.
Difficulty: ⭐⭐⭐ (Intermediate)
What you'll learn:
- Novelty-based evolution (not just optimization)
- Using LLM judges for evaluation
- Open-ended exploration
- Text feedback in evolution loop
Unlike circle packing (maximize score), novelty search aims to find diverse and creative solutions. Each solution is judged on:
- Novelty: How different is it from previous solutions?
- Quality: How interesting/surprising is it?
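Novelty is typically measured in embedding space. A minimal sketch of the idea behind the `code_embed_sim_threshold` setting used below (illustrative only, not Shinka's implementation): a candidate counts as novel if its embedding is not too similar to anything already archived.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_novel(candidate, archive, threshold=0.8):
    # Novel only if no archived embedding is too similar to the candidate
    return all(cosine_sim(candidate, e) < threshold for e in archive)

print(is_novel([1.0, 0.0], [[0.0, 1.0]]))  # → True (orthogonal vectors)
print(is_novel([1.0, 0.1], [[1.0, 0.0]]))  # → False (nearly parallel)
```

In practice the embeddings come from the configured embedding model, and candidates that fail the check are retried (see `max_novelty_attempts` below).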
cat examples/novelty_generator/initial.py

Look for the EVOLVE-BLOCK that generates patterns.
Create examples/novelty_generator/run_evo.py:
from shinka.core import EvolutionRunner, EvolutionConfig
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig
job_config = LocalJobConfig(eval_program_path="evaluate.py")
db_config = DatabaseConfig(
num_islands=2,
archive_size=30, # Keep more diverse solutions
)
evo_config = EvolutionConfig(
num_generations=10,
max_parallel_jobs=1,
llm_models=["gpt-4.1-mini"],
init_program_path="initial.py",
# Novelty-specific settings
code_embed_sim_threshold=0.8, # Similarity threshold
max_novelty_attempts=3, # Retries for novel solutions
novelty_llm_models=["gpt-4.1-mini"], # LLM judges
use_text_feedback=True, # Use text feedback in evolution
)
runner = EvolutionRunner(
evo_config=evo_config,
job_config=job_config,
db_config=db_config,
)
runner.run()

cd examples/novelty_generator
python run_evo.py

Expected Behavior:
- LLM generates diverse ASCII art patterns
- Each generation tries something different
- LLM judge evaluates creativity/novelty
- Archive fills with diverse solutions (not just "best" one)
# Check different evolved patterns
cat results_*/gen_*/evolved_*.py
# See the archive of diverse solutions
sqlite3 results_*/evolution_db.sqlite "SELECT id, score, novelty_score FROM programs ORDER BY novelty_score DESC LIMIT 10;"

What to Notice:
- Solutions are diverse (not converging to one pattern)
- Novelty scores vary
- Text feedback helps guide evolution toward interesting directions
Objective: Evolve agent scaffolds to solve math competition problems (AIME dataset).
Difficulty: ⭐⭐⭐⭐ (Advanced)
What you'll learn:
- Complex multi-step evaluation
- Agent scaffolding evolution
- Working with external datasets
- Multi-metric optimization
AIME (American Invitational Mathematics Examination) problems are challenging math problems. The goal is to evolve an agent scaffold (prompt templates, reasoning strategies) that helps solve these problems.
head examples/adas_aime/AIME_Dataset_1983_2025.csv

Each row contains:
- Problem text
- Correct answer
- Year/difficulty
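Before running the example, it can help to load a row or two yourself. A sketch with fabricated sample data; the real CSV's column names may differ, so inspect the header with `head` first:

```python
import csv
import io

# Hypothetical sample mirroring the fields listed above (problem text,
# correct answer, year); the real file's column names may differ.
sample = io.StringIO(
    "problem,answer,year\n"
    '"Find the remainder when 2^10 is divided by 7.",2,1984\n'
)
rows = list(csv.DictReader(sample))
print(rows[0]["year"], "->", rows[0]["answer"])  # → 1984 -> 2
```

For the real dataset, replace the `StringIO` with `open("AIME_Dataset_1983_2025.csv")` and the column names you find in its header.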
cat examples/adas_aime/initial.py

Look for agent components:
- Prompt templates
- Reasoning strategies
- Answer extraction methods
cd examples/adas_aime
python run_evo.py

Note: This example requires more compute (solving math problems is expensive). Consider:
- Reducing dataset size in `evaluate.py`
- Using a faster model: `gpt-4.1-mini` instead of `gpt-4.1`
- Increasing `max_parallel_jobs` if you have quota
This evolution takes longer. Watch for:
- Accuracy improvements over generations
- Different reasoning strategies tried by LLMs
- Prompt engineering evolution (how agent asks itself questions)
# Check current best accuracy
tail -f results_*/evolution.log | grep "combined_score"

ls examples/adas_aime/discovered/
# Review successful agent strategies
cat examples/adas_aime/discovered/1_gen15_cot_and_majority_voting_repackaged.py

What to Learn:
- What strategies emerged? (Chain-of-thought? Self-reflection? Voting?)
- How did prompts evolve over generations?
- Which strategies generalize best to unseen problems?
Objective: Apply Shinka to your own optimization problem.
Difficulty: ⭐⭐⭐⭐⭐ (Expert)
Ask yourself:
- What am I optimizing? (e.g., algorithm speed, solution quality, creative output)
- How do I measure success? (fitness function)
- What constraints exist? (e.g., correctness requirements)
- Is it optimization or novelty search?
Example Problems:
- Optimize a sorting algorithm for specific data patterns
- Generate creative product descriptions
- Evolve hyperparameters for a ML model
- Design API architectures
- Generate test cases for edge cases
mkdir -p examples/my_experiment
cd examples/my_experiment
touch initial.py evaluate.py run_evo.py

# EVOLVE-BLOCK-START
def my_algorithm(input_data):
"""
Your starting algorithm implementation.
This will be evolved by LLMs.
Args:
input_data: Your problem input
Returns:
result: Your problem output
"""
# Simple/naive implementation here
# LLMs will improve this!
result = simple_solution(input_data)
return result
# EVOLVE-BLOCK-END
def run_experiment(**kwargs):
"""
Entry point called by evaluation script.
Args:
**kwargs: Parameters from get_experiment_kwargs
Returns:
Results that will be validated and scored
"""
input_data = kwargs.get("input_data")
result = my_algorithm(input_data)
return result
# Helper functions (outside EVOLVE-BLOCK - won't be evolved)
def simple_solution(data):
# Your baseline implementation
return data

Key Rules:
- Put code to evolve inside `EVOLVE-BLOCK-START`/`END`
- Keep helper functions outside (they won't change)
- Implement `run_experiment(**kwargs)` as the entry point
- Return results in the format expected by the validator

Next, write `evaluate.py`:
#!/usr/bin/env python3
import argparse
from shinka.core import run_shinka_eval
def get_experiment_kwargs(run_idx: int) -> dict:
"""
Generate kwargs for each evaluation run.
Args:
run_idx: Index of this run (0, 1, 2, ...)
Returns:
Dict of kwargs to pass to run_experiment
"""
# Example: Load different test cases
test_cases = [
{"input_data": [1, 2, 3]},
{"input_data": [5, 4, 3, 2, 1]},
{"input_data": [10, 20, 30]},
]
return test_cases[run_idx % len(test_cases)]
def validate_solution(run_output):
"""
Check if solution is valid/correct.
Args:
run_output: Return value from run_experiment
Returns:
(is_valid: bool, error_msg: str or None)
"""
result = run_output
# Check correctness constraints
if result is None:
return False, "Result is None"
# Add your validation logic
if not meets_requirements(result):
return False, "Does not meet requirements"
return True, None
def aggregate_metrics(results: list, results_dir: str) -> dict:
"""
Compute fitness score from all runs.
Args:
results: List of run_output from all runs
results_dir: Directory to save extra data
Returns:
Dict with required keys:
- combined_score: float (PRIMARY FITNESS - higher=better)
- public: dict (visible in logs/WebUI)
- private: dict (internal use)
- text_feedback: str (optional, for LLM feedback)
"""
# Compute your fitness metric
scores = [compute_score(r) for r in results]
avg_score = sum(scores) / len(scores)
return {
"combined_score": float(avg_score), # MUST be present
"public": {
"average_score": avg_score,
"num_runs": len(results),
},
"private": {
"individual_scores": scores,
},
# Optional: guide LLM with text feedback
"text_feedback": f"Score: {avg_score:.2f}. Try improving X.",
}
def compute_score(result):
"""Your scoring logic"""
return 1.0 # Replace with actual score
def meets_requirements(result):
"""Your validation logic"""
return True # Replace with actual checks
def main(program_path: str, results_dir: str):
"""Main evaluation function called by Shinka"""
metrics, correct, error_msg = run_shinka_eval(
program_path=program_path,
results_dir=results_dir,
experiment_fn_name="run_experiment",
num_runs=3, # Run 3 times and aggregate
get_experiment_kwargs=get_experiment_kwargs,
validate_fn=validate_solution,
aggregate_metrics_fn=aggregate_metrics,
)
# Print results (optional, for debugging)
if correct:
print(f"✅ Valid solution: {metrics['combined_score']:.4f}")
else:
print(f"❌ Invalid solution: {error_msg}")
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--program_path", type=str, required=True)
parser.add_argument("--results_dir", type=str, required=True)
args = parser.parse_args()
main(args.program_path, args.results_dir)

Now create `run_evo.py`:

#!/usr/bin/env python3
from shinka.core import EvolutionRunner, EvolutionConfig
from shinka.database import DatabaseConfig
from shinka.launch import LocalJobConfig
# Configure job execution
job_config = LocalJobConfig(
eval_program_path="evaluate.py",
)
# Configure evolution database
db_config = DatabaseConfig(
num_islands=4,
archive_size=50,
migration_interval=10,
parent_selection_strategy="power_law",
exploitation_alpha=1.0,
)
# Task-specific guidance for LLM
task_description = """You are optimizing [DESCRIBE YOUR PROBLEM].
Key insights:
1. [Important strategy 1]
2. [Important strategy 2]
3. [Known pitfalls to avoid]
Focus on [SPECIFIC GOALS].
"""
# Configure evolution parameters
evo_config = EvolutionConfig(
num_generations=20,
max_parallel_jobs=2,
llm_models=["gpt-4.1-mini"], # Or multiple models
task_sys_msg=task_description,
init_program_path="initial.py",
patch_types=["diff", "full"],
patch_type_probs=[0.7, 0.3], # 70% diff, 30% full rewrites
)
# Run evolution
runner = EvolutionRunner(
evo_config=evo_config,
job_config=job_config,
db_config=db_config,
)
runner.run()

Before running full evolution, test each component:
# Test 1: Initial solution runs
python initial.py
# Should execute without errors
# Test 2: Evaluation script works
python evaluate.py --program_path initial.py --results_dir ./test_results
# Should output: ✅ Valid solution: X.XXXX
# Test 3: Quick evolution (1 generation)
# Edit run_evo.py: num_generations=1
python run_evo.py
# Should complete one evolution cycle

# Edit run_evo.py: num_generations=20 (or more)
python run_evo.py
# Monitor in another terminal
shinka_visualize --port 8888 --open

# Check evolution database
import sqlite3
from glob import glob

# connect() does not expand wildcards; resolve the glob first
conn = sqlite3.connect(glob("results_*/evolution_db.sqlite")[0])
# Get top solutions
top_solutions = conn.execute("""
SELECT id, combined_score, generation, island_id
FROM programs
WHERE is_correct = 1
ORDER BY combined_score DESC
LIMIT 10
""").fetchall()
for sol_id, score, gen, island in top_solutions:
print(f"Solution {sol_id}: score={score:.4f}, gen={gen}, island={island}")

Follow this checklist when starting a new evolution experiment:
- Define the problem clearly
  - What are you optimizing?
  - What's the input/output format?
  - What are the success criteria?
- Choose evolution type
  - Optimization (maximize/minimize score)
  - Novelty search (explore diverse solutions)
  - Multi-objective (balance multiple goals)
- Design fitness function
  - How do you score solutions? (higher = better)
  - What makes a solution "valid"?
  - Can you test automatically?
- Create project directory
  - `mkdir -p examples/my_project && cd examples/my_project`
- Write `initial.py`
  - Implement baseline algorithm
  - Mark evolution sections with `EVOLVE-BLOCK-START`/`END`
  - Implement `run_experiment(**kwargs)` entry point
  - Test: `python initial.py` runs without errors
- Write `evaluate.py`
  - Implement `get_experiment_kwargs(run_idx)`
  - Implement `validate_solution(run_output)`
  - Implement `aggregate_metrics(results, results_dir)`
  - Test: `python evaluate.py --program_path initial.py --results_dir ./test` works
- Write `run_evo.py`
  - Configure `DatabaseConfig` (islands, archive size)
  - Configure `EvolutionConfig` (generations, models, task description)
  - Configure `LocalJobConfig` (or Slurm for cluster)
  - Test: a run with `num_generations=1` completes
- Tune evolution parameters
  - Start small: `num_generations=5`, `num_islands=2`
  - Choose appropriate model(s): balance cost vs. performance
  - Write a good `task_sys_msg`: guide the LLM with domain knowledge
  - Set `patch_types`: `["diff"]` for incremental, `["full"]` for rewrites
- Configure Azure OpenAI
  - Update `.env` with model deployment mappings
  - Test with `test_azure.py` first
  - Monitor costs: check `combined_score` vs. API costs
- Run small test
  - `num_generations=1-3`
  - Verify evolution loop works
  - Check one solution improves
- Scale up gradually
  - `num_generations=10-20`
  - Monitor WebUI for progress
  - Check for convergence or diversity
- Full evolution run
  - `num_generations=50-100` (depending on problem)
  - Use `max_parallel_jobs` to speed up
  - Save results to version control
- Review best solutions
  - Check `results_*/best_solution.py`
  - Compare to initial: what strategies emerged?
  - Test generalization on held-out data
- Visualize evolution
  - Use WebUI to see genealogy tree
  - Plot `combined_score` over generations
  - Identify innovation moments (big jumps)
- Extract insights
  - What patterns/strategies did LLMs discover?
  - Which are human-interpretable?
  - Which generalize to new problems?
- Refine based on results
  - Update `task_sys_msg` with discovered insights
  - Adjust fitness function if needed
  - Try different parent selection strategies
- Experiment with variations
  - Different LLM models
  - Different island configurations
  - Different patch type distributions
Symptom: "Authentication failed" or "Unauthorized"
Solutions:
# Check OAuth2 credentials
python -c "import os; from dotenv import load_dotenv; load_dotenv(); print(f'Tenant: {os.getenv(\"AZURE_TENANT_ID\")}')"
# Verify service principal has role
az role assignment list --assignee $AZURE_CLIENT_ID
# Test API key fallback
export AZURE_OPENAI_API_KEY=your-key
python test_azure.py

Symptom: "The API deployment for this resource does not exist"
Solutions:
# List your actual deployments
az cognitiveservices account deployment list \
--name your-openai-resource \
--resource-group your-rg
# Update .env mapping
AZURE_MODEL_DEPLOYMENTS={"gpt-4.1-mini": "actual-deployment-name"}

Symptom: `combined_score` stays flat across generations
Solutions:
- Check task description: Is `task_sys_msg` clear and helpful?
- Check fitness function: Does it differentiate good vs. bad solutions?
- Check initial solution: Is it too good already (ceiling effect)?
- Try a different strategy: Change `parent_selection_strategy`
- Increase diversity: Use more islands, larger archive
Symptom: Most solutions marked `is_correct=False`
Solutions:
- Check validation logic: Is it too strict?
- Check EVOLVE-BLOCK scope: Are critical functions inside/outside?
- Add guardrails: Give the LLM clearer constraints in `task_sys_msg`
- Review failed solutions: What mistakes are common?
Symptom: Azure bill is higher than expected
Solutions:
- Use a smaller model: `gpt-4.1-mini` instead of `gpt-4.1`
- Reduce generations: Start with 10-20, not 100
- Reduce parallel jobs: Lower `max_parallel_jobs`
- Monitor cost per generation: Check the `$` amounts in logs
- Set budget alerts in the Azure Portal
Now that you've completed all phases:
- Explore advanced features:
  - Meta-recommendations (evolution of evolution)
  - Dynamic model selection
  - Multi-objective optimization
- Scale to clusters:
  - Configure Slurm for large-scale experiments
  - Use `SlurmCondaJobConfig` or `SlurmDockerJobConfig`
- Contribute discoveries:
  - Share successful strategies
  - Report interesting emergent behaviors
  - Submit PRs with new examples
- Join the community:
  - GitHub issues for questions
  - Share your results
Happy Evolving! 🧬
For detailed reference, see:
- CLAUDE.md - Technical reference
- docs/getting_started.md - Installation guide
- docs/configuration.md - Configuration options
- docs/webui.md - WebUI guide