NirliptaPande/llms_ftw

ARC Solver - Two-Phase LLM System

I have no idea if this is up to date.

Complete pipeline for solving ARC (Abstraction and Reasoning Corpus) tasks using a two-phase LLM approach with DSL-based code generation.

🏗️ System Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         PHASE 1                                 │
│                    Pattern Discovery                            │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────────┐       │
│  │   Task   │───▶│   VLM    │───▶│  Pattern Analysis    │       │
│  │ Examples │    │ (Sonnet) │    │  (DSL operations)    │       │
│  └──────────┘    └──────────┘    └──────────────────────┘       │
│                                              │                  │
└──────────────────────────────────────────────┼──────────────────┘
                                               │
                    ┌──────────────────────────┼────────────┐
                    │  LIBRARY SEARCH          ▼            │
                    │  ┌─────────────────────────────────┐  │
                    │  │  Extract keywords from pattern  │  │
                    │  │  Search solvers.py for similar  │  │
                    │  │  Test library programs          │  │
                    │  └─────────────────────────────────┘  │
                    │             │            │            │
                    │     Perfect Match?  Top-K Similar     │
                    │             │            │            │
                    │            YES           NO           │
                    │             │            │            │
                    │           DONE           ▼            │
                    └──────────────────────────┼────────────┘
                                               │
┌──────────────────────────────────────────────┼──────────────────┐
│                         PHASE 2              ▼                  │
│                    Code Generation                              │
│  ┌──────────────┐    ┌──────────┐    ┌────────────────────┐     │
│  │ Pattern +    │───▶│   VLM    │───▶│  Python Code       │     │
│  │ Similar Progs│    │ (Haiku)  │    │  def solve(I): ... │     │
│  └──────────────┘    └──────────┘    └────────────────────┘     │
│                                              │                  │
└──────────────────────────────────────────────┼──────────────────┘
                                               │
                    ┌──────────────────────────┼────────────┐
                    │  TEST & EVALUATE         ▼            │
                    │  ┌─────────────────────────────────┐  │
                    │  │  Execute on training examples   │  │
                    │  │  Calculate hamming distance     │  │
                    │  │  Compare with library           │  │
                    │  │  Select best solution           │  │
                    │  └─────────────────────────────────┘  │
                    │             │                         │
                    │      Perfect Score?                   │
                    │             │                         │
                    │            YES                        │
                    │             │                         │
                    │  ┌──────────▼──────────┐              │
                    │  │  Add to Library     │              │
                    │  └─────────────────────┘              │
                    └───────────────────────────────────────┘
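The control flow in the diagram can be summarized as a short Python sketch. It is not the code in main.py: `run_pipeline`, `Result`, and the five injected callables are hypothetical names standing in for Phase 1, the library search, Phase 2, scoring, and the library update, so that the decision logic (early exit on a perfect library match, keep the better of the two candidates, store only perfect generated programs) can be read in isolation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Result:
    success: bool
    score: float
    program: Optional[str]
    source: str  # "library" or "generated"


def run_pipeline(
    task: dict,
    discover_pattern: Callable[[dict], str],          # Phase 1 (hypothetical hook)
    search_library: Callable[[str], List[str]],       # top-K similar programs
    generate_code: Callable[[str, List[str]], str],   # Phase 2 (hypothetical hook)
    score_program: Callable[[str, dict], float],      # 1.0 = perfect on training pairs
    add_to_library: Callable[[str], None],
) -> Result:
    pattern = discover_pattern(task)
    candidates = search_library(pattern)

    # Library search: test each similar program, stop at the first perfect one.
    best_lib, best_lib_score = None, 0.0
    for program in candidates:
        score = score_program(program, task)
        if score == 1.0:
            return Result(True, 1.0, program, "library")  # Phase 2 is skipped
        if score > best_lib_score:
            best_lib, best_lib_score = program, score

    # Phase 2: generate new code and keep whichever candidate scores higher.
    generated = generate_code(pattern, candidates)
    gen_score = score_program(generated, task)
    if best_lib is not None and best_lib_score > gen_score:
        return Result(False, best_lib_score, best_lib, "library")

    if gen_score == 1.0:
        add_to_library(generated)  # only perfect solutions are stored
    return Result(gen_score == 1.0, gen_score, generated, "generated")
```

Injecting the phases as callables keeps the selection logic testable with stubs, without any API calls.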

📦 Components

Core Files

  1. main.py - Main orchestration pipeline

    • Coordinates Phase 1 and Phase 2
    • Tests programs against training examples
    • Manages library search and fallback logic
  2. vlm_prompter.py - Prompt builder

    • build_phase1_prompt() - Pattern discovery prompt
    • build_phase2_prompt() - Code generation prompt
    • Includes full DSL reference (~1,500-2,500 tokens)
  3. vlm_client.py - API client

    • Handles Grok API calls
    • Retry logic with exponential backoff
    • Rate limiting support
  4. library.py - Program storage

    • ProgramLibrary class for storing solutions
    • Keyword-based similarity search
    • Jaccard similarity scoring
  5. dsl.py - Domain-Specific Language

    • 100+ primitives for grid transformations
    • Functional programming support
    • Object manipulation functions
  6. constants.py - DSL constants

    • Colors (ZERO-NINE)
    • Directions (UP, DOWN, LEFT, RIGHT)
    • Special values (T, F, ORIGIN)
  7. task_loader.py - Task I/O utilities

    • Load tasks from JSON files
    • Convert between list/tuple formats
    • Batch loading from directories
  8. solvers.py - Pre-solved tasks (your file)

    • Collection of solve_* functions
    • Automatically loaded into library
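The retry behaviour listed for vlm_client.py can be pictured with a generic exponential-backoff wrapper. This is a sketch, not the client's actual code: `call_with_backoff` and its parameters are hypothetical, and it retries on any exception, whereas a real API client would normally retry only on transient failures such as timeouts or HTTP 429/5xx responses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**attempt (capped, jittered) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: re-raise the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
    raise ValueError("max_retries must be >= 0")
```

Passing `sleep` in as a parameter lets tests record the delays instead of actually waiting.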

🚀 Quick Start

1. Setup Environment

# Install dependencies
pip install requests

# Set API key
export GROK_API_KEY=your_grok_api_key_here

2. Prepare Your Files

Ensure you have:

your_project/
├── main.py
├── vlm_prompter.py
├── vlm_client.py
├── library.py
├── dsl.py
├── constants.py
├── task_loader.py
├── solvers.py          # Your existing solutions
└── tasks/              # Directory with ARC JSON files
    ├── task1.json
    ├── task2.json
    └── ...
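ARC task files store grids as JSON lists of lists, while the examples in this README write grids as tuples of tuples. Below is a minimal sketch of the list-to-tuple conversion and directory loading that task_loader.py is described as doing; `convert_task` and `load_tasks` are hypothetical names, and keying tasks by file name is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def to_tuple_grid(grid: List[List[int]]) -> tuple:
    """[[1, 2], [3, 4]] -> ((1, 2), (3, 4))"""
    return tuple(tuple(row) for row in grid)


def convert_task(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Convert every grid in the 'train'/'test' pairs to tuple form."""
    return {
        split: [{key: to_tuple_grid(grid) for key, grid in pair.items()} for pair in pairs]
        for split, pairs in raw.items()
    }


def load_tasks(directory: str) -> Dict[str, Dict[str, Any]]:
    """Load every *.json file in `directory`, keyed by file name without extension."""
    tasks = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as f:
            tasks[path.stem] = convert_task(json.load(f))
    return tasks
```

Tuples are hashable and immutable, which is convenient when grids are compared or cached during scoring.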

3. Run the Solver

Option A: Single Task (Programmatic)

from main import solve_task
from vlm_client import VLMClient
from vlm_prompter import VLMPrompter
from library import ProgramLibrary

# Load the DSL primitives into a namespace that generated programs run against
dsl_globals = {}
with open('dsl.py', 'r') as f:
    exec(f.read(), dsl_globals)

# Initialize
client = VLMClient()
prompter = VLMPrompter()
library = ProgramLibrary()

# Your task
task = {
    'train': [
        {
            'input': ((1, 2), (3, 4)),
            'output': ((2, 1), (4, 3))
        }
    ]
}

# Solve
result = solve_task(
    task=task,
    task_id='my_task',
    vlm_client=client,
    prompter=prompter,
    library=library,
    dsl_globals=dsl_globals,
    verbose=True
)

print(f"Success: {result.success}")
print(f"Score: {result.score:.2f}")
print(f"Program:\n{result.program}")

Option B: Batch Processing

from task_loader import load_tasks_from_directory
from main import solve_task, load_solvers
# ... (same initialization as above)

# Load existing solutions
load_solvers('solvers.py', library, dsl_globals)

# Load all tasks
tasks = load_tasks_from_directory('tasks/')

# Solve each task
results = {}
for task_id, task in tasks.items():
    result = solve_task(
        task=task,
        task_id=task_id,
        vlm_client=client,
        prompter=prompter,
        library=library,
        dsl_globals=dsl_globals,
        verbose=True
    )
    results[task_id] = result

# Summary
solved = sum(1 for r in results.values() if r.success)
print(f"\n✅ Solved {solved}/{len(results)} tasks")

📊 How It Works

Phase 1: Pattern Discovery (~1,561 tokens)

Input: Training examples (input/output pairs)

Process:

  1. LLM analyzes each example sequentially
  2. Identifies transformations in DSL terms
  3. Synthesizes final pattern with operations
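For Phase 1, the training pairs have to be serialized into the prompt. One plausible text layout is sketched below; the actual format produced by build_phase1_prompt() is not documented here, so `grid_to_text`, `format_examples`, and the layout itself are assumptions.

```python
from typing import Sequence


def grid_to_text(grid: Sequence[Sequence[int]]) -> str:
    """Render a grid as rows of space-separated color digits."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def format_examples(task: dict) -> str:
    """Number each training pair and state its dimensions, so the LLM can reason about SIZE."""
    parts = []
    for i, pair in enumerate(task["train"], start=1):
        h_in, w_in = len(pair["input"]), len(pair["input"][0])
        h_out, w_out = len(pair["output"]), len(pair["output"][0])
        parts.append(
            f"Example {i}\n"
            f"Input ({h_in}x{w_in}):\n{grid_to_text(pair['input'])}\n"
            f"Output ({h_out}x{w_out}):\n{grid_to_text(pair['output'])}"
        )
    return "\n\n".join(parts)
```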

Output: Structured pattern analysis

<final_pattern>
SIZE: (h,w) → (2h,w)
OPS:
x1 = hmirror(I)
O = vconcat(I, x1)
LOGIC: stack horizontally mirrored version below original
CONDITIONS: none
</final_pattern>
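The tagged block above is easy to parse mechanically. Here is a sketch that pulls out the four fields, assuming the model emits exactly the `SIZE`/`OPS`/`LOGIC`/`CONDITIONS` headers shown, with one DSL assignment per `OPS` line; `parse_final_pattern` is a hypothetical helper, not part of vlm_prompter.py.

```python
import re


def parse_final_pattern(response: str) -> dict:
    """Extract SIZE, OPS (list of lines), LOGIC and CONDITIONS from a <final_pattern> block."""
    match = re.search(r"<final_pattern>(.*?)</final_pattern>", response, re.DOTALL)
    if match is None:
        raise ValueError("no <final_pattern> block found")
    result = {"SIZE": "", "OPS": [], "LOGIC": "", "CONDITIONS": ""}
    section = None
    for raw in match.group(1).splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.match(r"(SIZE|OPS|LOGIC|CONDITIONS):\s*(.*)", line)
        if header:
            section, value = header.groups()
            if section == "OPS":
                if value:
                    result["OPS"].append(value)
            else:
                result[section] = value
        elif section == "OPS":
            result["OPS"].append(line)  # one DSL assignment per line
    return result
```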

Library Search

Process:

  1. Extract DSL function keywords from Phase 1 output
  2. Search solvers.py using Jaccard similarity
  3. Test top-K similar programs on training examples
  4. If perfect match found (score=1.0), return immediately
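The keyword search can be sketched as follows, with Jaccard similarity defined as |A ∩ B| / |A ∪ B| over sets of DSL names. Assumptions: keywords are the identifiers in the Phase 1 `OPS` lines that are also DSL function names, each library program is indexed by the DSL names it uses, and programs with zero overlap are dropped. `extract_keywords` and `rank_by_jaccard` are illustrative stand-ins, not the ProgramLibrary API.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple


def extract_keywords(text: str, dsl_names: Iterable[str]) -> Set[str]:
    """Identifiers in `text` that are known DSL names (catches both calls and
    functions passed as arguments, e.g. fork(vconcat, identity, hmirror))."""
    return set(re.findall(r"[A-Za-z_]\w*", text)) & set(dsl_names)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_jaccard(
    query: Set[str], index: Dict[str, Set[str]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Return up to top_k (program name, similarity) pairs, best first."""
    scored = [(name, jaccard(query, kws)) for name, kws in index.items()]
    scored = [item for item in scored if item[1] > 0.0]  # no shared keywords: skip
    scored.sort(key=lambda item: (-item[1], item[0]))    # ties broken by name
    return scored[:top_k]
```

Each ranked program would then be run on the training examples (step 3), and the top-ranked ones are what Phase 2 receives as similar programs.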

Phase 2: Code Generation (~2,497 tokens)

Input:

  • Phase 1 pattern analysis
  • Top-5 similar programs from library

Process:

  1. LLM generates Python code using DSL primitives
  2. Includes functional programming patterns
  3. Follows solve(I) function signature

Output: Python code

def solve(I):
    # Mirror horizontally
    x1 = hmirror(I)
    
    # Stack vertically
    O = vconcat(I, x1)
    
    return O
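To test a generated program, the code has to be executed against the DSL namespace (the `dsl_globals` dictionary from the Quick Start). Here is a minimal sketch, assuming generated code defines `solve(I)` and any failure counts as a failed program; `run_program` is a hypothetical helper. `exec` gives the generated code full access to the interpreter, so a real pipeline should add a timeout and some isolation.

```python
from typing import Any, Dict, List, Optional


def run_program(source: str, dsl_globals: Dict[str, Any], inputs: List[Any]) -> Optional[List[Any]]:
    """Run solve() from `source` on each input grid; return None if anything fails."""
    namespace = dict(dsl_globals)  # copy so the program cannot pollute the shared DSL namespace
    try:
        exec(source, namespace)
        solve = namespace["solve"]  # KeyError if the code defines no solve()
        return [solve(grid) for grid in inputs]
    except Exception:
        return None  # syntax errors, missing solve(), runtime errors
```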

Testing & Evaluation

Process:

  1. Execute generated code on training examples
  2. Calculate hamming distance (per-cell comparison)
  3. Compute similarity score (1.0 = perfect match)
  4. Compare with best library program
  5. Select highest-scoring solution

Scoring:

  • 1.0 = Perfect match (all cells identical)
  • 0.8 = 80% of cells match
  • 0.0 = Completely different
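The scoring rule above, as a sketch: the per-cell Hamming distance is turned into a match fraction, and per-example scores are averaged. Two details are assumptions not stated in this README: a prediction whose shape differs from the expected output scores 0.0, and the task score is the mean over training pairs rather than a pooled cell count.

```python
from typing import List, Optional, Sequence

Grid = Sequence[Sequence[int]]


def cell_similarity(predicted: Grid, expected: Grid) -> float:
    """1 - hamming / total_cells; a shape mismatch scores 0.0 (assumption)."""
    if len(predicted) != len(expected) or any(len(p) != len(e) for p, e in zip(predicted, expected)):
        return 0.0
    total = sum(len(row) for row in expected)
    if total == 0:
        return 1.0
    hamming = sum(p != e for prow, erow in zip(predicted, expected) for p, e in zip(prow, erow))
    return 1.0 - hamming / total


def task_score(predictions: Optional[List[Grid]], expected_outputs: List[Grid]) -> float:
    """Mean per-example similarity; a crashed program (None) scores 0.0."""
    if predictions is None or len(predictions) != len(expected_outputs):
        return 0.0
    scores = [cell_similarity(p, e) for p, e in zip(predictions, expected_outputs)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a 2x2 prediction with one wrong cell scores 0.75.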

Library Update

If score = 1.0:

  • Add solution to library
  • Available for future similarity searches

🔧 Configuration

VLM Settings

from vlm_client import VLMClient, VLMConfig

config = VLMConfig(
    api_key="your_key",
    model="grok-beta",  # or "claude-sonnet-3.5"
    max_tokens=4096,
    temperature=0.7,  # 0.0 for Phase 1, 0.7 for Phase 2
    max_retries=3
)

client = VLMClient(config)
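The comment on `temperature` suggests different settings per phase. If the configuration is an immutable dataclass, per-phase copies can be derived with `dataclasses.replace`. `PhaseConfig` below is a hypothetical stand-in mirroring the fields shown (minus `api_key`); whether VLMConfig is actually a dataclass is not documented here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhaseConfig:
    """Hypothetical stand-in for VLMConfig; field names copied from the example above."""
    model: str = "grok-beta"
    max_tokens: int = 4096
    temperature: float = 0.7
    max_retries: int = 3


base = PhaseConfig()
phase1_config = replace(base, temperature=0.0)  # pattern discovery: low-variance output
phase2_config = replace(base, temperature=0.7)  # code generation: more varied samples
```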

Library Settings

# Change the number of similar programs retrieved (default: 5);
# keywords are the DSL function names extracted from the Phase 1 pattern
similar_programs = library.find_similar(keywords, top_k=3)

📈 Performance Optimization

Early Stopping

The system stops early when:

  1. ✅ The library has a perfect match (score = 1.0); ⏭️ Phase 2 is skipped entirely
  2. ✅ Generated code scores 1.0

Token Efficiency

Component   Tokens   % of Context
Phase 1     ~1,561   0.8%
Phase 2     ~2,497   1.2%
Total       ~4,058   2.0%

Cost Per Task

With Grok API (~$3/1M tokens):

  • Phase 1: ~$0.0047
  • Phase 2: ~$0.0075
  • Total: ~$0.012 per task
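The table and the cost figures follow from simple arithmetic, reproduced below. The 200K-token context window is inferred from the percentages (1,561 / 200,000 ≈ 0.8%), and only the token counts listed above are priced.

```python
PRICE_PER_MILLION = 3.00   # USD per 1M tokens, the Grok rate quoted above
CONTEXT_WINDOW = 200_000   # inferred from the "% of Context" column (assumption)


def estimate(tokens: int) -> tuple:
    """Return (cost in USD, percent of the context window) for a token count."""
    return tokens * PRICE_PER_MILLION / 1_000_000, 100 * tokens / CONTEXT_WINDOW


phase_tokens = {"Phase 1": 1_561, "Phase 2": 2_497}
phase_tokens["Total"] = sum(phase_tokens.values())

for name, tokens in phase_tokens.items():
    cost, share = estimate(tokens)
    print(f"{name:<8} {tokens:>6,} tokens  ${cost:.4f}  {share:.1f}% of context")
```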

🐛 Debugging

Enable Verbose Output

result = solve_task(..., verbose=True)

Shows:

  • Phase 1 completion
  • Library search results
  • Program test scores
  • Final decision logic

Common Issues

Issue: "GROK_API_KEY environment variable not set"

export GROK_API_KEY=your_key_here

Issue: "Failed to extract code from response"

  • Phase 2 output didn't contain valid Python code
  • Check LLM response format
  • May need to adjust temperature or prompt
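One way to make code extraction more forgiving is to try every fenced block in the response, fall back to the raw text, and accept only a snippet that parses and defines `solve`. This is a sketch under those assumptions; `extract_code` is hypothetical and is not the pipeline's own extraction logic.

```python
import ast
import re
from typing import Optional

FENCE = re.compile(r"```(?:python)?\s*\n(.*?)```", re.DOTALL)


def extract_code(response: str) -> Optional[str]:
    """Return the first fenced (or bare) snippet that parses and defines solve(), else None."""
    candidates = FENCE.findall(response) or [response]
    for code in candidates:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            continue  # prose or broken code: try the next candidate
        if any(isinstance(node, ast.FunctionDef) and node.name == "solve" for node in tree.body):
            return code.strip()
    return None
```

Returning None rather than raising lets the caller log the raw response and decide whether to retry with a different temperature or prompt.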

Issue: "No similar programs found"

  • Library is empty or keywords don't match
  • Normal for first few tasks
  • Library will grow as you solve more

📚 DSL Reference

Most Common Functions

# Transforms
hmirror(grid)          # horizontal flip
vmirror(grid)          # vertical flip
rot90/180/270(grid)    # rotations

# Composition
vconcat(a, b)          # stack vertically
hconcat(a, b)          # stack horizontally

# Objects
objects(grid, T, F, T) # find connected regions
colorfilter(objs, c)   # filter by color
argmax(objs, size)     # largest object

# Functional
compose(f, g)          # f(g(x))
chain(f, g, h)         # f(g(h(x)))
fork(combine, f, g)    # combine(f(x), g(x))
rbind(func, arg)       # partial application
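To make the composition helpers concrete, here are simplified stand-ins for a few of them. These are not the definitions in dsl.py. They assume `hmirror` reverses the order of rows (a top-to-bottom flip) and `vmirror` reverses each row (a left-to-right flip), which is consistent with the Phase 1 example where `vconcat(I, hmirror(I))` turns an (h,w) grid into a (2h,w) one. `identity` is included as a helper.

```python
from typing import Callable, Tuple

Grid = Tuple[Tuple[int, ...], ...]


def hmirror(grid: Grid) -> Grid:
    return grid[::-1]  # reverse row order (assumed semantics)


def vmirror(grid: Grid) -> Grid:
    return tuple(row[::-1] for row in grid)  # reverse each row (assumed semantics)


def vconcat(a: Grid, b: Grid) -> Grid:
    return a + b  # a stacked on top of b


def hconcat(a: Grid, b: Grid) -> Grid:
    return tuple(ra + rb for ra, rb in zip(a, b))  # a to the left of b


def identity(x):
    return x


def compose(f: Callable, g: Callable) -> Callable:
    return lambda x: f(g(x))


def chain(f: Callable, g: Callable, h: Callable) -> Callable:
    return lambda x: f(g(h(x)))


def fork(combine: Callable, f: Callable, g: Callable) -> Callable:
    return lambda x: combine(f(x), g(x))


def rbind(func: Callable, arg) -> Callable:
    return lambda x: func(x, arg)  # fix the right-hand argument


# The Phase 1 example pattern, written point-free: vconcat(I, hmirror(I))
solve = fork(vconcat, identity, hmirror)
```

Under these assumptions, `vmirror` alone reproduces the Option A example task, mapping ((1, 2), (3, 4)) to ((2, 1), (4, 3)).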

Full reference in dsl.py (100+ functions)

🔮 Future Enhancements

Phase 2b: Evolution (Not Yet Implemented)

# Planned mutation strategies:
1. Function substitution (hmirror ↔ vmirror)
2. Parameter tweaking (objects(I, T, F, T) → objects(I, T, T, T))
3. Add/remove steps
4. Control flow changes
5. LLM-guided repair with error feedback
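The first planned strategy, function substitution, can be sketched as a text rewrite that yields one mutant per call site. `substitution_mutants` is hypothetical. The `vconcat ↔ hconcat` pair is an illustrative addition beyond the `hmirror ↔ vmirror` example above, and a regex rewrite will also touch names inside comments or strings, which an AST-based version would avoid.

```python
import re
from typing import Dict, List

SWAPS: Dict[str, str] = {
    "hmirror": "vmirror", "vmirror": "hmirror",
    "vconcat": "hconcat", "hconcat": "vconcat",  # illustrative extra pair
}


def substitution_mutants(program: str, swaps: Dict[str, str] = SWAPS) -> List[str]:
    """One mutant per occurrence: replace a single DSL name with its swap partner."""
    mutants = []
    for match in re.finditer(r"\b[A-Za-z_]\w*\b", program):
        name = match.group(0)
        if name in swaps:
            start, end = match.span()
            mutants.append(program[:start] + swaps[name] + program[end:])
    return mutants
```

Each mutant would then go through the same test-and-score step as a freshly generated program.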

Semantic Library Search

Replace keyword matching with embeddings:

from sentence_transformers import SentenceTransformer  # planned dependency

# Planned method, not yet implemented in library.py
library.find_similar_semantic(pattern, top_k=5)

Test-Time Compute

Allocate more retries for hard tasks:

# Planned parameters, not yet supported by solve_task
solve_task(..., max_attempts=10, adaptive_budget=True)

📄 License

Your project - use as you wish!

🤝 Contributing

This is your personal ARC solver. Customize as needed!


Ready to solve some ARC tasks! 🎯
