Complete pipeline for solving ARC (Abstraction and Reasoning Corpus) tasks using a two-phase LLM approach with DSL-based code generation.
┌──────────────────────────────────────────────────────────────┐
│ PHASE 1: Pattern Discovery                                   │
│ ┌──────────┐    ┌──────────┐    ┌──────────────────────┐     │
│ │   Task   │───▶│   VLM    │───▶│   Pattern Analysis   │     │
│ │ Examples │    │ (Sonnet) │    │   (DSL operations)   │     │
│ └──────────┘    └──────────┘    └──────────────────────┘     │
│                                            │                 │
└────────────────────────────────────────────┼─────────────────┘
                                             │
┌────────────────────────────────────────────┼─────────────────┐
│ LIBRARY SEARCH                             ▼                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Extract keywords from pattern                          │  │
│  │ Search solvers.py for similar                          │  │
│  │ Test library programs                                  │  │
│  └────────────────────────────────────────────────────────┘  │
│               │                            │                 │
│        Perfect Match?                Top-K Similar           │
│               │                            │                 │
│              YES                          NO                 │
│               │                            │                 │
│             DONE                           ▼                 │
└────────────────────────────────────────────┼─────────────────┘
                                             │
┌────────────────────────────────────────────┼─────────────────┐
│ PHASE 2: Code Generation                   ▼                 │
│ ┌──────────────┐    ┌──────────┐    ┌──────────────────────┐ │
│ │ Pattern +    │───▶│   VLM    │───▶│     Python Code      │ │
│ │ Similar Progs│    │ (Haiku)  │    │ def solve(I): ...    │ │
│ └──────────────┘    └──────────┘    └──────────────────────┘ │
│                                            │                 │
└────────────────────────────────────────────┼─────────────────┘
                                             │
┌────────────────────────────────────────────┼─────────────────┐
│ TEST & EVALUATE                            ▼                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Execute on training examples                           │  │
│  │ Calculate hamming distance                             │  │
│  │ Compare with library                                   │  │
│  │ Select best solution                                   │  │
│  └────────────────────────────────────────────────────────┘  │
│                              │                               │
│                       Perfect Score?                         │
│                              │                               │
│                             YES                              │
│                              │                               │
│                   ┌──────────┴──────────┐                    │
│                   │   Add to Library    │                    │
│                   └─────────────────────┘                    │
└──────────────────────────────────────────────────────────────┘
- main.py - Main orchestration pipeline
  - Coordinates Phase 1 and Phase 2
  - Tests programs against training examples
  - Manages library search and fallback logic
- vlm_prompter.py - Prompt builder
  - build_phase1_prompt() - Pattern discovery prompt
  - build_phase2_prompt() - Code generation prompt
  - Includes full DSL reference (~1,500-2,500 tokens)
- vlm_client.py - API client
  - Handles Grok API calls
  - Retry logic with exponential backoff
  - Rate limiting support
- library.py - Program storage
  - ProgramLibrary class for storing solutions
  - Keyword-based similarity search
  - Jaccard similarity scoring
- dsl.py - Domain-Specific Language
  - 100+ primitives for grid transformations
  - Functional programming support
  - Object manipulation functions
- constants.py - DSL constants
  - Colors (ZERO-NINE)
  - Directions (UP, DOWN, LEFT, RIGHT)
  - Special values (T, F, ORIGIN)
- task_loader.py - Task I/O utilities
  - Load tasks from JSON files
  - Convert between list/tuple formats
  - Batch loading from directories
- solvers.py - Pre-solved tasks (your file)
  - Collection of solve_* functions
  - Automatically loaded into library
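The file list mentions that vlm_client.py retries with exponential backoff. The sketch below illustrates that general technique only; it is not the client's actual code, and `call_with_backoff`, `base_delay`, and the injectable `sleep` parameter are hypothetical names chosen for the example.

```python
import random
import time


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); after each failure wait base_delay * 2**attempt (plus jitter), then retry."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles on every failed attempt; small jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

With `max_retries=3` and `base_delay=1.0`, the waits are roughly 1, 2, and 4 seconds before the final attempt. Passing `sleep` as a parameter keeps the function easy to test without real delays.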
# Install dependencies
pip install requests
# Set API key
export GROK_API_KEY=your_grok_api_key_here

Ensure you have:
your_project/
├── main.py
├── vlm_prompter.py
├── vlm_client.py
├── library.py
├── dsl.py
├── constants.py
├── task_loader.py
├── solvers.py          # Your existing solutions
└── tasks/              # Directory with ARC JSON files
    ├── task1.json
    ├── task2.json
    └── ...
from main import solve_task
from vlm_client import VLMClient
from vlm_prompter import VLMPrompter
from library import ProgramLibrary
import dsl
# Load DSL
with open('dsl.py', 'r') as f:
    dsl_globals = {}
    exec(f.read(), dsl_globals)
# Initialize
client = VLMClient()
prompter = VLMPrompter()
library = ProgramLibrary()
# Your task
task = {
    'train': [
        {
            'input': ((1, 2), (3, 4)),
            'output': ((2, 1), (4, 3))
        }
    ]
}
# Solve
result = solve_task(
    task=task,
    task_id='my_task',
    vlm_client=client,
    prompter=prompter,
    library=library,
    dsl_globals=dsl_globals,
    verbose=True
)
print(f"Success: {result.success}")
print(f"Score: {result.score:.2f}")
print(f"Program:\n{result.program}")

from task_loader import load_tasks_from_directory
from main import solve_task, load_solvers
# ... (same initialization as above)
# Load existing solutions
load_solvers('solvers.py', library, dsl_globals)
# Load all tasks
tasks = load_tasks_from_directory('tasks/')
# Solve each task
results = {}
for task_id, task in tasks.items():
    result = solve_task(
        task=task,
        task_id=task_id,
        vlm_client=client,
        prompter=prompter,
        library=library,
        dsl_globals=dsl_globals,
        verbose=True
    )
    results[task_id] = result
# Summary
solved = sum(1 for r in results.values() if r.success)
print(f"\n✅ Solved {solved}/{len(results)} tasks")

Input: Training examples (input/output pairs)
Process:
- LLM analyzes each example sequentially
- Identifies transformations in DSL terms
- Synthesizes final pattern with operations
Output: Structured pattern analysis
<final_pattern>
SIZE: (h,w) → (2h,w)
OPS:
x1 = hmirror(I)
O = vconcat(I, x1)
LOGIC: stack horizontally mirrored version below original
CONDITIONS: none
</final_pattern>
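The structured block above is what the library search consumes next. As a rough illustration of how the DSL function names could be pulled out of it, here is a small sketch; `extract_pattern` and `extract_keywords` are hypothetical helpers, and the regex-based approach is an assumption rather than the pipeline's confirmed method.

```python
import re

# Sample Phase 1 output, copied from the example above (arrow written in ASCII).
PHASE1_OUTPUT = """<final_pattern>
SIZE: (h,w) -> (2h,w)
OPS:
x1 = hmirror(I)
O = vconcat(I, x1)
LOGIC: stack horizontally mirrored version below original
CONDITIONS: none
</final_pattern>"""


def extract_pattern(text):
    """Return the text between the <final_pattern> tags, or None if absent."""
    match = re.search(r"<final_pattern>(.*?)</final_pattern>", text, re.DOTALL)
    return match.group(1).strip() if match else None


def extract_keywords(pattern):
    """Collect names that are called like functions, i.e. an identifier followed by '('."""
    return set(re.findall(r"\b([A-Za-z_]\w*)\s*\(", pattern))


keywords = extract_keywords(extract_pattern(PHASE1_OUTPUT))
```

For the sample above, `keywords` contains `hmirror` and `vconcat`; labels such as `SIZE:` are skipped because they are not followed by an opening parenthesis.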
Process:
- Extract DSL function keywords from Phase 1 output
- Search solvers.py using Jaccard similarity
- Test top-K similar programs on training examples
- If perfect match found (score=1.0), return immediately
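Jaccard similarity between two keyword sets is the size of their intersection divided by the size of their union. The sketch below ranks stored programs by that measure; `rank_by_jaccard` and the sample library entries are hypothetical and only stand in for whatever library.py actually does.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # assumption: two empty sets count as dissimilar
    return len(a & b) / len(a | b)


def rank_by_jaccard(query, keyword_index, top_k=5):
    """Return up to top_k (name, score) pairs with non-zero overlap, best first."""
    scored = [(name, jaccard(query, kws)) for name, kws in keyword_index.items()]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical library entries: program name -> DSL functions it uses.
keyword_index = {
    "solve_a": {"hmirror", "vconcat"},
    "solve_b": {"vmirror", "hconcat"},
    "solve_c": {"hmirror", "objects", "colorfilter"},
}
ranked = rank_by_jaccard({"hmirror", "vconcat"}, keyword_index, top_k=5)
```

Here `solve_a` shares both keywords with the query (score 1.0), `solve_c` shares one of four distinct keywords (score 0.25), and `solve_b` is dropped because it has no overlap.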
Input:
- Phase 1 pattern analysis
- Top-5 similar programs from library
Process:
- LLM generates Python code using DSL primitives
- Includes functional programming patterns
- Follows solve(I) function signature
Output: Python code
def solve(I):
    # Mirror horizontally
    x1 = hmirror(I)
    # Stack vertically
    O = vconcat(I, x1)
    return O

Process:
- Execute generated code on training examples
- Calculate hamming distance (per-cell comparison)
- Compute similarity score (1.0 = perfect match)
- Compare with best library program
- Select highest-scoring solution
Scoring:
- 1.0 = Perfect match (all cells identical)
- 0.8 = 80% of cells match
- 0.0 = Completely different
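One way to turn a per-cell hamming distance into the score described above is to divide the number of mismatched cells by the total cell count and subtract from 1. This is a sketch under assumptions: `cell_score` is a hypothetical name, and scoring a shape mismatch as 0.0 is a choice made for the example, not something the pipeline is confirmed to do.

```python
def cell_score(predicted, expected):
    """Fraction of matching cells between two grids (tuples of tuples)."""
    same_shape = len(predicted) == len(expected) and all(
        len(p_row) == len(e_row) for p_row, e_row in zip(predicted, expected)
    )
    if not same_shape:
        return 0.0  # assumption: grids of different shape score zero
    total = sum(len(row) for row in expected)
    if total == 0:
        return 1.0  # two empty grids are trivially identical
    # Hamming distance: count of positions where the cells differ.
    mismatches = sum(
        p != e
        for p_row, e_row in zip(predicted, expected)
        for p, e in zip(p_row, e_row)
    )
    return 1.0 - mismatches / total
```

For example, a 1x5 grid with a single wrong cell has a hamming distance of 1 and a score of 0.8.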
If score = 1.0:
- Add solution to library
- Available for future similarity searches
from vlm_client import VLMConfig
config = VLMConfig(
    api_key="your_key",
    model="grok-beta",  # or "claude-sonnet-3.5"
    max_tokens=4096,
    temperature=0.7,  # 0.0 for Phase 1, 0.7 for Phase 2
    max_retries=3
)
client = VLMClient(config)

# Change number of similar programs
similar_programs = library.find_similar(keywords, top_k=3)  # Default: 5

The system stops early when:
- ✅ Library has perfect match (score=1.0)
- ✅ Generated code scores 1.0
- ⏭️ Skip Phase 2 if library match is perfect
| Component | Tokens | % of Context |
|---|---|---|
| Phase 1 | ~1,561 | 0.8% |
| Phase 2 | ~2,497 | 1.2% |
| Total | ~4,058 | 2.0% |
With Grok API (~$3/1M tokens):
- Phase 1: ~$0.0047
- Phase 2: ~$0.0075
- Total: ~$0.012 per task
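The per-task figures follow directly from the token counts in the table and the quoted rate of about $3 per million tokens. A short sketch of the arithmetic, using only those numbers (`cost_usd` is a name chosen for the example):

```python
PRICE_PER_MILLION_TOKENS = 3.00  # USD, the approximate rate quoted above
TOKENS = {"Phase 1": 1_561, "Phase 2": 2_497}  # from the token table


def cost_usd(tokens, price_per_million=PRICE_PER_MILLION_TOKENS):
    """Cost in USD for a given number of tokens at a flat per-million rate."""
    return tokens * price_per_million / 1_000_000


per_phase = {phase: cost_usd(count) for phase, count in TOKENS.items()}
total = cost_usd(sum(TOKENS.values()))

for phase, cost in per_phase.items():
    print(f"{phase}: ${cost:.4f}")
print(f"Total: ${total:.3f} per task")
```

The flat rate is a simplification: it treats every token the same and covers only the token counts listed in the table.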
result = solve_task(..., verbose=True)

Shows:
- Phase 1 completion
- Library search results
- Program test scores
- Final decision logic
Issue: "GROK_API_KEY environment variable not set"
export GROK_API_KEY=your_key_here

Issue: "Failed to extract code from response"
- Phase 2 output didn't contain valid Python code
- Check LLM response format
- May need to adjust temperature or prompt
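When checking the response format, it can help to run the extraction step on the raw response in isolation. The sketch below shows one plausible way to pull a `solve` function out of a reply that wraps code in a markdown fence; `extract_code` is a hypothetical helper and may differ from what main.py does.

```python
import re

# Built indirectly so that this snippet can itself be shown inside a fenced block.
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response):
    """Return the first fenced code block (or the whole reply) if it defines solve()."""
    match = CODE_BLOCK.search(response)
    code = match.group(1) if match else response
    code = code.strip()
    # Reject replies that do not contain the expected function signature.
    return code if "def solve(" in code else None


reply = "Here is the program:\n" + FENCE + "python\ndef solve(I):\n    return I\n" + FENCE + "\nDone."
code = extract_code(reply)
```

If `extract_code` returns `None` for a real reply, the model most likely answered in prose or used a different function name, which points at the prompt or temperature rather than the parser.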
Issue: "No similar programs found"
- Library is empty or keywords don't match
- Normal for first few tasks
- Library will grow as you solve more
# Transforms
hmirror(grid) # horizontal flip
vmirror(grid) # vertical flip
rot90/180/270(grid) # rotations
# Composition
vconcat(a, b) # stack vertically
hconcat(a, b) # stack horizontally
# Objects
objects(grid, T, F, T) # find connected regions
colorfilter(objs, c) # filter by color
argmax(objs, size) # largest object
# Functional
compose(f, g) # f(g(x))
chain(f, g, h) # f(g(h(x)))
fork(combine, f, g) # combine(f(x), g(x))
rbind(func, arg) # partial application

Full reference in dsl.py (100+ functions)
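To make the functional helpers concrete, here is a self-contained sketch with simplified stand-ins for a few of them. These definitions are assumptions written for illustration (including the top-to-bottom flip used for `hmirror`); the real versions live in dsl.py and may differ in detail.

```python
def compose(outer, inner):
    """compose(f, g)(x) == f(g(x))"""
    return lambda x: outer(inner(x))


def fork(outer, a, b):
    """fork(combine, f, g)(x) == combine(f(x), g(x))"""
    return lambda x: outer(a(x), b(x))


def rbind(function, fixed):
    """Fix the rightmost argument: rbind(f, b)(x) == f(x, b)"""
    return lambda x: function(x, fixed)


def identity(x):
    return x


def hmirror(grid):
    # Stand-in: mirror across a horizontal axis (reverse the row order).
    return grid[::-1]


def vconcat(a, b):
    # Stand-in: stack grid b below grid a.
    return a + b


# Point-free version of the example solver: vconcat(I, hmirror(I)).
solve = fork(vconcat, identity, hmirror)
grid = ((1, 2), (3, 4))
result = solve(grid)
```

With these stand-ins, `result` is `((1, 2), (3, 4), (3, 4), (1, 2))`, the original grid with its mirrored copy stacked below it.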
# Planned mutation strategies:
1. Function substitution (hmirror → vmirror)
2. Parameter tweaking (objects(I, T, F, T) → objects(I, T, T, T))
3. Add/remove steps
4. Control flow changes
5. LLM-guided repair with error feedback

Replace keyword matching with embeddings:
from sentence_transformers import SentenceTransformer
library.find_similar_semantic(pattern, top_k=5)

Allocate more retries for hard tasks:
solve_task(..., max_attempts=10, adaptive_budget=True)

Your project - use as you wish!
This is your personal ARC solver. Customize as needed!
Ready to solve some ARC tasks! 🎯