This project implements a Constraint Satisfaction Problem (CSP) solver for logic grid puzzles (Zebra puzzles) as part of the AI Connect 2025 collaborative challenge hosted by HSBI (Germany), TDU (Türkiye), SEECS/NUST (Pakistan), and CST/RUB (Bhutan).
The solver uses symbolic reasoning (not machine learning) to solve logic puzzles through constraint propagation and intelligent search strategies.
- Build a CSP solver that combines backtracking search with advanced heuristics
- Solve logic grid puzzles from the ZebraLogicBench dataset
- Compete on: Accuracy, Efficiency (search steps), and Generalization
Composite Score = Accuracy (%) − α × (AvgSteps / MaxAvgSteps)
Where:
- Accuracy (%) = Percentage of puzzles solved correctly
- AvgSteps = Average number of CSP search steps per puzzle
- MaxAvgSteps = Maximum average across all teams
- α = 10 = Efficiency penalty weight
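As a quick illustration, the scoring formula can be expressed directly in code (the function and argument names below are illustrative, not part of the challenge tooling):

```python
def composite_score(accuracy_pct, avg_steps, max_avg_steps, alpha=10):
    """Composite Score = Accuracy (%) - alpha * (AvgSteps / MaxAvgSteps)."""
    return accuracy_pct - alpha * (avg_steps / max_avg_steps)

# A team solving 90% of puzzles with half the slowest team's average steps:
score = composite_score(90.0, 50.0, 100.0)  # 90 - 10 * 0.5 = 85.0
```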
ZebraLogicBench from AllenAI
- `grid_mode` (1,000 puzzles)
  - Full logic grid puzzles with complete solutions
  - Structured as houses × features grids
  - Example: 5 houses × 6 features (name, color, pet, drink, job, nationality)
- `mc_mode` (3,259 questions)
  - Multiple-choice questions derived from logic puzzles
  - Single correct answer among the choices
- **Constraint Representation**
  - Parse natural-language clues into formal constraints
  - Support various constraint types (equals, not-equals, adjacency, ordering)
- **Search Algorithm**
  - Backtracking search with intelligent pruning
  - MRV (Minimum Remaining Values) heuristic for variable selection
  - Degree heuristic as a tie-breaker
  - Least Constraining Value (LCV) for value ordering
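A minimal sketch of how MRV with a degree-heuristic tie-break might look (the data structures and function name here are illustrative, not the project's actual `solver.py` API):

```python
def select_unassigned_variable(variables, domains, assignment, neighbors):
    """MRV: pick the unassigned variable with the fewest remaining values;
    break ties with the degree heuristic (most unassigned neighbors)."""
    unassigned = [v for v in variables if v not in assignment]
    return min(
        unassigned,
        key=lambda v: (
            len(domains[v]),                                      # MRV
            -sum(1 for n in neighbors[v] if n not in assignment), # degree
        ),
    )
```

Because `min` compares the tuples lexicographically, the domain size dominates and the (negated) degree only decides ties.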
- **Constraint Propagation**
  - Forward checking to eliminate inconsistent values
  - Arc consistency (AC-3) for early detection of dead ends
  - Propagation after each assignment
- **Trace Generation**
  - Logs search states, decisions, and backtracks
  - Records domain sizes and constraint checks
  - Enables performance analysis
```
ai connect/
├── README.md            # This file
├── data_README.md       # Dataset documentation
├── requirements.txt     # Python dependencies
├── solver.py            # Main CSP solver implementation
├── data_parser.py       # Parse puzzles into CSP format
├── clue_parser.py       # Natural language clue parsing
├── trace_generator.py   # Search trace logging
├── validator.py         # Solution validation against ground truth
├── run.py               # Script to run solver on test set
├── compare_results.py   # Compare results with ground truth
├── extract_answers.py   # Extract answers from dataset
├── evaluate.ipynb       # Evaluation notebook
├── utils.py             # Helper utilities
├── grid_results.json    # Grid mode output (generated)
├── mc_results.json      # MC mode output (generated)
└── data/                # Dataset files (download separately)
    ├── grid_mode/
    │   └── answers.json # Ground truth (generated)
    └── mc_mode/
        └── answers.json # Ground truth (generated)
```
```bash
pip install -r requirements.txt
```

Before running the solver, extract ground truth answers for validation:

```bash
python extract_answers.py
```

This creates:

- `data/grid_mode/answers.json` - Grid mode ground truth (header + rows format)
- `data/mc_mode/answers.json` - MC mode ground truth (answer strings)
```bash
# Download from Hugging Face
# https://huggingface.co/datasets/allenai/ZebraLogicBench
# Or use the datasets library
python -c "from datasets import load_dataset; \
dataset = load_dataset('allenai/ZebraLogicBench'); \
dataset['grid_mode'].save_to_disk('data/grid_mode'); \
dataset['mc_mode'].save_to_disk('data/mc_mode')"
```

Run the solver:

```bash
python run.py --mode grid --input data/grid_mode/test.parquet --output grid_results.json
```

Open the evaluation notebook:

```bash
jupyter notebook evaluate.ipynb
```

Supported clue-to-constraint translations:

- Equality: "Alice owns the cat" → Alice.pet = cat
- Inequality: "Bob doesn't live in the red house" → Bob.color ≠ red
- Adjacency: "The green house is next to the white house" → |green.position − white.position| = 1
- Left/Right: "The green house is to the left of the white house" → green.position < white.position
- Ordering: "The German lives in the first house" → German.position = 1
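These translations can be represented as small predicate factories over a position assignment; a sketch (the helper names and the `pos` mapping are illustrative, not the project's `clue_parser.py` API):

```python
# Each clue type becomes a predicate over `pos`, a dict mapping an
# attribute value to its house position (1-based).
def equals(a, b):      return lambda pos: pos[a] == pos[b]
def not_equals(a, b):  return lambda pos: pos[a] != pos[b]
def adjacent(a, b):    return lambda pos: abs(pos[a] - pos[b]) == 1
def left_of(a, b):     return lambda pos: pos[a] < pos[b]
def at_position(a, n): return lambda pos: pos[a] == n

# Example: green house at position 2, white at 3, the German at 1.
pos = {"green": 2, "white": 3, "German": 1}
```

Representing clues as closures like this lets the solver evaluate any parsed clue uniformly during consistency checks.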
- **Variable Ordering (MRV)**
  - Select the variable with the fewest remaining values
  - Fail-first principle for early pruning
- **Value Ordering (LCV)**
  - Choose values that rule out the fewest choices for remaining variables
  - Maximizes future options
- **Constraint Propagation**
  - AC-3 algorithm for arc consistency
  - Forward checking after each assignment
  - Early detection of conflicts
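For reference, a compact sketch of AC-3 over binary constraints (the `constraints[(x, y)]` predicate interface is an assumption for illustration, not the project's actual API):

```python
from collections import deque

def ac3(domains, constraints):
    """Prune values with no support. `constraints[(x, y)]` is a binary
    predicate over (value_of_x, value_of_y); `domains` maps variables
    to mutable lists of candidate values."""
    queue = deque(constraints.keys())
    while queue:
        x, y = queue.popleft()
        revised = False
        for vx in list(domains[x]):
            # Remove vx if no value of y supports it
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].remove(vx)
                revised = True
        if revised:
            if not domains[x]:
                return False  # empty domain: dead end detected early
            # Re-enqueue arcs pointing at x, since its domain shrank
            for (a, b) in constraints:
                if b == x and a != y:
                    queue.append((a, b))
    return True
```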
- **Backtracking**
  - Intelligent backjumping when conflicts are detected
  - Tracks conflict sets for efficiency
The solver tracks:
- Accuracy: Percentage of correctly solved puzzles
- Search Steps: Number of variable assignments attempted
- Backtracks: Number of times search had to backtrack
- Time: Total solving time per puzzle
- Constraint Checks: Number of constraint evaluations
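These metrics can be collected in a simple counter object; a minimal sketch (the class and field names are our own, not the actual `trace_generator.py` API):

```python
from dataclasses import dataclass

@dataclass
class SolverStats:
    """Illustrative counters for the metrics listed above."""
    steps: int = 0              # variable assignments attempted
    backtracks: int = 0         # times the search backtracked
    constraint_checks: int = 0  # constraint evaluations
    time_seconds: float = 0.0   # total solving time per puzzle

# The solver would increment these at the corresponding points in search:
stats = SolverStats()
stats.steps += 1
stats.constraint_checks += 4
```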
Before Fixes (Original):
- Grid: 63.05% accuracy (attribute name mismatches)
- MC: 29.19% accuracy

After Fixes (Latest):
- ✅ Fixed attribute name mapping (Person→Name, Pet→Animal, Book→BookGenre)
- ✅ Increased solver limits (100k backtracks, 30s timeout)
- 🔄 Re-run needed to measure new accuracy (expect ~90%+ for grid mode)
Grid Mode:
- Puzzles solved: 977/1000 (97.7%)
- Accuracy: run `python compare_results.py` after re-running the solver

MC Mode:
- Questions answered: 3162/3259 (97.0%)
- Accuracy: run `python compare_results.py` after re-running the solver
Use `compare_results.py` to see detailed mismatches:

```bash
python compare_results.py
```

This shows:
- Total accuracy for both modes
- Sample mismatches with expected vs actual values
- House-by-house attribute comparisons
- ✅ solver.py - Core CSP solver
- ✅ data_parser.py - Puzzle parsing
- ✅ trace_generator.py - Search logging
- ✅ run.py - Test execution script
- ✅ evaluate.ipynb - Evaluation notebook
- ✅ grid_results.json - Test set solutions
- ✅ README.md - Project documentation
- 🎥 Video - 2-minute demonstration (to be recorded)
Solutions are stored in the same format as ground truth for easy comparison:
```json
{
  "lgp-test-5x6-16": {
    "header": ["House", "Name", "Nationality", "BookGenre", "Food", "Color", "Animal"],
    "rows": [
      ["1", "Bob", "german", "mystery", "grilled cheese", "yellow", "dog"],
      ["2", "Eric", "norwegian", "fantasy", "stew", "blue", "fish"],
      ["3", "Peter", "dane", "science fiction", "spaghetti", "green", "cat"],
      ["4", "Arnold", "swede", "biography", "stir fry", "red", "bird"],
      ["5", "Alice", "brit", "romance", "pizza", "white", "horse"]
    ]
  }
}
```

Multiple choice answers are stored as simple key-value pairs:

```json
{
  "lgp-test-5x5-18#mc-0": "Peter",
  "lgp-test-4x2-35#mc-0": "Alice",
  "lgp-test-5x2-24#mc-1": "lime"
}
```

The validator compares solver outputs with ground truth answers:
- Extract Ground Truth: use `extract_answers.py` to generate answer files from the dataset
- Format Matching: solutions are converted to header+rows format for consistent comparison
- Normalization: values are normalized (lowercase, trimmed) for case-insensitive comparison
- House-by-House Validation: each house's attributes are compared individually
- Detailed Error Reporting: mismatches show expected vs. actual values
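The normalization step might look like this (the function name is illustrative, not necessarily what `validator.py` uses):

```python
def normalize(value):
    """Lowercased, trimmed comparison key; None becomes "" (sketch)."""
    return "" if value is None else str(value).strip().lower()

# Case and surrounding whitespace no longer cause spurious mismatches:
assert normalize("  Grilled Cheese ") == "grilled cheese"
assert normalize(None) == ""
```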
```bash
# Extract ground truth answers from dataset
python extract_answers.py

# Run solver and compare with ground truth
python run.py --mode grid --input data/grid_mode/test.parquet --output grid_results.json

# Compare results with ground truth in detail
python compare_results.py
```

The validator automatically handles both output formats:

- Old format: `{"House1": {"Color": "red", ...}, "House2": {...}}`
- New format: `{"header": ["House", "Color", ...], "rows": [["1", "red", ...], ...]}`
Both are normalized to lowercase for case-insensitive comparison.
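A converter from the old format to header+rows could be sketched as follows (the function name and alphabetical attribute ordering are assumptions; note that sorting house names lexicographically is only safe for single-digit house counts):

```python
def to_header_rows(houses):
    """Convert {"House1": {...}, ...} to the header+rows format (sketch)."""
    # Collect all attribute names across houses, sorted alphabetically
    attrs = sorted({k for h in houses.values() for k in h})
    header = ["House"] + attrs
    rows = []
    for i, name in enumerate(sorted(houses), start=1):
        # Missing attributes become empty strings; values are lowercased
        rows.append([str(i)] + [str(houses[name].get(a, "")).lower()
                                for a in attrs])
    return {"header": header, "rows": rows}
```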
```python
def backtracking_search(csp, assignment):
    if is_complete(assignment, csp):
        return assignment
    var = select_unassigned_variable(csp, assignment)  # MRV heuristic
    for value in order_domain_values(var, csp):        # LCV heuristic
        if is_consistent(var, value, assignment, csp):
            assign(var, value, assignment)
            inferences = forward_check(var, value, csp)
            if inferences != FAILURE:
                result = backtracking_search(csp, assignment)
                if result != FAILURE:
                    return result
            remove_assignment(var, assignment)
            restore_domains(inferences, csp)
    return FAILURE
```

Input Clues:
- There are 3 houses
- The Englishman lives in the red house
- The Swede has a dog
- The Dane drinks tea
- The green house is to the left of the white house ...
CSP Representation:
- Variables: House1, House2, House3
- Domains: {(English, Red, Dog, Tea), (Swedish, Green, Cat, Coffee), ...}
- Constraints: Englishman.color = Red, Swede.pet = Dog, ...
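One possible Python encoding of that representation (variable and helper names are illustrative; the actual solver may structure its CSP differently). Note how the unary clue "the Englishman lives in the red house" can prune domain tuples directly, while positional clues remain as constraints over assignments:

```python
from itertools import product

houses = ["House1", "House2", "House3"]

# Domains: candidate (nationality, color) tuples per house (trimmed to
# two features for brevity). The Englishman/red clue is a unary filter:
# a tuple is valid only if "English" and "Red" occur together or not at all.
tuples = [(n, c)
          for n, c in product(("English", "Swedish", "Danish"),
                              ("Red", "Green", "White"))
          if (n == "English") == (c == "Red")]
domains = {h: list(tuples) for h in houses}

# "The green house is to the left of the white house" is a constraint
# over a full assignment mapping each house to a chosen tuple:
def green_left_of_white(assignment):
    pos = {assignment[h][1]: i for i, h in enumerate(houses, start=1)}
    return pos.get("Green", 0) < pos.get("White", 4)
```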
- Team Size: 6-8 members from multiple countries
- Duration: 11 days
- Communication: Establish clear ways of working
- Deliverable: 2-minute video showcasing solution
If you see mismatches between solver output and ground truth:
- Check Format Consistency: ensure `format_grid_solution()` generates header+rows format
- Verify Normalization: all values should be lowercase and trimmed
- Attribute Ordering: attributes should be sorted alphabetically in headers
- Missing Values: empty attributes should be empty strings, not None
- "Our solution missing header/rows format": The solver didn't format the solution correctly
- "Mismatches: House1.color: expected 'red', got 'blue'": Constraint logic error
- Low accuracy: Check constraint parsing and CSP variable assignments
```bash
# Run with trace enabled to see search steps
python run.py --mode grid --input data/grid_mode/test.parquet --trace --trace-file trace.json

# Limit puzzles for quick testing
python run.py --mode grid --input data/grid_mode/test.parquet --limit 10
```

References:

- Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.)
- Mackworth, A. K. (1977). Consistency in networks of relations
- Haralick, R. M., & Elliott, G. L. (1980). Increasing tree search efficiency for constraint satisfaction problems
- ZebraLogicBench: https://huggingface.co/datasets/allenai/ZebraLogicBench
Current performance: 63% grid accuracy, 29% MC accuracy
Main issues identified (run `python analyze_errors.py`):
- Attribute name mismatches (70% of errors):
  - Ground truth uses: Name, Animal, BookGenre
  - Solver produces: Person, Pet, Book
  - Fix: add attribute name mapping in `format_grid_solution()`
- Value swaps (values correct but wrong houses):
  - Indicates weak positional constraints
  - Fix: strengthen position-based constraints in `clue_parser.py`
- Unsolved puzzles (23/1000 grid, 97/3259 MC):
  - Timeout or constraint conflicts
  - Fix: increase `max_backtracks` and `timeout` in `run.py`
Quick wins to improve accuracy:
- Fix attribute name normalization
- Parse more clue patterns
- Add stronger positional constraints
- Increase solver limits
This project is submitted as part of the AI Connect 2025 educational challenge.
[Team Name - Add your team members here]
Good luck and happy solving! 🧩