This project implements a Constraint Satisfaction Problem (CSP) solver for logic grid puzzles (Zebra puzzles) as part of the AI Connect 2025 collaborative challenge hosted by HSBI (Germany), TDU (Türkiye), SEECS/NUST (Pakistan), and CST/RUB (Bhutan).
The solver uses symbolic reasoning (not machine learning) to solve logic puzzles through constraint propagation and intelligent search strategies.
- Build a CSP solver that combines backtracking search with advanced heuristics
- Solve logic grid puzzles from the ZebraLogicBench dataset
- Compete on: Accuracy, Efficiency (search steps), and Generalization
Composite Score = Accuracy (%) − α × (AvgSteps / MaxAvgSteps)
Where:
- Accuracy (%) = Percentage of puzzles solved correctly
- AvgSteps = Average number of CSP search steps per puzzle
- MaxAvgSteps = Maximum average across all teams
- α = 10 = Efficiency penalty weight
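As a quick illustration, the scoring formula can be expressed directly in code (the function and argument names below are illustrative, not part of the challenge tooling):

```python
def composite_score(accuracy_pct, avg_steps, max_avg_steps, alpha=10):
    """Composite Score = Accuracy (%) - alpha * (AvgSteps / MaxAvgSteps)."""
    return accuracy_pct - alpha * (avg_steps / max_avg_steps)

# A team solving 90% of puzzles with half the slowest team's average steps:
score = composite_score(90.0, 50.0, 100.0)  # 90 - 10 * 0.5 = 85.0
```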
ZebraLogicBench from AllenAI
- `grid_mode` (1,000 puzzles)
  - Full logic grid puzzles with complete solutions
  - Structured as houses × features grids
  - Example: 5 houses × 6 features (name, color, pet, drink, job, nationality)
- `mc_mode` (3,259 questions)
  - Multiple-choice questions derived from logic puzzles
  - Single correct answer among the choices
- **Constraint Representation**
  - Parse natural-language clues into formal constraints
  - Support various constraint types (equals, not-equals, adjacency, ordering)
- **Search Algorithm**
  - Backtracking search with intelligent pruning
  - MRV (Minimum Remaining Values) heuristic for variable selection
  - Degree heuristic as a tie-breaker
  - Least Constraining Value (LCV) for value ordering
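A minimal sketch of how MRV with a degree-heuristic tie-break might look (the data structures and function name here are illustrative, not the project's actual `solver.py` API):

```python
def select_unassigned_variable(variables, domains, assignment, neighbors):
    """MRV: pick the unassigned variable with the fewest remaining values;
    break ties with the degree heuristic (most unassigned neighbors)."""
    unassigned = [v for v in variables if v not in assignment]
    return min(
        unassigned,
        key=lambda v: (
            len(domains[v]),                                      # MRV
            -sum(1 for n in neighbors[v] if n not in assignment), # degree
        ),
    )
```

Because `min` compares the tuples lexicographically, the domain size dominates and the (negated) degree only decides ties.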
- **Constraint Propagation**
  - Forward checking to eliminate inconsistent values
  - Arc consistency (AC-3) for early detection of dead ends
  - Propagation after each assignment
- **Trace Generation**
  - Logs search states, decisions, and backtracks
  - Records domain sizes and constraint checks
  - Enables performance analysis
```
ai connect/
├── README.md            # This file
├── data_README.md       # Dataset documentation
├── requirements.txt     # Python dependencies
├── solver.py            # Main CSP solver implementation
├── data_parser.py       # Parse puzzles into CSP format
├── clue_parser.py       # Natural language clue parsing
├── trace_generator.py   # Search trace logging
├── validator.py         # Solution validation against ground truth
├── run.py               # Script to run solver on test set
├── compare_results.py   # Compare results with ground truth
├── extract_answers.py   # Extract answers from dataset
├── evaluate.ipynb       # Evaluation notebook
├── utils.py             # Helper utilities
├── grid_results.json    # Grid mode output (generated)
├── mc_results.json      # MC mode output (generated)
└── data/                # Dataset files (download separately)
    ├── grid_mode/
    │   └── answers.json # Ground truth (generated)
    └── mc_mode/
        └── answers.json # Ground truth (generated)
```
```bash
pip install -r requirements.txt
```

Before running the solver, extract ground truth answers for validation:

```bash
python extract_answers.py
```

This creates:

- `data/grid_mode/answers.json` - Grid mode ground truth (header + rows format)
- `data/mc_mode/answers.json` - MC mode ground truth (answer strings)
```bash
# Download from Hugging Face
# https://huggingface.co/datasets/allenai/ZebraLogicBench
# Or use the datasets library
python -c "from datasets import load_dataset; \
dataset = load_dataset('allenai/ZebraLogicBench'); \
dataset['grid_mode'].save_to_disk('data/grid_mode'); \
dataset['mc_mode'].save_to_disk('data/mc_mode')"
```

Run the solver:

```bash
python run.py --mode grid --input data/grid_mode/test.parquet --output grid_results.json
```

Open the evaluation notebook:

```bash
jupyter notebook evaluate.ipynb
```

Supported clue-to-constraint translations:

- Equality: "Alice owns the cat" → Alice.pet = cat
- Inequality: "Bob doesn't live in the red house" → Bob.color ≠ red
- Adjacency: "The green house is next to the white house" → |green.position − white.position| = 1
- Left/Right: "The green house is to the left of the white house" → green.position < white.position
- Ordering: "The German lives in the first house" → German.position = 1
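These translations can be represented as small predicate factories over a position assignment; a sketch (the helper names and the `pos` mapping are illustrative, not the project's `clue_parser.py` API):

```python
# Each clue type becomes a predicate over `pos`, a dict mapping an
# attribute value to its house position (1-based).
def equals(a, b):      return lambda pos: pos[a] == pos[b]
def not_equals(a, b):  return lambda pos: pos[a] != pos[b]
def adjacent(a, b):    return lambda pos: abs(pos[a] - pos[b]) == 1
def left_of(a, b):     return lambda pos: pos[a] < pos[b]
def at_position(a, n): return lambda pos: pos[a] == n

# Example: green house at position 2, white at 3, the German at 1.
pos = {"green": 2, "white": 3, "German": 1}
```

Representing clues as closures like this lets the solver evaluate any parsed clue uniformly during consistency checks.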
- **Variable Ordering (MRV)**
  - Select the variable with the fewest remaining values
  - Fail-first principle for early pruning
- **Value Ordering (LCV)**
  - Choose values that rule out the fewest choices for remaining variables
  - Maximizes future options
- **Constraint Propagation**
  - AC-3 algorithm for arc consistency
  - Forward checking after each assignment
  - Early detection of conflicts
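For reference, a compact sketch of AC-3 over binary constraints (the `constraints[(x, y)]` predicate interface is an assumption for illustration, not the project's actual API):

```python
from collections import deque

def ac3(domains, constraints):
    """Prune values with no support. `constraints[(x, y)]` is a binary
    predicate over (value_of_x, value_of_y); `domains` maps variables
    to mutable lists of candidate values."""
    queue = deque(constraints.keys())
    while queue:
        x, y = queue.popleft()
        revised = False
        for vx in list(domains[x]):
            # Remove vx if no value of y supports it
            if not any(constraints[(x, y)](vx, vy) for vy in domains[y]):
                domains[x].remove(vx)
                revised = True
        if revised:
            if not domains[x]:
                return False  # empty domain: dead end detected early
            # Re-enqueue arcs pointing at x, since its domain shrank
            for (a, b) in constraints:
                if b == x and a != y:
                    queue.append((a, b))
    return True
```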
- **Backtracking**
  - Intelligent backjumping when conflicts are detected
  - Tracks conflict sets for efficiency
The solver tracks:
- Accuracy: Percentage of correctly solved puzzles
- Search Steps: Number of variable assignments attempted
- Backtracks: Number of times search had to backtrack
- Time: Total solving time per puzzle
- Constraint Checks: Number of constraint evaluations
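These metrics can be collected in a simple counter object; a minimal sketch (the class and field names are our own, not the actual `trace_generator.py` API):

```python
from dataclasses import dataclass

@dataclass
class SolverStats:
    """Illustrative counters for the metrics listed above."""
    steps: int = 0              # variable assignments attempted
    backtracks: int = 0         # times the search backtracked
    constraint_checks: int = 0  # constraint evaluations
    time_seconds: float = 0.0   # total solving time per puzzle

# The solver would increment these at the corresponding points in search:
stats = SolverStats()
stats.steps += 1
stats.constraint_checks += 4
```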
Before Fixes (Original):
- Grid: 63.05% accuracy (attribute name mismatches)
- MC: 29.19% accuracy

After Fixes (Latest):
- ✅ Fixed attribute name mapping (Person→Name, Pet→Animal, Book→BookGenre)
- ✅ Increased solver limits (100k backtracks, 30s timeout)
- 🔄 Re-run needed to measure new accuracy (expect ~90%+ for grid mode)
Grid Mode:
- Puzzles solved: 977/1000 (97.7%)
- Accuracy: run `python compare_results.py` after re-running the solver

MC Mode:
- Questions answered: 3162/3259 (97.0%)
- Accuracy: run `python compare_results.py` after re-running the solver
Use `compare_results.py` to see detailed mismatches:

```bash
python compare_results.py
```

This shows:
- Total accuracy for both modes
- Sample mismatches with expected vs actual values
- House-by-house attribute comparisons
- ✅ solver.py - Core CSP solver
- ✅ data_parser.py - Puzzle parsing
- ✅ trace_generator.py - Search logging
- ✅ run.py - Test execution script
- ✅ evaluate.ipynb - Evaluation notebook
- ✅ grid_results.json - Test set solutions
- ✅ README.md - Project documentation
- 🎥 Video - 2-minute demonstration (to be recorded)
Solutions are stored in the same format as ground truth for easy comparison:
```json
{
  "lgp-test-5x6-16": {
    "header": ["House", "Name", "Nationality", "BookGenre", "Food", "Color", "Animal"],
    "rows": [
      ["1", "Bob", "german", "mystery", "grilled cheese", "yellow", "dog"],
      ["2", "Eric", "norwegian", "fantasy", "stew", "blue", "fish"],
      ["3", "Peter", "dane", "science fiction", "spaghetti", "green", "cat"],
      ["4", "Arnold", "swede", "biography", "stir fry", "red", "bird"],
      ["5", "Alice", "brit", "romance", "pizza", "white", "horse"]
    ]
  }
}
```

Multiple choice answers are stored as simple key-value pairs:

```json
{
  "lgp-test-5x5-18#mc-0": "Peter",
  "lgp-test-4x2-35#mc-0": "Alice",
  "lgp-test-5x2-24#mc-1": "lime"
}
```

The validator compares solver outputs with ground truth answers:
- Extract Ground Truth: use `extract_answers.py` to generate answer files from the dataset
- Format Matching: solutions are converted to header+rows format for consistent comparison
- Normalization: values are normalized (lowercase, trimmed) for case-insensitive comparison
- House-by-House Validation: each house's attributes are compared individually
- Detailed Error Reporting: mismatches show expected vs. actual values
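The normalization step might look like this (the function name is illustrative, not necessarily what `validator.py` uses):

```python
def normalize(value):
    """Lowercased, trimmed comparison key; None becomes "" (sketch)."""
    return "" if value is None else str(value).strip().lower()

# Case and surrounding whitespace no longer cause spurious mismatches:
assert normalize("  Grilled Cheese ") == "grilled cheese"
assert normalize(None) == ""
```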
```bash
# Extract ground truth answers from dataset
python extract_answers.py

# Run solver and compare with ground truth
python run.py --mode grid --input data/grid_mode/test.parquet --output grid_results.json

# Compare results with ground truth in detail
python compare_results.py
```

The validator automatically handles both output formats:

- Old format: `{"House1": {"Color": "red", ...}, "House2": {...}}`
- New format: `{"header": ["House", "Color", ...], "rows": [["1", "red", ...], ...]}`
Both are normalized to lowercase for case-insensitive comparison.
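A converter from the old format to header+rows could be sketched as follows (the function name and alphabetical attribute ordering are assumptions; note that sorting house names lexicographically is only safe for single-digit house counts):

```python
def to_header_rows(houses):
    """Convert {"House1": {...}, ...} to the header+rows format (sketch)."""
    # Collect all attribute names across houses, sorted alphabetically
    attrs = sorted({k for h in houses.values() for k in h})
    header = ["House"] + attrs
    rows = []
    for i, name in enumerate(sorted(houses), start=1):
        # Missing attributes become empty strings; values are lowercased
        rows.append([str(i)] + [str(houses[name].get(a, "")).lower()
                                for a in attrs])
    return {"header": header, "rows": rows}
```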
```python
def backtracking_search(csp, assignment):
    if is_complete(assignment, csp):
        return assignment
    var = select_unassigned_variable(csp, assignment)  # MRV heuristic
    for value in order_domain_values(var, csp):        # LCV heuristic
        if is_consistent(var, value, assignment, csp):
            assign(var, value, assignment)
            inferences = forward_check(var, value, csp)
            if inferences != FAILURE:
                result = backtracking_search(csp, assignment)
                if result != FAILURE:
                    return result
            remove_assignment(var, assignment)
            restore_domains(inferences, csp)
    return FAILURE
```

Input Clues:
- There are 3 houses
- The Englishman lives in the red house
- The Swede has a dog
- The Dane drinks tea
- The green house is to the left of the white house ...
CSP Representation:
- Variables: House1, House2, House3
- Domains: {(English, Red, Dog, Tea), (Swedish, Green, Cat, Coffee), ...}
- Constraints: Englishman.color = Red, Swede.pet = Dog, ...
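One possible Python encoding of that representation (variable and helper names are illustrative; the actual solver may structure its CSP differently). Note how the unary clue "the Englishman lives in the red house" can prune domain tuples directly, while positional clues remain as constraints over assignments:

```python
from itertools import product

houses = ["House1", "House2", "House3"]

# Domains: candidate (nationality, color) tuples per house (trimmed to
# two features for brevity). The Englishman/red clue is a unary filter:
# a tuple is valid only if "English" and "Red" occur together or not at all.
tuples = [(n, c)
          for n, c in product(("English", "Swedish", "Danish"),
                              ("Red", "Green", "White"))
          if (n == "English") == (c == "Red")]
domains = {h: list(tuples) for h in houses}

# "The green house is to the left of the white house" is a constraint
# over a full assignment mapping each house to a chosen tuple:
def green_left_of_white(assignment):
    pos = {assignment[h][1]: i for i, h in enumerate(houses, start=1)}
    return pos.get("Green", 0) < pos.get("White", 4)
```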
- Team Size: 6-8 members from multiple countries
- Duration: 11 days
- Communication: Establish clear ways of working
- Deliverable: 2-minute video showcasing solution
If you see mismatches between solver output and ground truth:
- Check Format Consistency: ensure `format_grid_solution()` generates header+rows format
- Verify Normalization: all values should be lowercase and trimmed
- Attribute Ordering: attributes should be sorted alphabetically in headers
- Missing Values: empty attributes should be empty strings, not None
- "Our solution missing header/rows format": The solver didn't format the solution correctly
- "Mismatches: House1.color: expected 'red', got 'blue'": Constraint logic error
- Low accuracy: Check constraint parsing and CSP variable assignments
```bash
# Run with trace enabled to see search steps
python run.py --mode grid --input data/grid_mode/test.parquet --trace --trace-file trace.json

# Limit puzzles for quick testing
python run.py --mode grid --input data/grid_mode/test.parquet --limit 10
```

References:

- Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.)
- Mackworth, A. K. (1977). Consistency in networks of relations
- Haralick, R. M., & Elliott, G. L. (1980). Increasing tree search efficiency for constraint satisfaction problems
- ZebraLogicBench: https://huggingface.co/datasets/allenai/ZebraLogicBench
Current performance: 63% grid accuracy, 29% MC accuracy
Main issues identified (run `python analyze_errors.py`):
- Attribute name mismatches (70% of errors):
  - Ground truth uses: Name, Animal, BookGenre
  - Solver produces: Person, Pet, Book
  - Fix: add attribute name mapping in `format_grid_solution()`
- Value swaps (values correct but wrong houses):
  - Indicates weak positional constraints
  - Fix: strengthen position-based constraints in `clue_parser.py`
- Unsolved puzzles (23/1000 grid, 97/3259 MC):
  - Timeout or constraint conflicts
  - Fix: increase `max_backtracks` and `timeout` in `run.py`
Quick wins to improve accuracy:
- Fix attribute name normalization
- Parse more clue patterns
- Add stronger positional constraints
- Increase solver limits
This project is submitted as part of the AI Connect 2025 educational challenge.
[Team Name - Add your team members here]
Good luck and happy solving! 🧩