This repository was archived by the owner on Dec 23, 2025. It is now read-only.

sudo-muneeb/ai-connect-competition


AI Connect 2025 - CSP Solver for Logic Grid Puzzles

🎯 Project Overview

This project implements a Constraint Satisfaction Problem (CSP) solver for logic grid puzzles (Zebra puzzles) as part of the AI Connect 2025 collaborative challenge hosted by HSBI (Germany), TDU (Türkiye), SEECS/NUST (Pakistan), and CST/RUB (Bhutan).

The solver uses symbolic reasoning (not machine learning) to solve logic puzzles through constraint propagation and intelligent search strategies.

πŸ† Challenge Objectives

  • Build a CSP solver that combines backtracking search with advanced heuristics
  • Solve logic grid puzzles from the ZebraLogicBench dataset
  • Compete on: Accuracy, Efficiency (search steps), and Generalization

Evaluation Metric

Composite Score = Accuracy (%) - α × (AvgSteps / MaxAvgSteps)

Where:

  • Accuracy (%) = Percentage of puzzles solved correctly
  • AvgSteps = Average number of CSP search steps per puzzle
  • MaxAvgSteps = Maximum average across all teams
  • α = 10, the efficiency penalty weight
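Under these definitions the score is straightforward to compute (a sketch; α is fixed at 10 by the rules above):

```python
def composite_score(accuracy_pct, avg_steps, max_avg_steps, alpha=10):
    """Composite Score = Accuracy (%) - alpha * (AvgSteps / MaxAvgSteps)."""
    return accuracy_pct - alpha * (avg_steps / max_avg_steps)

# A team at 90% accuracy averaging 1,200 steps, when the slowest
# team averages 4,800 steps, scores 90 - 10 * 0.25 = 87.5.
assert composite_score(90.0, 1200, 4800) == 87.5
```

Note that a perfectly accurate but maximally slow solver loses the full α = 10 points, so efficiency matters once accuracy is high.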

📊 Dataset

ZebraLogicBench from AllenAI

Two Modes:

  1. grid_mode (1,000 puzzles)

    • Full logic grid puzzles with complete solutions
    • Structured as houses × features grids
    • Example: 5 houses × 6 features (name, color, pet, drink, job, nationality)
  2. mc_mode (3,259 questions)

    • Multiple-choice questions derived from logic puzzles
    • Single correct answer among choices

🔧 Technical Approach

Core CSP Solver Components

  1. Constraint Representation

    • Parse natural language clues into formal constraints
    • Support various constraint types (equals, not-equals, adjacency, ordering)
  2. Search Algorithm

    • Backtracking search with intelligent pruning
    • MRV (Minimum Remaining Values) heuristic for variable selection
    • Degree heuristic as tie-breaker
    • Least Constraining Value for value ordering
  3. Constraint Propagation

    • Forward checking to eliminate inconsistent values
    • Arc consistency (AC-3) for early detection of dead ends
    • Constraint propagation after each assignment
  4. Trace Generation

    • Logs search states, decisions, and backtracks
    • Records domain sizes and constraint checks
    • Enables performance analysis
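A minimal sketch of component 1, constraint representation (the class name, fields, and relation strings are illustrative, not the actual `solver.py` API): each parsed clue relates the house positions of two (feature, value) variables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BinaryConstraint:
    """Relates two (feature, value) variables by house position."""
    var_a: str
    var_b: str
    relation: str  # "eq", "neq", "adjacent", or "left_of"

    def satisfied(self, pos_a: int, pos_b: int) -> bool:
        if self.relation == "eq":
            return pos_a == pos_b
        if self.relation == "neq":
            return pos_a != pos_b
        if self.relation == "adjacent":
            return abs(pos_a - pos_b) == 1
        if self.relation == "left_of":
            return pos_a < pos_b
        raise ValueError(f"unknown relation: {self.relation}")

# "The green house is next to the white house" -> adjacency on positions:
c = BinaryConstraint("color:green", "color:white", "adjacent")
assert c.satisfied(2, 3) and not c.satisfied(1, 4)
```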

πŸ“ Project Structure

ai connect/
├── README.md                # This file
├── data_README.md           # Dataset documentation
├── requirements.txt         # Python dependencies
├── solver.py                # Main CSP solver implementation
├── data_parser.py           # Parse puzzles into CSP format
├── clue_parser.py           # Natural language clue parsing
├── trace_generator.py       # Search trace logging
├── validator.py             # Solution validation against ground truth
├── run.py                   # Script to run solver on test set
├── compare_results.py       # Compare results with ground truth
├── extract_answers.py       # Extract answers from dataset
├── evaluate.ipynb           # Evaluation notebook
├── utils.py                 # Helper utilities
├── grid_results.json        # Grid mode output (generated)
├── mc_results.json          # MC mode output (generated)
└── data/                    # Dataset files (download separately)
    ├── grid_mode/
    │   └── answers.json     # Ground truth (generated)
    └── mc_mode/
        └── answers.json     # Ground truth (generated)

🚀 Getting Started

1. Install Dependencies

pip install -r requirements.txt

2. Extract Ground Truth Answers

Before running the solver, extract ground truth answers for validation:

python extract_answers.py

This creates:

  • data/grid_mode/answers.json - Grid mode ground truth (header + rows format)
  • data/mc_mode/answers.json - MC mode ground truth (answer strings)

3. Download Dataset

# Download from Hugging Face
# https://huggingface.co/datasets/allenai/ZebraLogicBench

# Or use datasets library
python -c "from datasets import load_dataset; \
load_dataset('allenai/ZebraLogicBench', 'grid_mode').save_to_disk('data/grid_mode'); \
load_dataset('allenai/ZebraLogicBench', 'mc_mode').save_to_disk('data/mc_mode')"

4. Run Solver on Test Set

python run.py --mode grid --input data/grid_mode/test.parquet --output grid_results.json

5. Evaluate Results

jupyter notebook evaluate.ipynb

💡 Implementation Details

Constraint Types Supported

  • Equality: "Alice owns the cat" → Alice.pet = cat
  • Inequality: "Bob doesn't live in the red house" → Bob.color ≠ red
  • Adjacency: "The green house is next to the white house" → |green.position - white.position| = 1
  • Left/Right: "The green house is to the left of the white house" → green.position < white.position
  • Ordering: "The German lives in the first house" → German.position = 1

Search Optimizations

  1. Variable Ordering (MRV)

    • Select variable with fewest remaining values
    • Fail-first principle for early pruning
  2. Value Ordering

    • Choose values that rule out fewest choices for remaining variables
    • Maximizes future options
  3. Constraint Propagation

    • AC-3 algorithm for arc consistency
    • Forward checking after each assignment
    • Early detection of conflicts
  4. Backtracking

    • Intelligent backjumping when conflicts detected
    • Track conflict sets for efficiency
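Optimization 1 (MRV with a degree tie-break) fits in a few lines; this is a hedged sketch, and the function signature and data shapes are assumptions rather than the solver's actual internals:

```python
def select_unassigned_variable(variables, domains, assignment, degree):
    """MRV: pick the variable with the fewest remaining values;
    break ties by highest degree (most constraints on other variables)."""
    unassigned = [v for v in variables if v not in assignment]
    return min(unassigned, key=lambda v: (len(domains[v]), -degree[v]))

# Example: "b" has only one value left, so MRV picks it first.
domains = {"a": [1, 2, 3], "b": [2], "c": [1, 3]}
degree = {"a": 2, "b": 1, "c": 3}
assert select_unassigned_variable(["a", "b", "c"], domains, {}, degree) == "b"
```

Sorting by `(len(domain), -degree)` encodes fail-first: the tightest variable is tried earliest, so contradictions surface near the top of the search tree.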

📈 Performance Metrics

The solver tracks:

  • Accuracy: Percentage of correctly solved puzzles
  • Search Steps: Number of variable assignments attempted
  • Backtracks: Number of times search had to backtrack
  • Time: Total solving time per puzzle
  • Constraint Checks: Number of constraint evaluations
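These counters could be carried in a small per-puzzle record (an illustrative sketch; the field names are assumptions, not the project's actual trace schema):

```python
from dataclasses import dataclass

@dataclass
class SolveStats:
    """Per-puzzle performance counters."""
    steps: int = 0              # variable assignments attempted
    backtracks: int = 0         # times the search undid an assignment
    constraint_checks: int = 0  # constraint evaluations performed
    time_seconds: float = 0.0   # wall-clock solving time
```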

Current Performance

Before Fixes (Original):

  • Grid: 63.05% accuracy (attribute name mismatches)
  • MC: 29.19% accuracy

After Fixes (Latest):

  • ✅ Fixed attribute name mapping (Person→Name, Pet→Animal, Book→BookGenre)
  • ✅ Increased solver limits (100k backtracks, 30s timeout)
  • 📊 Re-run needed to measure new accuracy (expect ~90%+ for grid mode)

Grid Mode:

  • Puzzles solved: 977/1000 (97.7%)
  • Accuracy: Run python compare_results.py after re-running solver

MC Mode:

  • Questions answered: 3162/3259 (97.0%)
  • Accuracy: Run python compare_results.py after re-running solver

Performance Analysis

Use compare_results.py to see detailed mismatches:

python compare_results.py

This shows:

  • Total accuracy for both modes
  • Sample mismatches with expected vs actual values
  • House-by-house attribute comparisons

🎥 Deliverables

  1. ✅ solver.py - Core CSP solver
  2. ✅ data_parser.py - Puzzle parsing
  3. ✅ trace_generator.py - Search logging
  4. ✅ run.py - Test execution script
  5. ✅ evaluate.ipynb - Evaluation notebook
  6. ✅ grid_results.json - Test set solutions
  7. ✅ README.md - Project documentation
  8. 📹 Video - 2-minute demonstration (to be recorded)

πŸ“ Results Format

Grid Mode Output Format

Solutions are stored in the same format as ground truth for easy comparison:

{
  "lgp-test-5x6-16": {
    "header": ["House", "Name", "Nationality", "BookGenre", "Food", "Color", "Animal"],
    "rows": [
      ["1", "Bob", "german", "mystery", "grilled cheese", "yellow", "dog"],
      ["2", "Eric", "norwegian", "fantasy", "stew", "blue", "fish"],
      ["3", "Peter", "dane", "science fiction", "spaghetti", "green", "cat"],
      ["4", "Arnold", "swede", "biography", "stir fry", "red", "bird"],
      ["5", "Alice", "brit", "romance", "pizza", "white", "horse"]
    ]
  }
}

MC Mode Output Format

Multiple choice answers are stored as simple key-value pairs:

{
  "lgp-test-5x5-18#mc-0": "Peter",
  "lgp-test-4x2-35#mc-0": "Alice",
  "lgp-test-5x2-24#mc-1": "lime"
}

✅ Validation

The validator compares solver outputs with ground truth answers:

  1. Extract Ground Truth: Use extract_answers.py to generate answer files from the dataset
  2. Format Matching: Solutions are converted to header+rows format for consistent comparison
  3. Normalization: Values are normalized (lowercase, trimmed) for case-insensitive comparison
  4. House-by-House Validation: Each house's attributes are compared individually
  5. Detailed Error Reporting: Mismatches show expected vs actual values

Running Validation

# Extract ground truth answers from dataset
python extract_answers.py

# Run solver and compare with ground truth
python run.py --mode grid --input data/grid_mode/test.parquet --output grid_results.json

# Compare results with ground truth in detail
python compare_results.py

The validator automatically handles both output formats:

  • Old format: {"House1": {"Color": "red", ...}, "House2": {...}}
  • New format: {"header": ["House", "Color", ...], "rows": [["1", "red", ...], ...]}

Both are normalized to lowercase for case-insensitive comparison.
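The normalization step can be sketched as follows (illustrative helper names; `validator.py` may differ):

```python
def normalize_cell(value):
    """Lowercase and trim so 'Red ' matches 'red'; None becomes ''."""
    return "" if value is None else str(value).strip().lower()

def rows_match(expected_row, actual_row):
    """Compare two house rows after normalizing every cell."""
    return [normalize_cell(v) for v in expected_row] == \
           [normalize_cell(v) for v in actual_row]

# 'Red ' vs 'red' and None vs '' both compare equal after normalization:
assert rows_match(["1", "Red ", None], ["1", "red", ""])
```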

🔬 Algorithm Pseudocode

FAILURE = None

def backtracking_search(csp, assignment):
    if len(assignment) == len(csp.variables):
        return assignment                                  # complete

    var = select_unassigned_variable(csp, assignment)      # MRV heuristic

    for value in order_domain_values(var, csp, assignment):  # LCV heuristic
        if is_consistent(var, value, assignment, csp):
            assignment[var] = value
            inferences = forward_check(var, value, csp)    # prune domains

            # forward_check is assumed to undo its own pruning on failure
            if inferences is not FAILURE:
                result = backtracking_search(csp, assignment)
                if result is not FAILURE:
                    return result
                restore_domains(csp, inferences)           # undo pruning

            del assignment[var]                            # undo assignment

    return FAILURE

🧩 Example Puzzle

Input Clues:

  1. There are 3 houses
  2. The Englishman lives in the red house
  3. The Swede has a dog
  4. The Dane drinks tea
  5. The green house is to the left of the white house ...

CSP Representation:

  • Variables: House1, House2, House3
  • Domains: {(English, Red, Dog, Tea), (Swedish, Green, Cat, Coffee), ...}
  • Constraints: Englishman.color = Red, Swede.pet = Dog, ...
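For intuition, clues 2 and 5 alone can be checked by brute force over position assignments (an illustrative encoding, not the solver's: each feature value is mapped to a house position):

```python
from itertools import permutations

houses = (1, 2, 3)

solutions = [
    (nat, col)
    # positions of (english, swedish, danish) and of (red, green, white):
    for nat in permutations(houses)
    for col in permutations(houses)
    if nat[0] == col[0]   # clue 2: the Englishman lives in the red house
    and col[1] < col[2]   # clue 5: the green house is left of the white house
]

# Two clues leave 6 of the 36 combinations; more clues prune further.
assert len(solutions) == 6
```

The real solver searches this space with backtracking and propagation instead of enumerating it, which is what keeps 5×6 puzzles tractable.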

🤝 Team Collaboration

  • Team Size: 6-8 members from multiple countries
  • Duration: 11 days
  • Communication: Establish clear ways of working
  • Deliverable: 2-minute video showcasing solution

Troubleshooting

Validation Failures

If you see mismatches between solver output and ground truth:

  1. Check Format Consistency: Ensure format_grid_solution() generates header+rows format
  2. Verify Normalization: All values should be lowercase and trimmed
  3. Attribute Ordering: Attributes should be sorted alphabetically in headers
  4. Missing Values: Empty attributes should be empty strings, not None

Common Issues

  • "Our solution missing header/rows format": The solver didn't format the solution correctly
  • "Mismatches: House1.color: expected 'red', got 'blue'": Constraint logic error
  • Low accuracy: Check constraint parsing and CSP variable assignments

Debugging

# Run with trace enabled to see search steps
python run.py --mode grid --input data/grid_mode/test.parquet --trace --trace-file trace.json

# Limit puzzles for quick testing
python run.py --mode grid --input data/grid_mode/test.parquet --limit 10

📚 References

  • Russell, S., & Norvig, P. (2020). Artificial Intelligence: A Modern Approach (4th ed.)
  • Mackworth, A. K. (1977). Consistency in networks of relations
  • Haralick, R. M., & Elliott, G. L. (1980). Increasing tree search efficiency for constraint satisfaction problems
  • ZebraLogicBench: https://huggingface.co/datasets/allenai/ZebraLogicBench

🎯 Improving Accuracy

Current performance: 63% grid accuracy, 29% MC accuracy

Main issues identified (run python analyze_errors.py):

  1. Attribute name mismatches (70% of errors):

    • Ground truth uses: Name, Animal, BookGenre
    • Solver produces: Person, Pet, Book
    • Fix: Add attribute name mapping in format_grid_solution()
  2. Value swaps (values correct but wrong houses):

    • Indicates weak positional constraints
    • Fix: Strengthen position-based constraints in clue_parser.py
  3. Unsolved puzzles (23/1000 grid, 97/3259 MC):

    • Timeout or constraint conflicts
    • Fix: Increase max_backtracks and timeout in run.py
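The attribute-name fix in issue 1 can be as small as a rename table applied when headers are emitted (the helper is illustrative; the mappings are taken from the list above):

```python
ATTRIBUTE_RENAMES = {"Person": "Name", "Pet": "Animal", "Book": "BookGenre"}

def canonical_header(columns):
    """Rename solver-internal attribute names to the ground-truth names."""
    return [ATTRIBUTE_RENAMES.get(c, c) for c in columns]

assert canonical_header(["House", "Person", "Pet"]) == ["House", "Name", "Animal"]
```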

Quick wins to improve accuracy:

  • Fix attribute name normalization
  • Parse more clue patterns
  • Add stronger positional constraints
  • Increase solver limits

📄 License

This project is submitted as part of the AI Connect 2025 educational challenge.

👥 Authors

[Team Name - Add your team members here]


Good luck and happy solving! 🧠🔍
