Boardgames Are Made Of Rules

Project Overview

This project evaluates how well different large language models (LLMs) process, analyze, and extract information from board-game rule texts. By testing models on real board games from the BoardGameGeek Hall of Fame, we assess their ability to interpret formal logic and adapt complex information to different contexts.

Objectives

The primary objectives of this project are to:

Evaluate explanation generation: Test LLMs' ability to extract fundamental game concepts and adapt them for different audiences (ages 7, 11, and 16)
Assess error detection: Determine how well models can identify missing or contradictory information in rule texts
Estimate game properties: Compare model predictions of game complexity, optimal player count, mechanics, and duration against BoardGameGeek data

Models Tested

This study compares three open-source LLMs:

LLaMA
Gemma
Qwen

Dataset

Rules from 5 board games (a subset of the BoardGameGeek Hall of Fame):

7 Wonders
Catan
Dominion
Power Grid
Ticket to Ride

Methodology

Explanation Task

Models generate age-appropriate explanations, which are evaluated on readability (SMOG, Flesch-Kincaid, Dale-Chall), completeness (rule coverage), and conciseness (compression ratio).
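The readability and conciseness metrics above can be illustrated with a minimal Python sketch. It is not the project's implementation and rests on several assumptions. The syllable counter is a rough vowel-group heuristic. `compression_ratio` assumes the ratio is explanation length divided by rule-text length, in words. `rule_coverage` is a hypothetical keyword check that stands in for rule coverage. SMOG and Dale-Chall are omitted because Dale-Chall in particular needs a list of familiar words. Libraries such as `textstat` provide `smog_index` and `dale_chall_readability_score` for those two metrics.

```python
import re

def tokenize(text: str) -> tuple[list[str], list[str]]:
    """Split text into sentences (on . ! ?) and words (letters and apostrophes)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return sentences, words

def count_syllables(word: str) -> int:
    """Approximate English syllables by counting vowel groups (heuristic, not exact)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "game") usually adds no syllable; "-le" (as in "table") does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences, words = tokenize(text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def compression_ratio(explanation: str, rules: str) -> float:
    """Assumed definition: explanation word count divided by rule-text word count."""
    _, exp_words = tokenize(explanation)
    _, rule_words = tokenize(rules)
    return len(exp_words) / len(rule_words) if rule_words else 0.0

def rule_coverage(explanation: str, key_terms: list[str]) -> float:
    """Hypothetical coverage proxy: share of key rule terms mentioned in the explanation."""
    text = explanation.lower()
    hits = sum(1 for term in key_terms if term.lower() in text)
    return hits / len(key_terms) if key_terms else 0.0
```

Under this definition, a lower compression ratio means a shorter explanation relative to the rules. If the ratio is instead defined as rules over explanation, the interpretation flips.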

Error Detection Task

Models identify intentional flaws in rule texts across 5 difficulty levels: original, missing rules, contradictions, incoherent combinations, and game-breaking mechanics.
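One way to organize error-detection results is per difficulty level. The sketch below is hypothetical. It assumes each rule text is reduced to a binary verdict (did the model report any flaw?). It also assumes the unmodified original text should produce no report. The class and function names are illustrative and are not taken from the project's notebooks.

```python
from collections import Counter
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    """The five difficulty levels, in the order listed above."""
    ORIGINAL = 0                 # unmodified rules (assumed to contain no injected flaw)
    MISSING_RULES = 1
    CONTRADICTIONS = 2
    INCOHERENT_COMBINATIONS = 3
    GAME_BREAKING = 4

@dataclass
class Sample:
    game: str
    level: Level
    has_flaw: bool       # ground truth: was a flaw injected into this rule text?
    model_flagged: bool  # hypothetical binary verdict: did the model report any flaw?

def detection_accuracy(samples: list[Sample]) -> dict[Level, float]:
    """Per level, the share of samples where the model's verdict matches the ground truth."""
    totals: Counter = Counter()
    correct: Counter = Counter()
    for s in samples:
        totals[s.level] += 1
        if s.model_flagged == s.has_flaw:
            correct[s.level] += 1
    return {level: correct[level] / n for level, n in totals.items()}
```

A binary verdict credits a model for flagging a flawed text even when it points at the wrong rule. A stricter criterion would check whether the reported issue matches the injected one.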

Parameter Estimation Task

Models estimate game mechanics, complexity, optimal player count, and duration, with results validated against BoardGameGeek data.
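A comparison against BoardGameGeek data could look like the following sketch. The scoring choices are assumptions, not the project's documented method:

- absolute error on BGG's 1-5 complexity weight;
- a hit if the predicted player count is among the counts voted "best";
- a hit if the predicted duration lies within BGG's listed playing-time range;
- Jaccard overlap for the sets of mechanics.

The field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    """A model's predictions for one game (field names are hypothetical)."""
    complexity: float      # on BGG's 1-5 weight scale
    best_players: int
    duration_min: int      # minutes
    mechanics: set[str]

@dataclass
class Reference:
    """BoardGameGeek values for the same game."""
    complexity: float
    best_players: set[int]  # player counts voted "best"; may be more than one
    min_duration: int       # minutes
    max_duration: int       # minutes
    mechanics: set[str]

def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap |a & b| / |a | b|, defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def score(est: Estimate, ref: Reference) -> dict:
    """Compare one estimate with BGG data using the assumed criteria."""
    return {
        "complexity_abs_error": abs(est.complexity - ref.complexity),
        "players_match": est.best_players in ref.best_players,
        "duration_in_range": ref.min_duration <= est.duration_min <= ref.max_duration,
        "mechanics_jaccard": jaccard(est.mechanics, ref.mechanics),
    }
```

Exact string matching on mechanic names is strict. In practice, the model's labels would likely need to be normalized to BGG's mechanic vocabulary before comparison.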

Key Findings

Models show strong performance in rule extraction and player count estimation
Error detection capabilities are limited, especially for subtle inconsistencies
Smaller model size is a significant constraint for complex logical reasoning
Duration estimation remains unreliable across all tested models

AI usage disclaimer

Parts of this project were developed with the assistance of OpenAI's ChatGPT (GPT-oss). AI was used to summarize ideas, generate code for some visualizations (marked with a comment), rephrase text, help restructure data for the report tables, and generate this README. All content produced with AI assistance has been carefully reviewed, edited, and validated by me. I take full responsibility for the final content and its accuracy.

Repository Contents

error_detection
estimation
evaluation
explain
extraction
img
prompts
rules
.gitignore
README.md
error_detection.ipynb
error_detection_remote.ipynb
estimation.ipynb
evaluation.ipynb
explain.ipynb
extraction_local.ipynb
extraction_remote.ipynb
main.ipynb
presentation_NLP.pdf
report.pdf
