Chess Explainer

Project Structure

├── Notebooks/      # Jupyter notebooks with alternative code to filter and reformulate data
├── data/
│   ├── comments/    # All comments extracted from games
│   ├── evaluations/ # Results of model evaluation
│   ├── raw/         # All raw .pgn files used to create the dataset
│   ├── *.py         # Utility scripts used to manipulate the dataset
│   └── *.csv        # Partial or complete datasets
├── models/         # Weights of trained models
├── modules/        # Modules of the project
├── controller.py   # Contains the main functions of the project
└── main.py         # Entry point of the code

Requirements

This code was written with Python 3.10. I haven't tested it with any other versions.

This code was made to run on a computer with CUDA version >= 12.8. It might work with other CUDA versions, but I haven't tested them, and doing so requires installing the matching version of PyTorch. The code should also run on a computer with no GPU, although this is not recommended.
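A typical way to fall back gracefully when CUDA (or PyTorch itself) is unavailable is a guarded device check; this is a generic sketch, not necessarily how this project selects its device:

```python
# Pick a compute device; fall back to CPU when PyTorch or CUDA is unavailable.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # PyTorch not installed
    device = "cpu"

print(f"Using device: {device}")
```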

Install requirements

The list of required libraries is in the requirements.txt file and can be installed with:

pip install -r ./requirements.txt

Set up .env file

You can copy the .env.example file to create your .env and fill it with the proper values.

ENGINE_PATH="/path/to/chess/engine"

DATA_PATH="./data"
DATA_RAW_PATH="./data/raw"
DATA_ANALYZED_PATH="./data/analyzed"
DATA_COMMENTS_PATH="./data/comments"
DATA_EVALUATIONS_PATH="./data/evaluations"
MODEL_PATH="./models"

HUGGING_FACE_TOKEN="TOKEN HERE"

Chess engine

You can download Stockfish and set ENGINE_PATH to the path where you installed it.
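A quick sanity check that ENGINE_PATH actually points at an executable can save a confusing failure later. This is a hypothetical helper, not part of the project:

```python
import os

def engine_available(engine_path):
    """Return True if engine_path is an existing, executable file."""
    return os.path.isfile(engine_path) and os.access(engine_path, os.X_OK)
```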

Usage

The script can be run using the following command:

python main.py

This will create a dataset, using Gemma 3 1B IT to filter and reformulate comments from the ./data/raw folder. Gemma 3 1B IT will then be trained on the resulting dataset.

Parameters

This script can be modified with different parameters.

  • --debug

    Default: True

Show debug information.

  • --llm

    Default: "google/gemma-3-1b-it"

    LLM used as base model. This is the model that will be trained on the dataset.

  • --dataset

    Default: None

The path to the CSV file used for training/evaluation. If None, the script will create a dataset from the content of ./data/raw/. The CSV file should contain at least the following columns: "moves", "engine_eval", "engine_best_line", "engine_best_alternative" and "reformulated". However, if you want the model to use different columns in its prompts or as a target, you can modify the input_columns and input_target parameters in main.py.

    You can use an already made dataset with the following command:

    python main.py --dataset "./Notebooks/reformulated_data_20250902_163137.csv" 
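For reference, a dataset row with the required columns can be written like this; the values here are made up for illustration, the real rows come from the pipeline:

```python
import csv

# Column names as listed above; the example row content is hypothetical.
columns = ["moves", "engine_eval", "engine_best_line",
           "engine_best_alternative", "reformulated"]
row = {
    "moves": "1. e4 e5 2. Nf3 Nc6",
    "engine_eval": "+0.3",
    "engine_best_line": "3. Bb5 a6",
    "engine_best_alternative": "3. Bc4 Bc5",
    "reformulated": "White develops naturally and keeps a slight edge.",
}

with open("example_dataset.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=columns)
    writer.writeheader()
    writer.writerow(row)
```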
  • --trained_model

    Default: None

Path to a trained model. If None, a model will be trained with the given parameters. By default, trained models are saved in ./models.

  • --llm_filter

    Default: "google/gemma-3-1b-it"

    LLM used to filter/reformulate comments.

  • --evaluate_model

    Default: True

Evaluate the trained model on 20% of the dataset.

  • --evaluate_base_model

    Default: True

Evaluate the base model on 20% of the dataset.

  • --save_evaluation

    Default: True

Save the graphs and answers of the evaluation in a folder under ./data/evaluations/.

  • --show_evaluation

    Default: False

Show the graphs of the evaluation.

  • --prompt

    Default: None

If set, answer the given prompt using the trained model.
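The parameters above can be sketched as an argparse parser; this mirrors the listed names and defaults, but the exact types and handling in main.py may differ:

```python
import argparse

def str2bool(s):
    """Interpret common string spellings of booleans from the command line."""
    return str(s).lower() not in ("false", "0", "no")

def build_parser():
    """Hypothetical reconstruction of the CLI described above."""
    p = argparse.ArgumentParser(description="Chess Explainer")
    p.add_argument("--debug", type=str2bool, default=True)
    p.add_argument("--llm", default="google/gemma-3-1b-it")
    p.add_argument("--dataset", default=None)
    p.add_argument("--trained_model", default=None)
    p.add_argument("--llm_filter", default="google/gemma-3-1b-it")
    p.add_argument("--evaluate_model", type=str2bool, default=True)
    p.add_argument("--evaluate_base_model", type=str2bool, default=True)
    p.add_argument("--save_evaluation", type=str2bool, default=True)
    p.add_argument("--show_evaluation", type=str2bool, default=False)
    p.add_argument("--prompt", default=None)
    return p
```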

About

This project is part of my thesis "Enhancing Chess Analysis: AI‐Driven Explanations for Game Evaluations" for the degree of Master of Science in Artificial Intelligence at L-Università ta' Malta.
