GoLLM

GoLLM is an experimental project to explore Large Language Models (LLMs) for Go (Weiqi/Baduk), inspired by the paper The Go Transformer: Natural Language Modeling for Game Play.

The goal is to train a LLaMA-based model on large-scale SGF game records and investigate its ability to generate legal moves and predict next moves in Go games.

Dataset

We use the open-source dataset from yenw/computer-go-dataset, which contains:

AI self-play games (thousands of high-quality AI vs AI SGFs)
Professional human games (pro2000+ and others)

Cleaning & Preprocessing

Remove non-ASCII characters, ;, [, ]
Keep only coordinate moves
Replace empty [] or [tt] with ZZ
Remove extra spaces between moves
Output compact text format (one game per line)
Group into files (10,000 games per file)

Tech Stack

Model: LLaMA 3 (7B)
Fine-tuning: QLoRA (4-bit quantization with LoRA adapters)
Frameworks:
Tokenizer: llama.cpp (GGUF models via Ollama)
Hardware: Single RTX 4070 ti (16GB VRAM)

Usage

1. Clean SGF files

python scripts/clean_sgf.py \
  --input ~/Code/sgf_data/AI \
  --output data/clean/AI

For professional games (e.g., pro2000+):

python scripts/clean_pro.py \
  --input ~/Code/sgf_data/Professional/pro2000+.txt \
  --output data/clean/pro

2. Tokenization & Sharding

Pack cleaned games into tokenized sequences (train/val/test split = 80/10/10, context length 512):

python scripts/preprocess_go_lm.py \
  --backend llama_cpp \
  --gguf-path ~/.ollama_models/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 \
  --input-dir data/clean/AI \
  --glob "train_*.txt" \
  --context-length 512 \
  --output-dir data/packed/train

Validation set example:

python scripts/llama31_token_pack_ollama.py \
  --backend llama_cpp \
  --gguf-path ~/.ollama_models/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 \
  --input-dir data/clean/pro \
  --glob "val_*.txt" \
  --context-length 512 \
  --output-dir data/packed/val

3. Training with QLoRA

Run QLoRA fine-tuning (train_qlora.py):

torchrun --nproc_per_node=1 scripts/train_qlora.py \
  --model_name_or_path meta-llama/Meta-Llama-3-8B \
  --dataset data/packed/train \
  --validation_data data/packed/val \
  --output_dir artifacts/lora_adapter \
  --num_train_epochs 3 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --learning_rate 2e-5 \
  --lr_scheduler_type cosine \
  --logging_steps 10 \
  --save_strategy epoch

Training Results (v0.1)

Model: LLaMA 3 7B, QLoRA (RTX 4070, 12GB)
Dataset: ~50k games (~10M tokens)
Epochs: 3
Learning rate: 2e-5 → cosine decay
Train loss: ~4.36
Eval loss: ~4.16

This shows the model successfully learns to predict legal Go moves, though playing strength is still limited.

Next Steps

Train with 8B model and more datasets for stronger results
Add self-play data augmentation
Evaluate move prediction accuracy (top-k metrics)

Acknowledgments

yenw/computer-go-dataset

HuggingFace

llama.cpp (GGUF models via Ollama)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
scripts		scripts
sgf_clean_tools		sgf_clean_tools
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
go-llm.code-workspace		go-llm.code-workspace
gtp_player.py		gtp_player.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GoLLM

Dataset

Cleaning & Preprocessing

Tech Stack

Usage

1. Clean SGF files

2. Tokenization & Sharding

3. Training with QLoRA

Training Results (v0.1)

Next Steps

Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

foxkillerli/GoLLM

Folders and files

Latest commit

History

Repository files navigation

GoLLM

Dataset

Cleaning & Preprocessing

Tech Stack

Usage

1. Clean SGF files

2. Tokenization & Sharding

3. Training with QLoRA

Training Results (v0.1)

Next Steps

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages