GoLLM is an experimental project to explore Large Language Models (LLMs) for Go (Weiqi/Baduk), inspired by the paper "The Go Transformer: Natural Language Modeling for Game Play".
The goal is to train a LLaMA-based model on large-scale SGF game records and investigate its ability to generate legal moves and predict next moves in Go games.
We use the open-source dataset from yenw/computer-go-dataset, which contains:
- AI self-play games (thousands of high-quality AI vs AI SGFs)
- Professional human games (pro2000+ and others)
- Remove non-ASCII characters and the symbols ;, [, ]
- Keep only coordinate moves
- Replace empty [] or [tt] with ZZ
- Remove extra spaces between moves
- Output compact text format (one game per line)
- Group into files (10,000 games per file)
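The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `clean_sgf.py`: the function names `clean_sgf` and `chunk_games`, the regular expression, and the choice to keep the colour letter in front of each coordinate (e.g. `Bpd`) are assumptions made for the example.

```python
import re

# Assumed pattern: a node separator ';', a colour (B or W), and either an
# empty coordinate or two lowercase letters, e.g. ;B[pd], ;W[] or ;B[tt].
MOVE_RE = re.compile(r";\s*([BW])\s*\[([a-z]{2}|)\]")


def clean_sgf(sgf_text):
    """Reduce one SGF record to a single compact line of moves."""
    # Drop non-ASCII characters (player names, comments in other scripts).
    ascii_text = sgf_text.encode("ascii", errors="ignore").decode("ascii")
    moves = []
    for color, coord in MOVE_RE.findall(ascii_text):
        # An empty coordinate or 'tt' is a pass; encode it as ZZ.
        if coord in ("", "tt"):
            coord = "ZZ"
        moves.append(color + coord)
    # Exactly one space between moves, with no ';', '[' or ']' left over.
    return " ".join(moves)


def chunk_games(games, games_per_file=10_000):
    """Yield consecutive groups of cleaned games, one group per output file."""
    for start in range(0, len(games), games_per_file):
        yield games[start:start + games_per_file]


example = "(;GM[1]SZ[19]PB[Black];B[pd];W[dp];B[];W[tt])"
print(clean_sgf(example))  # Bpd Wdp BZZ WZZ
```

Matching only `;B[...]` and `;W[...]` nodes means setup properties such as `AB[...]` and header fields such as `PB[...]` are skipped, which is one simple way to keep coordinate moves only.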
- Model: LLaMA 3 (7B)
- Fine-tuning: QLoRA (4-bit quantization with LoRA adapters)
- Frameworks:
- Tokenizer: llama.cpp (GGUF models via Ollama)
- Hardware: Single RTX 4070 Ti (16GB VRAM)
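As a rough back-of-the-envelope check on why 4-bit quantization matters on a single consumer GPU, the sketch below estimates the memory taken by the base model weights alone. The helper `weight_memory_gib` is written for this example, and the parameter count passed to it is only the nominal size listed above. The estimate ignores quantization constants, LoRA adapters, activations, and optimizer state, so real usage is higher.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for model weights only, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


n_params = 7e9  # nominal size as listed above; adjust for the actual checkpoint
for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(n_params, bits):.1f} GiB")
```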
python scripts/clean_sgf.py \
--input ~/Code/sgf_data/AI \
--output data/clean/AI
For professional games (e.g., pro2000+):
python scripts/clean_pro.py \
--input ~/Code/sgf_data/Professional/pro2000+.txt \
--output data/clean/pro
Pack cleaned games into tokenized sequences (train/val/test split = 80/10/10, context length 512):
python scripts/preprocess_go_lm.py \
--backend llama_cpp \
--gguf-path ~/.ollama_models/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 \
--input-dir data/clean/AI \
--glob "train_*.txt" \
--context-length 512 \
--output-dir data/packed/train
Validation set example:
python scripts/llama31_token_pack_ollama.py \
--backend llama_cpp \
--gguf-path ~/.ollama_models/sha256-667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29 \
--input-dir data/clean/pro \
--glob "val_*.txt" \
--context-length 512 \
--output-dir data/packed/val
Run QLoRA fine-tuning (train_qlora.py):
torchrun --nproc_per_node=1 scripts/train_qlora.py \
--model_name_or_path meta-llama/Meta-Llama-3-8B \
--dataset data/packed/train \
--validation_data data/packed/val \
--output_dir artifacts/lora_adapter \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 16 \
--learning_rate 2e-5 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_strategy epoch
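To make the flags above concrete, the following sketch works out the effective batch size and a plain cosine learning-rate decay. It assumes no warmup and decay to zero, which may differ from what `train_qlora.py` actually configures. `cosine_lr` is a hypothetical helper written for this example, and the total step count is an arbitrary illustrative value.

```python
import math

per_device_batch = 1      # --per_device_train_batch_size
grad_accum_steps = 16     # --gradient_accumulation_steps
num_gpus = 1              # --nproc_per_node
context_length = 512      # --context-length used when packing

# Sequences and tokens consumed per optimizer step.
effective_batch = per_device_batch * grad_accum_steps * num_gpus
tokens_per_step = effective_batch * context_length
print(effective_batch, tokens_per_step)  # 16 8192


def cosine_lr(step, total_steps, base_lr=2e-5):
    """Cosine decay from base_lr to 0 (assumes no warmup phase)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # illustrative value, not taken from the actual run
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_lr(step, total_steps))
```

With a per-device batch of 1, gradient accumulation is what brings each optimizer update up to 16 packed sequences.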
- Model: LLaMA 3 7B, QLoRA (RTX 4070, 12GB)
- Dataset: ~50k games (~10M tokens)
- Epochs: 3
- Learning rate: 2e-5 → cosine decay
- Train loss: ~4.36
- Eval loss: ~4.16
This suggests that the model is learning to predict legal Go moves, although its playing strength remains limited.
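One way to read the loss values is as perplexity. Assuming the reported losses are mean per-token cross-entropy in nats (the usual convention for causal language model training, but not confirmed here), perplexity is the exponential of the loss:

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_nats)


for name, loss in (("train", 4.36), ("eval", 4.16)):
    print(f"{name}: loss {loss:.2f} -> perplexity {perplexity(loss):.1f}")
```

Because the tokenizer is a general-purpose text tokenizer, a single move may span several tokens, so per-token perplexity is not the same as per-move uncertainty.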
- Train with an 8B model and more datasets for stronger results
- Add self-play data augmentation
- Evaluate move prediction accuracy (top-k metrics)
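A top-k move prediction metric could be computed along these lines. This is a hypothetical sketch: `top_k_accuracy` and its input format (a ranked list of candidate moves per position, best first) are assumptions, not part of the current codebase.

```python
def top_k_accuracy(ranked_predictions, true_moves, k=5):
    """Fraction of positions whose true move is among the top-k candidates.

    ranked_predictions: one list of candidate moves per position, best first.
    true_moves: the move actually played at each position.
    """
    if len(ranked_predictions) != len(true_moves):
        raise ValueError("predictions and targets must have the same length")
    if not true_moves:
        return 0.0
    hits = sum(
        1 for candidates, target in zip(ranked_predictions, true_moves)
        if target in candidates[:k]
    )
    return hits / len(true_moves)


preds = [["pd", "dp", "dd"], ["qq", "dc", "pp"], ["cd", "ZZ", "ec"]]
truth = ["dp", "pp", "aa"]
print(top_k_accuracy(preds, truth, k=1))  # 0.0
print(top_k_accuracy(preds, truth, k=3))  # 2 of 3 positions hit
```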
llama.cpp (GGUF models via Ollama)