go-microgpt is a Go port of Andrej Karpathy's microgpt — a minimal GPT implementation to learn transformer internals.
Pure Go, no external dependencies, single-file implementation.
Built for learning: a faithful 1:1 port for understanding GPT internals. Like the original implementation, this project is not optimized for efficiency.
What this project covers:
- Automatic differentiation (backpropagation through a computation graph)
- Multi-head attention and transformer blocks
- Adam optimizer with learning rate scheduling
- Training and inference loops for sequence models
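To make the first bullet concrete, here is a minimal scalar-autograd sketch in the spirit of the port: each node records its value, its gradient, and a closure that pushes its gradient to its parents. The `Value`, `add`, `mul`, and `backprop` names are illustrative, not go-microgpt's actual API.

```go
package main

import "fmt"

// Value is a node in a scalar computation graph.
type Value struct {
	data, grad float64
	backward   func()   // propagates this node's grad to its parents
	parents    []*Value // nodes this value was computed from
}

func add(a, b *Value) *Value {
	out := &Value{data: a.data + b.data, parents: []*Value{a, b}}
	out.backward = func() {
		a.grad += out.grad // d(a+b)/da = 1
		b.grad += out.grad // d(a+b)/db = 1
	}
	return out
}

func mul(a, b *Value) *Value {
	out := &Value{data: a.data * b.data, parents: []*Value{a, b}}
	out.backward = func() {
		a.grad += b.data * out.grad // d(a*b)/da = b
		b.grad += a.data * out.grad // d(a*b)/db = a
	}
	return out
}

// backprop visits nodes in reverse topological order so that each
// node's gradient is complete before it is pushed to its parents.
func backprop(root *Value) {
	var topo []*Value
	visited := map[*Value]bool{}
	var build func(v *Value)
	build = func(v *Value) {
		if visited[v] {
			return
		}
		visited[v] = true
		for _, p := range v.parents {
			build(p)
		}
		topo = append(topo, v)
	}
	build(root)
	root.grad = 1
	for i := len(topo) - 1; i >= 0; i-- {
		if topo[i].backward != nil {
			topo[i].backward()
		}
	}
}

func main() {
	x := &Value{data: 2}
	y := &Value{data: 3}
	z := mul(add(x, y), x) // z = (x+y)*x = 10
	backprop(z)
	fmt.Println(z.data, x.grad, y.grad) // dz/dx = 2x+y = 7, dz/dy = x = 2
}
```

The real implementation extends the same pattern to the other operations a transformer needs (tanh, exp, pow, etc.).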
- Python: gist.github.com/.../microgpt.py [Rev. 14fb038]
- Blog: karpathy.github.io/2026/02/12/microgpt/
Requirements: Go 1.22+
Run directly:
% # Local run
% go run ./microgpt

% # Docker run
% docker run --rm -v "$(pwd)":/test -w /test golang:1.22-alpine go run ./microgpt.go
Build and run:
% go build -o microgpt ./microgpt
% ./microgpt
Run tests:
% # Local run
% go test ./microgpt -v -race

% # Docker run
% docker run --rm -v "$(pwd)":/test -w /test golang:1.22-alpine go test -v ./...
Edit constants in microgpt/microgpt.go:
const (
nLayer = 1 // transformer layers (depth)
nEmbd = 16 // embedding size (width)
blockSize = 16 // max sequence length per forward pass
nHead = 4 // attention heads (must divide nEmbd)
numSteps = 1000 // training iterations
learningRate = 0.01 // Adam learning rate (0.01 recommended)
)

Default: ~3,400 parameters.
How each affects training:
| Parameter | Increase | Effect |
|---|---|---|
| nLayer | More layers | Larger model, slower training |
| nEmbd | Bigger size | More expressive, higher memory |
| nHead | More heads | Better attention patterns, slower |
| blockSize | Longer context | Model sees more history |
| numSteps | More iterations | Lower loss, longer training |
| learningRate | Higher value | Faster convergence, risks instability |
See Karpathy's blog for detailed explanations.
Character-level names dataset from makemore. Auto-downloaded on first run.
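Character-level tokenization means the vocabulary is just the set of unique characters in the dataset. A minimal sketch of the idea (the `buildVocab` and `encode` helpers are illustrative, not go-microgpt's actual functions):

```go
package main

import (
	"fmt"
	"sort"
)

// buildVocab collects the unique characters across all documents and
// assigns each a stable integer ID (sorted for determinism).
func buildVocab(docs []string) (stoi map[rune]int, itos []rune) {
	seen := map[rune]bool{}
	for _, d := range docs {
		for _, r := range d {
			seen[r] = true
		}
	}
	for r := range seen {
		itos = append(itos, r)
	}
	sort.Slice(itos, func(i, j int) bool { return itos[i] < itos[j] })
	stoi = map[rune]int{}
	for i, r := range itos {
		stoi[r] = i
	}
	return stoi, itos
}

// encode maps a string to its sequence of token IDs.
func encode(s string, stoi map[rune]int) []int {
	ids := make([]int, 0, len(s))
	for _, r := range s {
		ids = append(ids, stoi[r])
	}
	return ids
}

func main() {
	names := []string{"emma", "olivia", "ava"}
	stoi, itos := buildVocab(names)
	fmt.Println(len(itos), encode("ava", stoi)) // 7 unique characters
}
```

The actual model also reserves a special token to mark the start/end of a name, so the vocabulary is slightly larger than the raw character set.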
Included:
- Autograd system with manual backpropagation
- Multi-head attention, RMSNorm, feed-forward blocks
- Adam optimizer with bias correction
- Autoregressive sampling with temperature scaling
- Character-level tokenization
Not included (by design):
- Batching
- Dropout/regularization
- Bias vectors
- Causal masking
This section is for reference only.
Even though this Go port runs about 9× faster than the Python reference (compiled versus interpreted execution), the goal is to faithfully reproduce the original code for learning, not to optimize performance.
% hyperfine "python3 ./ref/microgpt.py" "./microgpt"
Benchmark 1: python3 ./ref/microgpt.py
Time (mean ± σ): 54.546 s ± 0.931 s
Benchmark 2: ./microgpt
Time (mean ± σ): 5.928 s ± 0.165 s
Summary: ./microgpt runs 9.20× faster

- "Running GPT training and inference in just 200 lines of Python code" (microgpt by A. Karpathy) | 数理の弾丸⚡️京大博士のAI解説 @ YouTube (in Japanese)
- MIT License
- Authors:
- Andrej Karpathy (original Python implementation)
- KEINOS and contributors (Go port)