go-microgpt is a Go port of Andrej Karpathy's microgpt — a minimal GPT implementation to learn transformer internals.
Pure Go, no external dependencies, single-file implementation.
Built for learning: a faithful 1:1 port for understanding GPT internals. Like the original implementation, this project is not optimized for efficiency.
What this project covers:
- Automatic differentiation (backpropagation through a computation graph)
- Multi-head attention and transformer blocks
- Adam optimizer with learning rate scheduling
- Training and inference loops for sequence models
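To make the first bullet concrete, here is a minimal scalar-autograd sketch in the spirit of the port: each node records its value, its gradient, and a closure that pushes its gradient to its parents. The `Value`, `add`, `mul`, and `backprop` names are illustrative, not go-microgpt's actual API.

```go
package main

import "fmt"

// Value is a node in a scalar computation graph.
type Value struct {
	data, grad float64
	backward   func()   // propagates this node's grad to its parents
	parents    []*Value // nodes this value was computed from
}

func add(a, b *Value) *Value {
	out := &Value{data: a.data + b.data, parents: []*Value{a, b}}
	out.backward = func() {
		a.grad += out.grad // d(a+b)/da = 1
		b.grad += out.grad // d(a+b)/db = 1
	}
	return out
}

func mul(a, b *Value) *Value {
	out := &Value{data: a.data * b.data, parents: []*Value{a, b}}
	out.backward = func() {
		a.grad += b.data * out.grad // d(a*b)/da = b
		b.grad += a.data * out.grad // d(a*b)/db = a
	}
	return out
}

// backprop visits nodes in reverse topological order so that each
// node's gradient is complete before it is pushed to its parents.
func backprop(root *Value) {
	var topo []*Value
	visited := map[*Value]bool{}
	var build func(v *Value)
	build = func(v *Value) {
		if visited[v] {
			return
		}
		visited[v] = true
		for _, p := range v.parents {
			build(p)
		}
		topo = append(topo, v)
	}
	build(root)
	root.grad = 1
	for i := len(topo) - 1; i >= 0; i-- {
		if topo[i].backward != nil {
			topo[i].backward()
		}
	}
}

func main() {
	x := &Value{data: 2}
	y := &Value{data: 3}
	z := mul(add(x, y), x) // z = (x+y)*x = 10
	backprop(z)
	fmt.Println(z.data, x.grad, y.grad) // dz/dx = 2x+y = 7, dz/dy = x = 2
}
```

The real implementation extends the same pattern to the other operations a transformer needs (tanh, exp, pow, etc.).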
- Python: gist.github.com/.../microgpt.py [Rev. 14fb038]
- Blog: karpathy.github.io/2026/02/12/microgpt/
Requirements: Go 1.22+
Run directly:
% # Local run
% go run ./microgpt

% # Docker run
% docker run --rm -v "$(pwd)":/test -w /test golang:1.22-alpine go run ./microgpt.go
Build and run:
% go build -o microgpt ./microgpt
% ./microgpt
Run tests:
% # Local run
% go test ./microgpt -v -race

% # Docker run
% docker run --rm -v "$(pwd)":/test -w /test golang:1.22-alpine go test -v ./...
Edit constants in microgpt/microgpt.go:
const (
nLayer = 1 // transformer layers (depth)
nEmbd = 16 // embedding size (width)
blockSize = 16 // max sequence length per forward pass
nHead = 4 // attention heads (must divide nEmbd)
numSteps = 1000 // training iterations
learningRate = 0.01 // Adam learning rate (0.01 recommended)
)

Default: ~3,400 parameters.
How each affects training:
| Parameter | Increase | Effect |
|---|---|---|
| nLayer | More layers | Larger model, slower training |
| nEmbd | Bigger size | More expressive, higher memory |
| nHead | More heads | Better attention patterns, slower |
| blockSize | Longer context | Model sees more history |
| numSteps | More iterations | Lower loss, longer training |
| learningRate | Higher value | Faster convergence, risks instability |
See Karpathy's blog for detailed explanations.
Character-level names dataset from makemore. Auto-downloaded on first run.
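Character-level tokenization means the vocabulary is just the set of unique characters in the dataset. A minimal sketch of the idea (the `buildVocab` and `encode` helpers are illustrative, not go-microgpt's actual functions):

```go
package main

import (
	"fmt"
	"sort"
)

// buildVocab collects the unique characters across all documents and
// assigns each a stable integer ID (sorted for determinism).
func buildVocab(docs []string) (stoi map[rune]int, itos []rune) {
	seen := map[rune]bool{}
	for _, d := range docs {
		for _, r := range d {
			seen[r] = true
		}
	}
	for r := range seen {
		itos = append(itos, r)
	}
	sort.Slice(itos, func(i, j int) bool { return itos[i] < itos[j] })
	stoi = map[rune]int{}
	for i, r := range itos {
		stoi[r] = i
	}
	return stoi, itos
}

// encode maps a string to its sequence of token IDs.
func encode(s string, stoi map[rune]int) []int {
	ids := make([]int, 0, len(s))
	for _, r := range s {
		ids = append(ids, stoi[r])
	}
	return ids
}

func main() {
	names := []string{"emma", "olivia", "ava"}
	stoi, itos := buildVocab(names)
	fmt.Println(len(itos), encode("ava", stoi)) // 7 unique characters
}
```

The actual model also reserves a special token to mark the start/end of a name, so the vocabulary is slightly larger than the raw character set.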
Included:
- Autograd system with manual backpropagation
- Multi-head attention, RMSNorm, feed-forward blocks
- Adam optimizer with bias correction
- Autoregressive sampling with temperature scaling
- Character-level tokenization
Not included (by design):
- Batching
- Dropout/regularization
- Bias vectors
- Causal masking
This section is for reference only.
Even though this Go port runs about 9× faster than the Python reference (compiled versus interpreted execution), the goal is to faithfully reproduce the original code for learning, not to optimize performance.
% hyperfine "python3 ./ref/microgpt.py" "./microgpt"
Benchmark 1: python3 ./ref/microgpt.py
Time (mean ± σ): 54.546 s ± 0.931 s
Benchmark 2: ./microgpt
Time (mean ± σ): 5.928 s ± 0.165 s
Summary: ./microgpt runs 9.20× faster

- "Running GPT training and inference in just 200 lines of Python code" (microgpt by A. Karpathy) | 数理の弾丸⚡️京大博士のAI解説 @ YouTube (in Japanese)
- MIT License
- Authors:
- Andrej Karpathy (original Python implementation)
- KEINOS and contributors (Go port)