KEINOS/go-microgpt
go-microgpt

go-microgpt is a Go port of Andrej Karpathy's microgpt — a minimal GPT implementation to learn transformer internals.

Pure Go, no external dependencies, single-file implementation.

Built for learning: a faithful 1:1 port for understanding GPT internals. As the original implementation notes, this project is not optimized for efficiency.

What this project covers:

  • Automatic differentiation (backpropagation through a computation graph)
  • Multi-head attention and transformer blocks
  • Adam optimizer with learning rate scheduling
  • Training and inference loops for sequence models
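The autograd bullet above is the core trick: each scalar operation records how to push gradients back to its inputs, and backpropagation replays those closures in reverse topological order. A minimal sketch of the idea in Go (names such as `Value` and `backprop` are illustrative, not the port's actual API):

```go
package main

import "fmt"

// Value is a node in a scalar computation graph, in the spirit of
// micrograd-style autodiff.
type Value struct {
	data, grad float64
	backward   func() // accumulates gradients into the parents
	parents    []*Value
}

func newValue(x float64) *Value { return &Value{data: x} }

func add(a, b *Value) *Value {
	out := &Value{data: a.data + b.data, parents: []*Value{a, b}}
	out.backward = func() {
		a.grad += out.grad // d(a+b)/da = 1
		b.grad += out.grad // d(a+b)/db = 1
	}
	return out
}

func mul(a, b *Value) *Value {
	out := &Value{data: a.data * b.data, parents: []*Value{a, b}}
	out.backward = func() {
		a.grad += b.data * out.grad // d(a*b)/da = b
		b.grad += a.data * out.grad // d(a*b)/db = a
	}
	return out
}

// backprop visits the graph in reverse topological order so every
// node's grad is final before it is pushed to its parents.
func backprop(root *Value) {
	var topo []*Value
	visited := map[*Value]bool{}
	var build func(v *Value)
	build = func(v *Value) {
		if visited[v] {
			return
		}
		visited[v] = true
		for _, p := range v.parents {
			build(p)
		}
		topo = append(topo, v)
	}
	build(root)
	root.grad = 1
	for i := len(topo) - 1; i >= 0; i-- {
		if topo[i].backward != nil {
			topo[i].backward()
		}
	}
}

func main() {
	// y = x*x + x  =>  dy/dx = 2x + 1 = 7 at x = 3
	x := newValue(3)
	y := add(mul(x, x), x)
	backprop(y)
	fmt.Println(y.data, x.grad) // 12 7
}
```

Note that gradients are accumulated with `+=`, which is what makes the `x*x` case work: `x` appears twice as a parent and receives a contribution from each use.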

Original Implementation

Quick Start

  • Requirements: Go 1.22+

  • Run directly:

    % # Local run
    % go run ./microgpt
    % # Docker run
    % docker run --rm -v "$(pwd)":/test -w /test golang:1.22-alpine go run ./microgpt
  • Build and run:

    % go build -o microgpt ./microgpt
    % ./microgpt
  • Run tests:

    % # Local run
    % go test ./microgpt -v -race
    % # Docker run
    % docker run --rm -v "$(pwd)":/test -w /test golang:1.22-alpine go test -v ./...

Configure

Edit constants in microgpt/microgpt.go:

const (
    nLayer       = 1    // transformer layers (depth)
    nEmbd        = 16   // embedding size (width)
    blockSize    = 16   // max sequence length per forward pass
    nHead        = 4    // attention heads (must divide nEmbd)
    numSteps     = 1000 // training iterations
    learningRate = 0.01 // Adam learning rate (0.01 recommended)
)
  • Default: ~3,400 parameters.
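As the comment on `nHead` notes, the embedding is split evenly across the attention heads, so `nEmbd` must be divisible by `nHead`. A hedged sketch of that constraint (the `headSize` helper is hypothetical, not code from the port):

```go
package main

import "fmt"

// headSize derives the per-head dimension from the config constants:
// each head attends in an (nEmbd / nHead)-dimensional subspace.
func headSize(nEmbd, nHead int) (int, error) {
	if nHead <= 0 || nEmbd%nHead != 0 {
		return 0, fmt.Errorf("nHead (%d) must evenly divide nEmbd (%d)", nHead, nEmbd)
	}
	return nEmbd / nHead, nil
}

func main() {
	hs, err := headSize(16, 4) // the defaults above
	if err != nil {
		panic(err)
	}
	fmt.Println(hs) // 4
}
```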

How each affects training:

Parameter    | Increase        | Effect
------------ | --------------- | --------------------------------------
nLayer       | More layers     | Larger model, slower training
nEmbd        | Bigger size     | More expressive, higher memory
nHead        | More heads      | Better attention patterns, slower
blockSize    | Longer context  | Model sees more history
numSteps     | More iterations | Lower loss, longer training
learningRate | Higher value    | Faster convergence, risks instability

See Karpathy's blog for detailed explanations.

Dataset

Character-level names dataset from makemore. Auto-downloaded on first run.

Components

Included:

  • Autograd system with manual backpropagation
  • Multi-head attention, RMSNorm, feed-forward blocks
  • Adam optimizer with bias correction
  • Autoregressive sampling with temperature scaling
  • Character-level tokenization

Not included (by design):

  • Batching
  • Dropout/regularization
  • Bias vectors
  • Causal masking

Speed

This section is for reference only.

Even though this Go port runs ~9× faster than Python (due to compiled vs interpreted execution), we focus on faithfully reproducing the original code for learning, not optimizing performance.

% hyperfine "python3 ./ref/microgpt.py" "./microgpt"
Benchmark 1: python3 ./ref/microgpt.py
  Time (mean ± σ):     54.546 s ±  0.931 s

Benchmark 2: ./microgpt
  Time (mean ± σ):      5.928 s ±  0.165 s

Summary: ./microgpt runs 9.20× faster

References

License
