LLM

Complete implementations of large language models, including all sub-components (attention, normalization, feedforward layers, tokenization, and generation).

Repository Structure

torch/
├── gpt/               # GPT-1 style implementation
└── llama/             # LLaMA-1/2 implementation

What's Implemented

GPT (torch/gpt/):

  • Multi-head self-attention with causal masking (see the sketch after this list)
  • Learned positional embeddings
  • LayerNorm, feedforward blocks
  • Training loop with loss estimation

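As a rough sketch of the first item in the list above (illustrative names, not the repository's exact code), a causal multi-head self-attention block in PyTorch might look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    # Multi-head self-attention with a lower-triangular (causal) mask,
    # so each position can only attend to itself and earlier positions.
    def __init__(self, embed_dim: int, num_heads: int, block_size: int, dropout: float = 0.2):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)   # joint Q, K, V projection
        self.proj = nn.Linear(embed_dim, embed_dim)      # output projection
        self.dropout = nn.Dropout(dropout)
        mask = torch.tril(torch.ones(block_size, block_size))
        self.register_buffer("mask", mask.view(1, 1, block_size, block_size))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        q = q.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)  # (B, heads, T, head_dim)
        k = k.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (self.head_dim ** 0.5)         # scaled dot-product scores
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = self.dropout(F.softmax(att, dim=-1))
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
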
LLaMA (torch/llama/):

  • Multi-head attention with Rotary Position Embeddings (RoPE)
  • RMSNorm (instead of LayerNorm)
  • SwiGLU feedforward network
  • Top-p (nucleus) sampling for generation (see the sketch after this list)
  • SentencePiece tokenizer

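As a minimal sketch of two of the components above, RMSNorm and top-p (nucleus) sampling; the names and defaults are illustrative, not the repository's exact code:

import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Rescales by the reciprocal root-mean-square of the features with a learned
    # gain; unlike LayerNorm there is no mean subtraction and no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

def sample_top_p(logits: torch.Tensor, p: float = 0.9) -> torch.Tensor:
    # Sample one token id from the smallest set of tokens whose cumulative
    # probability exceeds p (nucleus sampling).
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens once the probability mass accumulated before them exceeds p.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs = sorted_probs / sorted_probs.sum(dim=-1, keepdim=True)
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return torch.gather(sorted_idx, -1, next_sorted)
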
Usage

GPT:

cd torch/gpt
python train.py

LLaMA:

cd torch/llama
python generate.py

Default Configurations

Parameter         GPT    LLaMA
Embedding dim     384    4096
Hidden dim        -      11008
Heads             6      32
Layers            6      32
Context length    256    2048
Dropout           0.2    0.0
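
As a rough illustration, the defaults above could be written as configuration objects like the following; the class and field names are hypothetical, not the repository's actual ones:

from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_embd: int = 384        # embedding dim
    n_head: int = 6          # attention heads
    n_layer: int = 6         # transformer layers
    block_size: int = 256    # context length
    dropout: float = 0.2

@dataclass
class LlamaConfig:
    dim: int = 4096          # embedding dim
    hidden_dim: int = 11008  # SwiGLU feedforward hidden dim
    n_heads: int = 32
    n_layers: int = 32
    max_seq_len: int = 2048  # context length
    dropout: float = 0.0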
