🦁 Zoof (v1.2)


A clean, optimized, and interpretable implementation of a decoder-only Transformer in PyTorch.

Run in Colab ⚡ | Hugging Face ☁️

Zoof is a high-efficiency Small Language Model (SLM) engineered from scratch. It demonstrates how modern architectural choices and high-quality data can yield competitive performance in the sub-400M parameter regime, even with limited compute.

⚡ Key Features

  • Pre-Norm Architecture: Applies RMSNorm before self-attention and MLP blocks for better gradient flow and training stability.
  • Rotary Positional Embeddings (RoPE): Replaces absolute learned positional embeddings from v1 with RoPE, enabling better generalization to longer contexts.
  • Flash Attention: Uses PyTorch's F.scaled_dot_product_attention, which dispatches to Flash Attention kernels when available. Attention remains exact ($O(N^2)$ compute), but the fused kernels avoid materializing the full attention matrix, cutting memory use and wall-clock time (see the sketch after this list).
  • Smart Initialization: Implements a specific weight initialization strategy (scaling projections by $1/\sqrt{2L}$) to stabilize variance in deep residual paths.
  • Extensive Pre-training: Trained on approximately 79 billion tokens from the FineWeb-Edu dataset, focusing on reasoning-dense content.
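
To make the first four features concrete, here is a minimal, self-contained sketch of one such pre-norm decoder block. This is not the repository's code (see src/zoof_v1_2/model.py for that): module names, the 4x MLP width, and the 0.02 base init std are illustrative assumptions, and nn.RMSNorm requires PyTorch >= 2.4.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0, device=None):
    """Precompute RoPE cos/sin tables of shape (seq_len, head_dim)."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, device=device).float() / head_dim))
    t = torch.arange(seq_len, device=device).float()
    freqs = torch.outer(t, inv_freq)             # (T, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)      # (T, head_dim)
    return emb.cos(), emb.sin()


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(q, k, cos, sin):
    """Rotate query/key features by position-dependent angles (NeoX-style pairing)."""
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin


class DecoderBlock(nn.Module):
    """Pre-norm decoder block: RMSNorm runs *before* attention and the MLP,
    and both sublayers write back into the residual stream."""

    def __init__(self, dim: int, n_heads: int, n_layers: int):
        super().__init__()
        self.n_heads = n_heads
        self.attn_norm = nn.RMSNorm(dim)   # requires PyTorch >= 2.4
        self.mlp_norm = nn.RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.attn_out = nn.Linear(dim, dim, bias=False)
        self.mlp_up = nn.Linear(dim, 4 * dim, bias=False)
        self.mlp_down = nn.Linear(dim, dim, bias=False)
        # Each of the L layers adds two residual contributions (attention + MLP),
        # so scaling the output projections' init std by 1/sqrt(2L) keeps the
        # variance of the residual stream roughly constant with depth.
        # 0.02 is an illustrative base std, not necessarily the repo's value.
        std = 0.02 / math.sqrt(2 * n_layers)
        for proj in (self.attn_out, self.mlp_down):
            nn.init.normal_(proj.weight, mean=0.0, std=std)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        hd = C // self.n_heads  # head_dim must be even for RoPE

        h = self.attn_norm(x)                                # pre-norm, not post-norm
        q, k, v = self.qkv(h).split(C, dim=2)
        q = q.view(B, T, self.n_heads, hd).transpose(1, 2)   # (B, nH, T, hd)
        k = k.view(B, T, self.n_heads, hd).transpose(1, 2)
        v = v.view(B, T, self.n_heads, hd).transpose(1, 2)

        cos, sin = rope_cache(T, hd, device=x.device)        # positions enter via rotation,
        q, k = apply_rope(q, k, cos, sin)                    # not via learned embeddings

        # Dispatches to a Flash Attention kernel when one is available
        # for the current device and dtype.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)

        x = x + self.attn_out(y)                             # residual add #1
        x = x + self.mlp_down(F.gelu(self.mlp_up(self.mlp_norm(x))))  # residual add #2
        return x
```

The structural point is that normalization happens on each branch's input rather than after the residual add, and the two projections that feed the residual stream are the ones down-scaled at init.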

☁️ Quick Start (Google Colab)

You can prompt the Zoof model using Google Colab's free T4 GPUs. This is the fastest way to try the model without installing anything locally.

Click here to open the Interactive Notebook

The notebook handles:

  • Cloning the repository.
  • Installing dependencies (torch, transformers).
  • Loading the model on the GPU (cuda).
  • Running the interactive chat loop.

🛠️ Local Installation

This project uses uv for fast package management, but standard pip works as well.

Prerequisites

  • Python 3.8+
  • PyTorch (CUDA required for Flash Attention)

Setup

```bash
git clone https://github.com/pradyGn/zoof.git
cd zoof

uv sync
```
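
If you prefer pip, an equivalent setup would look roughly like this (assuming the project's dependencies are declared in pyproject.toml, as the uv workflow implies):

```bash
# Create and activate a virtual environment, then install the project
python -m venv .venv
source .venv/bin/activate
pip install -e .
```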

🎮 Usage: CLI Chat

I've provided a script to chat with a pre-trained & fine-tuned version of the model (zoof-v1.2-394M-chat) hosted on Hugging Face.

Run the following to prompt the model:

```bash
python prompt_zoof.py
```

This script will:

  • Download the config and model weights from Jiraya/zoof-250M-chat.
  • Download the tokenizer from Jiraya/zoof-tokenizer.
  • Launch an interactive session.
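
For reference, the download-and-load flow looks roughly like the sketch below. This is not the script's exact code: the weight filename and the model class are assumptions; only the two Hub repo IDs come from the script's description above.

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Repo IDs as used by prompt_zoof.py; the weight filename is an assumption.
weights = load_file(hf_hub_download("Jiraya/zoof-250M-chat", "model.safetensors"))
tokenizer = AutoTokenizer.from_pretrained("Jiraya/zoof-tokenizer")

device = "cuda" if torch.cuda.is_available() else "cpu"
# The actual model class lives in src/zoof_v1_2/model.py; hypothetical usage:
# model = Zoof(config).to(device)
# model.load_state_dict(weights)
```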

📊 Performance & Benchmarks

Despite being trained on significantly less data than industry baselines, zoof-v1.2-394M demonstrates competitive performance, particularly in tasks requiring boolean logic and physical commonsense.

| Benchmark | Metric | Zoof-v1.2-394M | SmolLM-360M | SmolLM2-360M | Qwen2.5-0.5B |
|---|---|---|---|---|---|
| Training Tokens | Data Efficiency | 79B | 600B | 4T | 18T |
| PIQA | Physical Commonsense | 69.5 | 71.6 | 71.7 | 69.9 |
| BoolQ | Boolean Reasoning | 59.9 | - | - | - |
| WinoGrande | Pronoun Resolution | 53.8 | 52.8 | 52.5 | 54.1 |
| HellaSwag | Commonsense NLI | 47.0 | 51.8 | 54.5 | 51.2 |
| OBQA | OpenBookQA | 37.2 | 37.2 | 37.4 | 37.4 |
| ARC-E | Science (Easy) | 44.3 | - | - | - |
| ARC-C | Science (Challenge) | 32.3 | - | - | 35.6 |
| SIQA | Social Commonsense | 40.3 | - | - | - |
| MMLU (cloze) | General Knowledge | 28.5 | 34.4 | 35.8 | 33.7 |
| MMLU | General Knowledge | 29.6 | - | - | - |
| RACE | Reading Comprehension | 38.3 | - | - | - |

Note: Zoof achieves these scores with roughly 2% of the training data used for SmolLM2 (79B vs. 4T tokens), highlighting the efficiency of the architecture and the FineWeb-Edu dataset.

Directory Structure

```text
│
├── src/
│   ├── zoof_v1
│   │   └── model.py    # Model definition for v1
│   ├── zoof_v1_2
│   │   └── model.py    # Model definition for v1.2
│   ├── config.py       # Configuration dataclass
│   ├── prompt_zoof.py  # Interactive CLI chat script
│   └── utils.py        # Helper utilities
├── .gitignore
├── .pre-commit-config.yaml
├── pyproject.toml
└── uv.lock             # Dependency lock file
```

Model Weights & Tokenizer

  • Model weights (chat fine-tune): Jiraya/zoof-250M-chat on Hugging Face
  • Tokenizer: Jiraya/zoof-tokenizer on Hugging Face

About

Zoof is a robust PyTorch implementation of the Transformer decoder, optimized for pre-training Small Language Models (SLMs) on consumer hardware. It features a clean, modular codebase designed for ease of experimentation and rapid training.
