A clean, optimized, and interpretable implementation of a decoder-only Transformer in PyTorch.
Zoof is a high-efficiency Small Language Model (SLM) engineered from scratch. It demonstrates how modern architectural choices and high-quality data can yield competitive performance in the sub-400M parameter regime, even with limited compute.
- Pre-Norm Architecture: Applies `RMSNorm` before the self-attention and MLP blocks for better gradient flow and training stability (a minimal sketch of the block structure follows this list).
- Rotary Positional Embeddings (RoPE): Replaces the absolute learned positional embeddings from v1 with RoPE, enabling better generalization to longer contexts.
- Flash Attention: Uses PyTorch's `F.scaled_dot_product_attention`, which dispatches to Flash Attention kernels when available, computing exact attention without materializing the full $O(N^2)$ attention matrix.
- Smart Initialization: Scales the residual projections by $1/\sqrt{2L}$ (where $L$ is the number of layers) to stabilize activation variance in deep residual paths.
- Extensive Pre-training: Trained on approximately 79 billion tokens from the FineWeb-Edu dataset, focusing on reasoning-dense content.
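The way these pieces fit together can be sketched in a few lines. The block below is illustrative only, not the code in `src/zoof_v1_2/model.py`: the class name, MLP width, init standard deviation, and use of `nn.RMSNorm` (PyTorch 2.4+) are assumptions, and RoPE is omitted for brevity.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreNormBlock(nn.Module):
    """One pre-norm decoder block: x + Attn(RMSNorm(x)), then x + MLP(RMSNorm(x))."""

    def __init__(self, d_model: int, n_heads: int, n_layers: int):
        super().__init__()
        self.n_heads = n_heads
        self.attn_norm = nn.RMSNorm(d_model)   # normalize *before* attention (pre-norm)
        self.mlp_norm = nn.RMSNorm(d_model)    # normalize *before* the MLP
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.attn_out = nn.Linear(d_model, d_model, bias=False)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(d_model, d_model, bias=False),
        )
        # Scale the residual-path projections by 1/sqrt(2L): each layer adds two
        # residual branches, so this keeps output variance roughly constant with depth.
        # (The base std of 0.02 is an assumption, not taken from the repo.)
        scale = 1.0 / math.sqrt(2 * n_layers)
        nn.init.normal_(self.attn_out.weight, mean=0.0, std=0.02 * scale)
        nn.init.normal_(self.mlp[-1].weight, mean=0.0, std=0.02 * scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        h = self.attn_norm(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        # Reshape to (B, n_heads, T, head_dim); RoPE would be applied to q and k here.
        q, k, v = (t.view(B, T, self.n_heads, -1).transpose(1, 2) for t in (q, k, v))
        # SDPA selects a Flash/memory-efficient kernel when one is available.
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(B, T, C)
        x = x + self.attn_out(attn)           # first residual branch
        x = x + self.mlp(self.mlp_norm(x))    # second residual branch
        return x
```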
You can prompt the Zoof model using Google Colab's free T4 GPUs. This is the fastest way to try the model without installing anything locally.
Click here to open the Interactive Notebook
The notebook handles:
- Cloning the repository.
- Installing dependencies (torch, transformers).
- Loading the model on the GPU (cuda).
- Running the interactive chat loop.
This project uses uv for fast package management, but standard pip works as well (a pip alternative is sketched after the commands below).
- Python 3.8+
- PyTorch (CUDA required for Flash Attention)
git clone https://github.com/yourusername/zoof.git
cd zoof
uv sync
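If you prefer not to use uv, installing the runtime dependencies directly with pip should be enough. This assumes torch and transformers cover what the prompt script needs (as listed in the Colab section above); see `pyproject.toml` for the authoritative dependency list.

```bash
# pip alternative to `uv sync` (dependency list is an assumption; check pyproject.toml)
pip install torch transformers
```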
I've provided a script to chat with a pre-trained & fine-tuned version of the model (zoof-v1.2-394M-chat) hosted on Hugging Face.
Run the following to prompt the model:
python prompt_zoof.py
This script will:
- Download the config and model weights from `Jiraya/zoof-250M-chat`.
- Download the tokenizer from `Jiraya/zoof-tokenizer`.
- Launch an interactive session (the loading flow is sketched below).
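For reference, the loading flow performed by the script looks roughly like the following. This is an illustration, not the actual contents of `prompt_zoof.py`: the checkpoint filenames and the `Zoof`/`ZoofConfig` class names are assumptions about what `src/zoof_v1_2/model.py` defines.

```python
# Rough sketch of the download/load flow (illustrative; filenames and class
# names inside the Hugging Face repos are assumptions).
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Download config and model weights from the chat model repo.
weights_path = hf_hub_download("Jiraya/zoof-250M-chat", "model.pt")     # filename assumed
config_path = hf_hub_download("Jiraya/zoof-250M-chat", "config.json")   # filename assumed

# 2. Download the tokenizer from its own repo.
tokenizer = AutoTokenizer.from_pretrained("Jiraya/zoof-tokenizer")

# 3. Build the model and load the checkpoint (placeholder names for whatever
#    src/zoof_v1_2/model.py actually exposes), then start the chat loop.
# from zoof_v1_2.model import Zoof, ZoofConfig
# model = Zoof(ZoofConfig.from_json(config_path)).to(device)
# model.load_state_dict(torch.load(weights_path, map_location=device))
# model.eval()
```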
Despite being trained on significantly less data than industry baselines, zoof-v1.2-394M demonstrates competitive performance, particularly in tasks requiring boolean logic and physical commonsense.
| Benchmark | Metric | Zoof-v1.2-394M | SmolLM-360M | SmolLM2-360M | Qwen2.5-0.5B |
|---|---|---|---|---|---|
| Training Tokens | Data Efficiency | 79B | 600B | 4T | 18T |
| PIQA | Physical Commonsense | 69.5 | 71.6 | 71.7 | 69.9 |
| BoolQ | Boolean Reasoning | 59.9 | - | - | - |
| WinoGrande | Pronoun Resolution | 53.8 | 52.8 | 52.5 | 54.1 |
| HellaSwag | Commonsense NLI | 47.0 | 51.8 | 54.5 | 51.2 |
| OBQA | OpenBookQA | 37.2 | 37.2 | 37.4 | 37.4 |
| ARC-E | Science (Easy) | 44.3 | - | - | - |
| ARC-C | Science (Challenge) | 32.3 | - | - | 35.6 |
| SIQA | Social Commonsense | 40.3 | - | - | - |
| MMLU (cloze) | General Knowledge | 28.5 | 34.4 | 35.8 | 33.7 |
| MMLU | General Knowledge | 29.6 | - | - | - |
| RACE | Reading Comprehension | 38.3 | - | - | - |
Note: Zoof achieves these scores with ~2% of the training compute used for SmolLM2 (79B vs 4T tokens), highlighting the efficiency of the architecture and FineWeb-Edu dataset.
│
├── src/
│ ├── zoof_v1
│ │ └── model.py # Model definition for v1
│ ├── zoof_v1_2
│ │ └── model.py # Model definition for v1.2
│ ├── config.py # Configuration dataclass
│ ├── prompt_zoof.py # Interactive CLI chat script
│ └── utils.py # Helper utilities
├── .gitignore
├── .pre-commit-config.yaml
├── pyproject.toml
└── uv.lock # Dependency lock file