
📖 Fable

A compact storytelling language model

Python JAX Flax NNX License Open In Colab



🧭 Overview

Fable is a compact storytelling language model implemented from scratch using JAX and Flax NNX.
It provides an end-to-end training pipeline, from text preparation and tokenisation to model training and autoregressive text generation.

Fable is designed to be small and fast: a fluent model can be trained on consumer GPUs in under an hour.
The included demo checkpoint was trained for ~2 hours on an RTX 4090 using the default configuration.


✨ Features

  • 🧠 Minimal GPT Architecture - Lightweight decoder-only transformer (~800k parameters).
  • 🧵 Simple Data Pipeline - Deterministic download → clean → tokenise workflow for small text datasets.
  • 🚀 JIT Compilation - Core steps compiled with jax.jit, achieving ~3 million tokens/sec throughput.
  • 💾 Checkpointing - Save and restore model state and hyperparameters in a single folder.
  • 📝 Text Generation Tools - Generate short stories with adjustable sampling temperature.
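
For a sense of scale, a configuration in this parameter range might look roughly like the sketch below. The field names and values are illustrative assumptions, not Fable's actual GPTConfig defaults; see config.py for the real ones.

from dataclasses import dataclass

@dataclass
class ToyGPTConfig:
    """Hypothetical settings for a ~800k-parameter character-level GPT."""
    vocab_size: int = 128      # character-level vocabulary
    context_length: int = 256  # maximum sequence length
    embed_dim: int = 128       # model width
    num_heads: int = 4         # attention heads per block
    num_layers: int = 4        # pre-norm decoder blocks
    dropout_rate: float = 0.1  # stochastic dropout probability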

🎬 Demo

Open In Colab

The demo notebook walks through:

  • Loading the pretrained demo checkpoint.
  • Generating short stories from prompts.
  • Exploring different temperature values.
  • Optionally running the data pipeline and a brief training step.

The notebook runs entirely in a hosted Colab environment, so no local setup is required.


📦 Installation

To install locally:

# Install latest release
pip install git+https://github.com/auxeno/fable

# Local development setup
git clone https://github.com/auxeno/fable.git
cd fable
pip install -e .

For GPU acceleration, install the JAX wheel matching your CUDA version, for example:

pip install --upgrade "jax[cuda13]"

Refer to the JAX installation guide for up-to-date instructions.
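
To confirm that the install can see your accelerator, run a quick device check (plain JAX, nothing Fable-specific):

python -c "import jax; print(jax.devices())"
# A correctly installed GPU build lists a CUDA device; otherwise only the CPU appears.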


🧾 Text Generation

To generate a story, write the first few words or lines of a prompt and Fable will continue it.

Python API

from fable import generate_text

generate_text("Lily got a new puppy")  # Uses checkpoints/demo by default
# To sample your own run: generate_text("...", checkpoint="model_state")

Command-Line Interface

fable-generate --start "Lily got a new puppy" --temperature 0.6
# Use --checkpoint model_state to load your own training run

Temperature Examples

Sampling temperature controls the balance between determinism and creativity:

  • Low (≈0.4): Predictable and faithful to training data.
  • Medium (≈0.6): More varied but occasionally incoherent.
  • High (≈0.8): Grammatically fluent but semantically unstable.
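
For intuition, temperature simply rescales the model's logits before sampling from the softmax. A minimal sketch of that step in JAX (illustrative only, not Fable's exact generation code):

import jax
import jax.numpy as jnp

def sample_next_token(logits, key, temperature=0.6):
    """Sample one token id from temperature-scaled logits."""
    scaled = logits / temperature               # < 1 sharpens, > 1 flattens the distribution
    return jax.random.categorical(key, scaled)  # draw from softmax(scaled)

key = jax.random.PRNGKey(0)
logits = jnp.array([2.0, 1.0, 0.5, -1.0])       # toy next-token scores
token_id = sample_next_token(logits, key, temperature=0.4)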

🟦 Temperature 0.4 - Stable and consistent

generate_text("Lily got a new puppy", temperature=0.4)

"""
Lily got a new puppy and said, "I want to see it!" Her mom smiled and said, 
"Okay, Lily. Let's go to the puppy and see if we can play with it." 
Lily smiled and said, "Okay."
Lily felt better.
...
"""

🟧 Temperature 0.6 - Balanced but sometimes incoherent

generate_text("Lily got a new puppy", temperature=0.6)

"""
Lily got a new puppy named Spot. He hit his ball out and fell on the floor. 
He cried and benly, but Spot was too fit fast. 
His friend cheered and then flew to the ball. 
They pulled and tugged and tugged. They ran away.
Spot gave the ball a kiss and the ball back to their mommy.
...
"""

🟥 Temperature 0.8 - Fluent but nonsensical

generate_text("Lily got a new puppy", temperature=0.8)

"""
Lily got a new puppy and stopped pretending,
because other automobiles walked through the fall.
Aftenma's mushy man purred a sleepy scene for a while, although Weggin circle. 
They barked and snuggled until Fridge was. 
Fred finally paddle of his bedroom were even blacker! 
Jeddy was so excited that he didn't want to give.
...
"""

🗃️ Data Pipeline

Fable includes a small data-preparation utility that downloads and processes the TinyStories dataset for training.

# Download, clean, and tokenise text data
fable-prepare-data --stage all

# Or run an individual stage (download, clean, tokenise)
fable-prepare-data --stage clean

This creates:

  • data/raw/ - raw TinyStories dataset .txt files.
  • data/clean/ - cleaned text filtered to supported characters.
  • data/tokenized/ - int8 binary token buffers used for model training.

While TinyStories is used by default, the same pipeline can be adapted for other small-scale narrative datasets with minimal modification.
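
Because the tokenised output is a flat int8 buffer, it is cheap to inspect. A hedged sketch with NumPy, where the exact file name under data/tokenized/ is an assumption to check against your own run:

import numpy as np

# Memory-map the token buffer without loading it fully into RAM.
tokens = np.memmap("data/tokenized/train.bin", dtype=np.int8, mode="r")
print(tokens.shape, tokens[:32])  # total token count and the first few token ids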


🧠 Training

Once data is prepared, train a model from scratch using either Python or the CLI.

Python API

from fable import save, train

model = train()
save(model)

Command-Line Interface

fable-train --num-epochs 5 --batch-size 128 --learning-rate 3e-4

Training progress and checkpoints are saved automatically in checkpoints/.
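
Putting the documented pieces together, a complete local run looks roughly like this sketch:

from fable import generate_text, save, train

model = train()                  # train on the prepared TinyStories data
save(model)                      # write the checkpoint into checkpoints/
print(generate_text("Once upon a time", checkpoint="model_state", temperature=0.6))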


πŸ—οΈ Project Structure

fable/
├── checkpoint.py              # Save/load wrappers for NNX state trees
├── config.py                  # GPTConfig dataclass and defaults
├── data/
│   ├── pipeline.py            # Text data download/clean/tokenise commands
│   ├── tokenize.py            # Character-level tokenizer and vocabulary tools
│   └── tokenizer-config.json  # Vocabulary and end-of-text token definitions
├── evaluate.py                # Validation step shared across training and notebooks
├── generate.py                # Text generation helpers and CLI entry point
├── model/
│   ├── attention.py           # Multi-head self-attention
│   ├── dropout.py             # Lightweight stochastic dropout layer
│   ├── feed_forward.py        # GELU MLP block
│   ├── gpt.py                 # GPT model assembly and forward pass
│   ├── normalize.py           # Layer normalisation layer
│   ├── position.py            # Sinusoidal positional embeddings
│   └── transformer.py         # Pre-norm decoder block with dropout
├── train.py                   # JIT-compiled training loop with Optax optimizers
└── utils.py                   # TQDM helper wrappers

checkpoints/
└── demo/                      # Example checkpoint trained for ~2 hours on RTX 4090
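
The model directory follows a conventional decoder-only layout. As an illustration of one piece, a common formulation of sinusoidal positional embeddings in JAX is sketched below; it is not necessarily identical to position.py:

import jax.numpy as jnp

def sinusoidal_positions(seq_len: int, dim: int) -> jnp.ndarray:
    """Sinusoidal positional embeddings of shape (seq_len, dim); dim assumed even."""
    positions = jnp.arange(seq_len)[:, None]                              # (seq_len, 1)
    div_terms = jnp.exp(-jnp.log(10000.0) * jnp.arange(0, dim, 2) / dim)  # (dim / 2,)
    angles = positions * div_terms[None, :]                               # (seq_len, dim / 2)
    return jnp.concatenate([jnp.sin(angles), jnp.cos(angles)], axis=-1)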

📚 Citation

If Fable supports your research or teaching, please cite:

@software{fable2025,
  title = {Fable: A Compact Storytelling Language Model in JAX},
  author = {Alex Goddard},
  year = {2025},
  url = {https://github.com/auxeno/fable}
}

📜 License

Released under the MIT License.
See the LICENSE file for the full text.


🌟 Acknowledgments

Thanks to the creators of the TinyStories dataset (Eldan & Li),
and to the JAX and Flax contributors whose work made Fable possible.