This is a PyTorch implementation of byte-level language modeling using the bltzr_tokenizer.py tokenizer.
The model is BLH, Byte Latent Hyena: a BLT-style byte/patch hybrid model with a latent Hyena backbone. It combines:
- Local byte-rate processing via small Hyena blocks for encoding/decoding
- Patch-rate latent backbone using Hyena operators for efficient long-range modeling
- FFT-based causal convolution for sub-quadratic sequence modeling
The reference Hyena and BLT papers are included in the repo.
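As a rough illustration of the byte-rate/patch-rate split (toy shapes and names, not the repo's actual modules), byte-level states can be pooled into fixed-size patches before the latent backbone and broadcast back afterwards:

```python
import torch

# Illustration only: group byte-rate states into fixed-size patches for the
# latent backbone, then broadcast back to byte rate for the local decoder.
B, L, D, patch_size = 2, 64, 32, 16
byte_states = torch.randn(B, L, D)                  # e.g. output of the local byte-rate encoder
patches = byte_states.view(B, L // patch_size, patch_size, D).mean(dim=2)   # (B, L/16, D), patch rate
# ... the latent Hyena backbone would process `patches` here ...
byte_rate_again = patches.repeat_interleave(patch_size, dim=1)              # (B, L, D), byte rate
```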
The key building block is HyenaOperator, which replaces self-attention with long convolutions.
In this implementation:
- Long convolutions: the implicit filter length is l_max (the context window).
- Data-controlled filters: a small MLP generates the filter from a learnable positional signal.
- Causal convolution via FFT: Causal linear convolution is implemented with zero-padded FFTs:
y[t] = Σ h[t-s] * x[s] for s ≤ t
Complexity: O(L log L) time, O(L) memory for the convolution.
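A minimal, standalone sketch of the zero-padded FFT trick (not the repo's HyenaOperator code):

```python
import torch

def fft_causal_conv(x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
    """Causal linear convolution y[t] = sum_{s<=t} h[t-s] * x[s] via zero-padded FFTs.

    x: (..., L) input, h: (L,) filter. Padding to 2*L avoids circular wrap-around,
    so truncating to the first L outputs gives the causal (linear) convolution.
    Cost: O(L log L) time, O(L) memory.
    """
    L = x.shape[-1]
    n = 2 * L
    X = torch.fft.rfft(x, n=n)
    H = torch.fft.rfft(h, n=n)
    return torch.fft.irfft(X * H, n=n)[..., :L]

# quick sanity check against the direct definition
x, h = torch.randn(2, 16), torch.randn(16)
y_direct = torch.stack([sum(h[t - s] * x[:, s] for s in range(t + 1)) for t in range(16)], dim=-1)
assert torch.allclose(fft_causal_conv(x, h), y_direct, atol=1e-4)
```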
See ARCHITECTURE for a detailed overview of all building blocks.
Install dependencies (CPU-only or CUDA builds of PyTorch both work):
pip install -r requirements.txt
See CONFIG for all configuration parameters.
For complete examples, also see the entropy model config and the training config that uses an entropy model.
See DATASET.md for detailed dataset schema format.
Training uses YAML config files; a sample config is included.
# Train BLH with the smoke config
python train.py --config configs/blh_smoke.yaml
# Train using a custom file or directory (then used for both training and validation)
python train.py --config configs/blh_smoke.yaml --data_file /path/to/data_dir
# Resume training from a checkpoint (continues with same config/optimizer state)
python train.py --config /path/to/training_config.yaml --resume /path/to/runs/<model_name>/checkpoints/latest
The default patching uses fixed sizes per modality (text: 16 bytes, binary: 64 bytes). Following the original BLT paper, you can also use entropy-based dynamic patching, which allocates more compute to complex/unpredictable regions.
Entropy caches are token-indexed to match the exact training token stream produced by StreamBytesDataset.
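As a sketch of the idea (hypothetical thresholding rule, not the repo's exact patcher): a new patch starts wherever the entropy model's next-byte entropy spikes, so predictable runs merge into long patches while surprising regions get short ones.

```python
import torch

def entropy_patch_starts(entropy: torch.Tensor, threshold: float = 2.0, max_patch_len: int = 64) -> torch.Tensor:
    """Hypothetical entropy-based patching sketch: start a new patch where the
    per-byte entropy exceeds a threshold, capping the patch length.

    entropy: (L,) per-byte entropy scores; returns a (L,) bool mask of patch starts.
    """
    L = entropy.shape[0]
    starts = torch.zeros(L, dtype=torch.bool)
    starts[0] = True
    last_start = 0
    for t in range(1, L):
        if entropy[t] > threshold or (t - last_start) >= max_patch_len:
            starts[t] = True
            last_start = t
    return starts
```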
# Step 1: Train a small entropy model (~50M params)
python train_entropy_model.py --config configs/emo_2048.yaml
# Step 2: Precompute entropy scores for your training data
# IMPORTANT: --seq_len must match your training config `data.seq_len`
# `--source` can be a single file or a directory.
python precompute_entropy.py --seq_len 2048 --source /data/train.txt \
--entropy_model /data/runs/emo_2048/exports/final
python precompute_entropy.py --seq_len 2048 --source /data/val.txt \
--entropy_model /data/runs/emo_2048/exports/final
# Note: entropy caches are token-indexed
# Step 3: Train BLH with entropy-based patching
python train.py --config configs/blh_92m_entropy_2048.yaml
Inference is relatively minimal:
- loads a model/checkpoint directory
- generates a fixed number of tokens, optionally stopping generation earlier on stop tokens
- uses top-k, top-p, or min-p sampling
- optionally streams output incrementally with --stream
By default it prints the generated bytes decoded as lossy UTF-8 (errors="ignore").
You can also write the raw generated bytes to a file with --out_bytes.
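For reference, a self-contained sketch of the three sampling rules (illustrative only; the actual behavior is whatever inference.py implements):

```python
import torch

def sample_next_byte(logits: torch.Tensor, method: str = "top_p",
                     top_k: int = 50, top_p: float = 0.9, min_p: float = 0.05,
                     temperature: float = 1.0) -> torch.Tensor:
    """Illustrative sketch of top-k, top-p (nucleus), and min-p sampling."""
    probs = torch.softmax(logits / temperature, dim=-1)
    if method == "top_k":
        # keep only the k most probable tokens
        kept = torch.topk(probs, top_k)
        probs = torch.zeros_like(probs).scatter_(-1, kept.indices, kept.values)
    elif method == "top_p":
        # keep the smallest set of tokens whose cumulative probability reaches p
        sorted_p, idx = torch.sort(probs, descending=True)
        keep = torch.cumsum(sorted_p, dim=-1) - sorted_p < top_p
        probs = torch.zeros_like(probs).scatter_(-1, idx, sorted_p * keep)
    elif method == "min_p":
        # drop tokens whose probability is below min_p times the top probability
        probs = torch.where(probs >= min_p * probs.max(), probs, torch.zeros_like(probs))
    return torch.multinomial(probs / probs.sum(), num_samples=1)
```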
python inference.py --model runs/<model_name>/exports/final --prompt "The quick brown fox" --max_tokens 100 --temperature 1.0
python inference.py --model runs/<model_name>/exports/final --prompt "Hello" --max_tokens 256 --out_bytes generated.bin
# Stream output as it generates
python inference.py --model runs/<model_name>/exports/final --prompt "Hello" --max_tokens 256 --stream
# Use top-p (nucleus) sampling with p=0.9
python inference.py --model runs/<model_name>/exports/final --prompt "Hello" --sampling_method top_p --top_p 0.9
# Use min-p sampling with min_p=0.05
python inference.py --model runs/<model_name>/exports/final --prompt "Hello" --sampling_method min_p --min_p 0.05
# Stop on custom tokens (e.g., "<EOS>" and a specific token ID)
python inference.py --model runs/<model_name>/exports/final --prompt "Hello" --stop_tokens "<EOS>" "274"inference.py supports true entropy patching if you provide an exported entropy model:
python inference.py --entropy_model runs/emo_2048/exports/final --model runs/<model_name>/exports/final ...
Artifacts are saved as directories containing:
- metadata.json (config/tokenizer + training/optimizer/EWC metadata)
- weights.safetensors (model weights; plus EWC tensors for training checkpoints)
- optimizer.pt (optimizer tensors)
Training checkpoints are saved under runs/<model_name>/checkpoints/ and the best checkpoints are exposed via symlinks:
- best_val -> best validation checkpoint directory
- best_train -> best training-loss checkpoint directory
- latest -> most recently saved checkpoint directory
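Exported directories can also be read programmatically; a hypothetical loading sketch (the usual path is simply passing the directory to inference.py --model or train.py --resume):

```python
import json
from pathlib import Path
from safetensors.torch import load_file

# Hypothetical sketch: read an exported artifact directory directly.
export_dir = Path("runs/<model_name>/exports/final")
metadata = json.loads((export_dir / "metadata.json").read_text())        # config / tokenizer / training metadata
state_dict = load_file(str(export_dir / "weights.safetensors"))          # dict[str, torch.Tensor] of model weights
```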