29 changes: 29 additions & 0 deletions records/track_10min_16mb/2026-03-26_BackoffNgramMixer/README.md
@@ -0,0 +1,29 @@
# 11L BackoffNgramMixer

## Results

| Seed | val_bpb | Eval time |
|------|---------|-----------|
| 42 | 0.6672 | 512s |
| 1337 | 0.6673 | ~512s |
| 2024 | 0.6667 | ~512s |
| **Mean** | **0.6671** | |
| Std | 0.0003 | |
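
The summary rows can be checked directly from the three per-seed values. The snippet below is a minimal sketch that assumes the reported Std is the sample standard deviation; the population standard deviation rounds to the same four-decimal value here.

```python
import statistics

# Per-seed val_bpb values copied from the table above.
val_bpb = {42: 0.6672, 1337: 0.6673, 2024: 0.6667}

mean_bpb = statistics.mean(val_bpb.values())
# Assumption: sample standard deviation (n - 1 in the denominator).
std_bpb = statistics.stdev(val_bpb.values())

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f}")
```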

Artifact: ~16.0 MB (15,995,959 bytes). Train: 600s on 8xH100 SXM. Eval: ~512s.

## Architecture

- 11 layers, 512 dim, 8/8 full MHA heads
- XSA-all, LeakyReLU(0.5)^2, 3.5x MLP
- BigramHash, SmearGate, Value Residual, Gated Attention
- int5 quantization + zstd compression
- EMA, Tight SWA, Soft-Round QAT
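
As a rough illustration of two items in the list above, the sketch below reads `LeakyReLU(0.5)^2` as a leaky ReLU with negative slope 0.5 followed by squaring, and reads `3.5x MLP` as a hidden width of 3.5 times the model dimension. Both readings are assumptions, and the function and variable names are hypothetical rather than taken from the submission's code.

```python
def leaky_relu_squared(x: float, negative_slope: float = 0.5) -> float:
    """Apply a leaky ReLU, then square the result (assumed reading of LeakyReLU(0.5)^2)."""
    y = x if x >= 0.0 else negative_slope * x
    return y * y

# Assumed reading of "3.5x MLP": hidden width = 3.5 * model dimension.
model_dim = 512
mlp_hidden = int(3.5 * model_dim)

print(leaky_relu_squared(2.0), leaky_relu_squared(-2.0), mlp_hidden)
```

Under this reading, negative inputs still produce a positive output, scaled by the squared slope (0.25).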

## BackoffNgramMixer

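A GPU-vectorized multi-order n-gram backoff model (orders 2-7) whose predictions are mixed with the neural model's output using an entropy-adaptive alpha. The n-gram cache is backward-looking and score-first: each token is scored before it is added to the cache, and the mixing is gated per token by entropy.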
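
The idea can be sketched on the CPU in plain Python. This is not the submission's GPU implementation: the class and function names are hypothetical, the backoff is the simplest possible (use the highest order whose context has been seen, with no discounting), and the alpha schedule (linear in the model's entropy, capped at `alpha_max`) is an assumption chosen only to show the shape of the technique.

```python
import math
from collections import defaultdict

class BackoffNgramCache:
    """Toy backward-looking n-gram cache with highest-order-first backoff."""

    def __init__(self, min_order=2, max_order=7):
        self.min_order, self.max_order = min_order, max_order
        # counts[order][context][next_token] -> count
        self.counts = {n: defaultdict(lambda: defaultdict(int))
                       for n in range(min_order, max_order + 1)}

    def predict(self, history, token):
        """Return (probability, matched) from the highest order with a seen context."""
        for n in range(self.max_order, self.min_order - 1, -1):
            ctx_len = n - 1
            if len(history) < ctx_len:
                continue
            nexts = self.counts[n].get(tuple(history[-ctx_len:]))
            if nexts:
                return nexts.get(token, 0) / sum(nexts.values()), True
        return 0.0, False

    def update(self, history, token):
        """Record the token under every order whose context fits in the history."""
        for n in range(self.min_order, self.max_order + 1):
            ctx_len = n - 1
            if len(history) >= ctx_len:
                self.counts[n][tuple(history[-ctx_len:])][token] += 1

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def score_sequence(tokens, model_probs, alpha_max=0.5):
    """Total bits for the sequence; each token is scored first, then cached."""
    cache = BackoffNgramCache()
    total_bits = 0.0
    for t, tok in enumerate(tokens):
        dist = model_probs[t]
        p_ngram, matched = cache.predict(tokens[:t], tok)
        # Assumed schedule: trust the cache more when the model is uncertain.
        alpha = alpha_max * entropy_bits(dist) / math.log2(len(dist)) if matched else 0.0
        p_mix = (1.0 - alpha) * dist[tok] + alpha * p_ngram
        total_bits += -math.log2(p_mix)
        cache.update(tokens[:t], tok)  # update only after scoring
    return total_bits

# Toy usage: a uniform "model" over 4 symbols on a repeating sequence.
tokens = [0, 1, 2, 3] * 4
uniform = [[0.25] * 4 for _ in tokens]
baseline_bits = score_sequence(tokens, uniform, alpha_max=0.0)
mixed_bits = score_sequence(tokens, uniform, alpha_max=0.5)
```

Because the cache is updated only after a token has been scored, the n-gram statistics never include the token being predicted. In the toy usage, the mixed score should come out lower than the baseline once the repeating pattern has been seen.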

## Acknowledgments

The architecture and mixer are based on community techniques.
@@ -0,0 +1,9 @@
{
"name": "11L BackoffNgramMixer",
"val_loss": 0.6671,
"bytes_total": 15995959,
"blurb": "11L XSA-all 8/8 MHA with BackoffNgramMixer (entropy-adaptive, orders 2-7). Mean 0.6671 (std 0.0003) across 3 seeds.",
"author": "hypery11",
"github_id": "hypery11",
"date": "2026-03-26"
}