openai · dentity007 · Mar 27, 2026 · Mar 27, 2026 · Mar 27, 2026
diff --git a/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/README.md b/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/README.md
@@ -0,0 +1,58 @@
+# Two-Level Dirichlet Posterior + Per-Order OBCL + Phrase Cache
+
+**val_bpb: 0.11556** (3-seed mean, std 0.0000057) | **~15.1 MB** | 8xH100 SXM
+
+## Results (8xH100 80GB SXM, Rancho Cordova CA)
+
+| Seed | Val BPB | Eval Time |
+|------|---------|-----------|
+| 1337 | 0.11555061 | 419s |
+| 42 | 0.11556435 | 370s |
+| 2025 | 0.11555875 | 359s |
+| **Mean** | **0.11556 (std 0.0000057)** | |
+
+## Architecture
+
+EBLS: 3 shared transformer blocks looped 3x + 2 unique = 11 effective layers.
+512d, 8 heads, 4 KV heads (GQA), MLP 3x with LeakyReLU(0.5)², per-virtual-layer LoRA rank 8.
+
+## Key Techniques
+
+- **Two-level Dirichlet smoothing** with per-order OBCL concentrations (50.0 for bigrams → 1.86 for 14-grams)
+- **Phrase suffix matching** at probe lengths [20, 16] with Dirichlet concentration 1.0
+- **15-gram backoff** (orders 2-15, 4M hash buckets)
+- **Complementary training** (alpha=0.50, orders 2-5)
+- **GPTQ int6 + LZMA** compression
+- **EMA 0.997 + SWA** weight averaging
+- **XSA** on all 11 layers
+
+## Credits
+
+Built on the shoulders of the community:
+- @signalrush (PR #414) — GPTQ + EMA + warmdown foundation
+- @Robby955 (PR #900) — Dirichlet smoothing, OBCL, phrase cache
+- @himanshudongre (PR #846) — two-pass rescoring concept
+- @deanbrr (PR #659) — original N-gram cache concept
+- @newjordan (PR #674) — first legal implementation
+- @pentxayc (PR #803) — complementary training
+
+## Run Command
+
+```bash
+DATA_PATH=./data/datasets/fineweb10B_sp1024/ \
+TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
+VOCAB_SIZE=1024 MAX_WALLCLOCK_SECONDS=560 XSA_LAST_N=11 \
+WARMDOWN_ITERS=4000 CLIP_RANGE=31 COMPRESSOR=lzma \
+NUM_KV_HEADS=4 EVAL_STRIDE=64 \
+GPTQ_ENABLED=1 GPTQ_CALIB_BATCHES=64 GPTQ_CALIB_SOURCE=val \
+GPTQ_BLOCK_SIZE=128 SWA_ENABLED=1 LATE_QAT_THRESHOLD=0.15 \
+COMP_ENABLED=1 COMP_ALPHA=0.50 COMP_ORDER=5 COMP_WARMUP=200 COMP_MIN_COUNT=3 \
+NGRAM_CACHE=1 NGRAM_ORDER=15 NGRAM_MIN_ORDER=2 \
+NGRAM_BUCKETS=4194304 NGRAM_DIRICHLET=1 NGRAM_CONCENTRATION=5.0 \
+NGRAM_TEMPERATURE=1.0 \
+NGRAM_PER_ORDER_CONC="50.0,50.0,6.95,2.98,2.05,2.05,2.05,1.86,1.86,1.86,1.86,1.86,1.86,1.86" \
+PHRASE_CACHE=1 PHRASE_BUCKETS=1048576 PHRASE_PROBE_LENGTHS=20,16 \
+PHRASE_DIRICHLET=1 PHRASE_CONCENTRATION=1.0 PHRASE_MIN_COUNT=1 \
+NCCL_TIMEOUT=3600 SEED=1337 \
+torchrun --standalone --nproc_per_node=8 train_gpt.py
+```
diff --git a/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/submission.json b/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/submission.json
@@ -0,0 +1,9 @@
+{
+  "name": "Two-Level Dirichlet Posterior + Per-Order OBCL + Phrase Cache (3-seed)",
+  "val_bpb": 0.1156,
+  "bytes_total": 15077877,
+  "blurb": "Two-level Dirichlet-Multinomial posterior mixing with per-order OBCL concentrations (50.0 for bigrams to 1.86 for 14-grams) and phrase suffix matching (probes at 20,16 tokens). 15-gram backoff with neural base measure. Complementary training (alpha=0.50). EBLS architecture (3 shared x 3 loops + 2 unique = 11L). GPTQ int6 + LZMA. 3-seed mean: 0.11556 (std 0.0000057). Based on techniques from PRs #414, #900, #846.",
+  "author": "Nathan Maine",
+  "github_id": "NathanMaine",
+  "date": "2026-03-27"
+}
diff --git a/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_gpt.py b/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_gpt.py
diff --git a/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_seed1337.log b/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_seed1337.log
@@ -0,0 +1,94 @@
+=== Parameter Golf v2 — N-gram Cache Submission ===
+Seed: 1337
+N-gram: orders 2-15, Dirichlet=1
+Phrase: probes=20,16, Dirichlet=1
+Complementary: alpha=0.50, order=5
+
+W0327 04:23:51.748000 2469 torch/distributed/run.py:803] 
+W0327 04:23:51.748000 2469 torch/distributed/run.py:803] *****************************************
+W0327 04:23:51.748000 2469 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+W0327 04:23:51.748000 2469 torch/distributed/run.py:803] *****************************************
+logs/1375010a-c533-46df-9444-a96d2f458e76.txt
+val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model
+train_loader:dataset:fineweb10B_sp1024 train_shards:80
+val_loader:shards pattern=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
+mixed_precision: clip_range=31 (int6) compressor=lzma
+model_params:27124848
+XSA:last_11 active_layers:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+VRL:True active_layers:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+world_size:8 grad_accum_steps:1 sdp:flash=True
+attention_mode:gqa num_heads:8 num_kv_heads:4
+tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
+train_batch_tokens:786432 train_seq_len:2048 iterations:20000 warmup_steps:20 max_wallclock_seconds:560.000
+seed:1337
+warmup_step:1/20
+warmup_step:2/20
+warmup_step:3/20
+warmup_step:4/20
+warmup_step:5/20
+warmup_step:6/20
+warmup_step:7/20
+warmup_step:8/20
+warmup_step:9/20
+warmup_step:10/20
+warmup_step:11/20
+warmup_step:12/20
+warmup_step:13/20
+warmup_step:14/20
+warmup_step:15/20
+warmup_step:16/20
+warmup_step:17/20
+warmup_step:18/20
+warmup_step:19/20
+warmup_step:20/20
+comp_train:enabled orders=2-5 alpha=0.5 warmup=200
+step:0/20000 val_loss:6.9301 val_bpb:4.1044 train_time:0ms step_avg:0.01ms
+step:1/20000 train_loss:6.9313 train_time:172ms step_avg:172.18ms
+step:2/20000 train_loss:8.7049 train_time:289ms step_avg:144.47ms
+step:3/20000 train_loss:7.8658 train_time:402ms step_avg:134.10ms
+step:4/20000 train_loss:7.1343 train_time:515ms step_avg:128.67ms
+step:5/20000 train_loss:6.9742 train_time:628ms step_avg:125.64ms
+step:6/20000 train_loss:7.0034 train_time:751ms step_avg:125.09ms
+step:7/20000 train_loss:6.9378 train_time:862ms step_avg:123.15ms
+step:8/20000 train_loss:6.7761 train_time:974ms step_avg:121.71ms
+step:9/20000 train_loss:6.4129 train_time:1085ms step_avg:120.57ms
+step:10/20000 train_loss:6.0878 train_time:1196ms step_avg:119.56ms
+step:500/20000 train_loss:1.7359 train_time:67826ms step_avg:135.65ms
+step:1000/20000 train_loss:1.6029 train_time:141960ms step_avg:141.96ms
+step:1500/20000 train_loss:1.5659 train_time:215456ms step_avg:143.64ms
+step:2000/20000 train_loss:1.4602 train_time:289461ms step_avg:144.73ms
+step:2500/20000 train_loss:1.5130 train_time:363344ms step_avg:145.34ms
+step:3000/20000 train_loss:1.5011 train_time:436909ms step_avg:145.64ms
+swa:start step:3100
+late_qat:enabled step:3248 scale:0.1499
+step:3500/20000 train_loss:1.4806 train_time:510217ms step_avg:145.78ms
+step:3827/20000 val_loss:1.9690 val_bpb:1.1661 train_time:560026ms step_avg:146.34ms
+stopping_early: wallclock_cap train_time:560026ms step:3827/20000
+peak memory allocated: 22534 MiB reserved: 22604 MiB
+swa:applying 15 snapshots, blending with EMA (0.50/0.50)
+DIAGNOSTIC post_ema val_loss:1.9689 val_bpb:1.1661 eval_time:2097ms
+Serialized model: 106449565 bytes Code: 87629 bytes
+gptq:collecting hessians batches=64 source=val
+gptq:hessians collected layers=68 time=11.1s
+gptq:pre_prune artifact=14990248 target=15907371
+Saved quantized model to final_int6_model.pt
+Serialized model int63+lzma: 14990248 bytes
+Total submission size: 15077877 bytes
+final_int6_roundtrip val_loss:1.9773 val_bpb:1.1711 exact:1.17105855 eval_time:34030ms
+ngram_cache:enabled orders=2-15 dirichlet=True concentration=5.0 temperature=1.0 entropy=True min_count=2 buckets=4194304 order_mults=none alpha_max=0.95
+phrase_cache:enabled probes=[20, 16] dirichlet=True conc=1.0 alpha=0.9 buckets=1048576
+ngram_prefill:rank1 pre-filled 7754688 positions in 19.0s
+phrase_prefill:rank1 pre-filled 7754688 positions in 3.1s
+ngram_prefill:rank2 pre-filled 15507392 positions in 48.7s
+ngram_prefill:rank3 pre-filled 23260096 positions in 56.0s
+phrase_prefill:rank2 pre-filled 15507392 positions in 7.5s
+phrase_prefill:rank3 pre-filled 23260096 positions in 10.4s
+ngram_prefill:rank4 pre-filled 31012800 positions in 99.6s
+phrase_prefill:rank4 pre-filled 31012800 positions in 14.5s
+ngram_prefill:rank5 pre-filled 38765504 positions in 126.1s
+phrase_prefill:rank5 pre-filled 38765504 positions in 17.8s
+ngram_prefill:rank6 pre-filled 46518208 positions in 147.0s
+phrase_prefill:rank6 pre-filled 46518208 positions in 21.3s
+ngram_prefill:rank7 pre-filled 54270912 positions in 179.8s
+phrase_prefill:rank7 pre-filled 54270912 positions in 25.7s
+final_int6_sliding_window val_loss:0.1951 val_bpb:0.1156 exact:0.11555061 stride:64 eval_time:418945ms
diff --git a/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_seed2025.log b/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_seed2025.log
@@ -0,0 +1,94 @@
+=== Parameter Golf v2 — N-gram Cache Submission ===
+Seed: 2025
+N-gram: orders 2-15, Dirichlet=1
+Phrase: probes=20,16, Dirichlet=1
+Complementary: alpha=0.50, order=5
+
+W0327 05:08:36.831000 75454 torch/distributed/run.py:803] 
+W0327 05:08:36.831000 75454 torch/distributed/run.py:803] *****************************************
+W0327 05:08:36.831000 75454 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+W0327 05:08:36.831000 75454 torch/distributed/run.py:803] *****************************************
+logs/a5507699-4bee-4690-a084-705102c8d096.txt
+val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model
+train_loader:dataset:fineweb10B_sp1024 train_shards:80
+val_loader:shards pattern=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
+mixed_precision: clip_range=31 (int6) compressor=lzma
+model_params:27124848
+XSA:last_11 active_layers:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+VRL:True active_layers:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+world_size:8 grad_accum_steps:1 sdp:flash=True
+attention_mode:gqa num_heads:8 num_kv_heads:4
+tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
+train_batch_tokens:786432 train_seq_len:2048 iterations:20000 warmup_steps:20 max_wallclock_seconds:560.000
+seed:2025
+warmup_step:1/20
+warmup_step:2/20
+warmup_step:3/20
+warmup_step:4/20
+warmup_step:5/20
+warmup_step:6/20
+warmup_step:7/20
+warmup_step:8/20
+warmup_step:9/20
+warmup_step:10/20
+warmup_step:11/20
+warmup_step:12/20
+warmup_step:13/20
+warmup_step:14/20
+warmup_step:15/20
+warmup_step:16/20
+warmup_step:17/20
+warmup_step:18/20
+warmup_step:19/20
+warmup_step:20/20
+comp_train:enabled orders=2-5 alpha=0.5 warmup=200
+step:0/20000 val_loss:6.9306 val_bpb:4.1047 train_time:0ms step_avg:0.01ms
+step:1/20000 train_loss:6.9326 train_time:171ms step_avg:170.93ms
+step:2/20000 train_loss:8.8346 train_time:287ms step_avg:143.36ms
+step:3/20000 train_loss:7.9183 train_time:406ms step_avg:135.18ms
+step:4/20000 train_loss:7.0638 train_time:522ms step_avg:130.55ms
+step:5/20000 train_loss:7.0104 train_time:635ms step_avg:126.93ms
+step:6/20000 train_loss:6.9854 train_time:748ms step_avg:124.73ms
+step:7/20000 train_loss:6.7751 train_time:861ms step_avg:123.02ms
+step:8/20000 train_loss:6.6528 train_time:975ms step_avg:121.82ms
+step:9/20000 train_loss:6.3819 train_time:1092ms step_avg:121.29ms
+step:10/20000 train_loss:6.0564 train_time:1206ms step_avg:120.61ms
+step:500/20000 train_loss:1.7383 train_time:68843ms step_avg:137.69ms
+step:1000/20000 train_loss:1.6012 train_time:143398ms step_avg:143.40ms
+step:1500/20000 train_loss:1.5667 train_time:217428ms step_avg:144.95ms
+step:2000/20000 train_loss:1.4609 train_time:292012ms step_avg:146.01ms
+step:2500/20000 train_loss:1.5132 train_time:365560ms step_avg:146.22ms
+step:3000/20000 train_loss:1.5019 train_time:439885ms step_avg:146.63ms
+swa:start step:3050
+late_qat:enabled step:3204 scale:0.1499
+step:3500/20000 train_loss:1.4818 train_time:515064ms step_avg:147.16ms
+step:3809/20000 val_loss:1.9709 val_bpb:1.1673 train_time:560029ms step_avg:147.03ms
+stopping_early: wallclock_cap train_time:560029ms step:3809/20000
+peak memory allocated: 22528 MiB reserved: 22568 MiB
+swa:applying 16 snapshots, blending with EMA (0.50/0.50)
+DIAGNOSTIC post_ema val_loss:1.9709 val_bpb:1.1673 eval_time:2094ms
+Serialized model: 106449565 bytes Code: 87629 bytes
+gptq:collecting hessians batches=64 source=val
+gptq:hessians collected layers=68 time=10.8s
+gptq:pre_prune artifact=15245452 target=15907371
+Saved quantized model to final_int6_model.pt
+Serialized model int63+lzma: 15245452 bytes
+Total submission size: 15333081 bytes
+final_int6_roundtrip val_loss:1.9795 val_bpb:1.1723 exact:1.17234936 eval_time:6449ms
+ngram_cache:enabled orders=2-15 dirichlet=True concentration=5.0 temperature=1.0 entropy=True min_count=2 buckets=4194304 order_mults=none alpha_max=0.95
+phrase_cache:enabled probes=[20, 16] dirichlet=True conc=1.0 alpha=0.9 buckets=1048576
+ngram_prefill:rank1 pre-filled 7754688 positions in 20.3s
+phrase_prefill:rank1 pre-filled 7754688 positions in 3.2s
+ngram_prefill:rank2 pre-filled 15507392 positions in 46.0s
+phrase_prefill:rank2 pre-filled 15507392 positions in 6.9s
+ngram_prefill:rank3 pre-filled 23260096 positions in 73.0s
+phrase_prefill:rank3 pre-filled 23260096 positions in 10.8s
+ngram_prefill:rank4 pre-filled 31012800 positions in 94.8s
+ngram_prefill:rank5 pre-filled 38765504 positions in 99.7s
+phrase_prefill:rank4 pre-filled 31012800 positions in 14.4s
+phrase_prefill:rank5 pre-filled 38765504 positions in 17.8s
+ngram_prefill:rank6 pre-filled 46518208 positions in 132.6s
+ngram_prefill:rank7 pre-filled 54270912 positions in 139.0s
+phrase_prefill:rank6 pre-filled 46518208 positions in 21.3s
+phrase_prefill:rank7 pre-filled 54270912 positions in 24.6s
+final_int6_sliding_window val_loss:0.1951 val_bpb:0.1156 exact:0.11555875 stride:64 eval_time:359137ms
diff --git a/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_seed42.log b/records/track_10min_16mb/2026-03-27_Dirichlet_Ngram_Phrase_Cache/train_seed42.log
@@ -0,0 +1,94 @@
+=== Parameter Golf v2 — N-gram Cache Submission ===
+Seed: 42
+N-gram: orders 2-15, Dirichlet=1
+Phrase: probes=20,16, Dirichlet=1
+Complementary: alpha=0.50, order=5
+
+W0327 04:47:15.636000 74537 torch/distributed/run.py:803] 
+W0327 04:47:15.636000 74537 torch/distributed/run.py:803] *****************************************
+W0327 04:47:15.636000 74537 torch/distributed/run.py:803] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+W0327 04:47:15.636000 74537 torch/distributed/run.py:803] *****************************************
+logs/2546d581-ae68-43b6-8dc3-3ae0e4280c14.txt
+val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=/workspace/parameter-golf/data/tokenizers/fineweb_1024_bpe.model
+train_loader:dataset:fineweb10B_sp1024 train_shards:80
+val_loader:shards pattern=/workspace/parameter-golf/data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
+mixed_precision: clip_range=31 (int6) compressor=lzma
+model_params:27124848
+XSA:last_11 active_layers:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+VRL:True active_layers:[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
+world_size:8 grad_accum_steps:1 sdp:flash=True
+attention_mode:gqa num_heads:8 num_kv_heads:4
+tie_embeddings:True embed_lr:0.035 head_lr:0.0 matrix_lr:0.025 scalar_lr:0.025
+train_batch_tokens:786432 train_seq_len:2048 iterations:20000 warmup_steps:20 max_wallclock_seconds:560.000
+seed:42
+warmup_step:1/20
+warmup_step:2/20
+warmup_step:3/20
+warmup_step:4/20
+warmup_step:5/20
+warmup_step:6/20
+warmup_step:7/20
+warmup_step:8/20
+warmup_step:9/20
+warmup_step:10/20
+warmup_step:11/20
+warmup_step:12/20
+warmup_step:13/20
+warmup_step:14/20
+warmup_step:15/20
+warmup_step:16/20
+warmup_step:17/20
+warmup_step:18/20
+warmup_step:19/20
+warmup_step:20/20
+comp_train:enabled orders=2-5 alpha=0.5 warmup=200
+step:0/20000 val_loss:6.9301 val_bpb:4.1044 train_time:0ms step_avg:0.01ms
+step:1/20000 train_loss:6.9323 train_time:167ms step_avg:166.95ms
+step:2/20000 train_loss:8.6971 train_time:282ms step_avg:141.07ms
+step:3/20000 train_loss:7.8409 train_time:398ms step_avg:132.79ms
+step:4/20000 train_loss:7.0864 train_time:511ms step_avg:127.84ms
+step:5/20000 train_loss:6.9765 train_time:625ms step_avg:124.92ms
+step:6/20000 train_loss:6.9884 train_time:738ms step_avg:122.94ms
+step:7/20000 train_loss:6.9120 train_time:854ms step_avg:121.97ms
+step:8/20000 train_loss:6.7496 train_time:966ms step_avg:120.73ms
+step:9/20000 train_loss:6.4399 train_time:1079ms step_avg:119.86ms
+step:10/20000 train_loss:6.1220 train_time:1190ms step_avg:118.95ms
+step:500/20000 train_loss:1.7425 train_time:68801ms step_avg:137.60ms
+step:1000/20000 train_loss:1.6038 train_time:143170ms step_avg:143.17ms
+step:1500/20000 train_loss:1.5671 train_time:217569ms step_avg:145.05ms
+step:2000/20000 train_loss:1.4598 train_time:292781ms step_avg:146.39ms
+step:2500/20000 train_loss:1.5127 train_time:367300ms step_avg:146.92ms
+swa:start step:3000
+step:3000/20000 train_loss:1.5014 train_time:442248ms step_avg:147.42ms
+late_qat:enabled step:3179 scale:0.1496
+step:3500/20000 train_loss:1.4821 train_time:518015ms step_avg:148.00ms
+step:3784/20000 val_loss:1.9722 val_bpb:1.1681 train_time:560072ms step_avg:148.01ms
+stopping_early: wallclock_cap train_time:560072ms step:3784/20000
+peak memory allocated: 22528 MiB reserved: 22568 MiB
+swa:applying 16 snapshots, blending with EMA (0.50/0.50)
+DIAGNOSTIC post_ema val_loss:1.9723 val_bpb:1.1681 eval_time:2096ms
+Serialized model: 106449565 bytes Code: 87629 bytes
+gptq:collecting hessians batches=64 source=val
+gptq:hessians collected layers=68 time=11.0s
+gptq:pre_prune artifact=15262616 target=15907371
+Saved quantized model to final_int6_model.pt
+Serialized model int63+lzma: 15262616 bytes
+Total submission size: 15350245 bytes
+final_int6_roundtrip val_loss:1.9809 val_bpb:1.1732 exact:1.17317850 eval_time:6489ms
+ngram_cache:enabled orders=2-15 dirichlet=True concentration=5.0 temperature=1.0 entropy=True min_count=2 buckets=4194304 order_mults=none alpha_max=0.95
+phrase_cache:enabled probes=[20, 16] dirichlet=True conc=1.0 alpha=0.9 buckets=1048576
+ngram_prefill:rank1 pre-filled 7754688 positions in 22.2s
+phrase_prefill:rank1 pre-filled 7754688 positions in 3.2s
+ngram_prefill:rank2 pre-filled 15507392 positions in 42.1s
+phrase_prefill:rank2 pre-filled 15507392 positions in 6.7s
+ngram_prefill:rank3 pre-filled 23260096 positions in 64.3s
+phrase_prefill:rank3 pre-filled 23260096 positions in 10.7s
+ngram_prefill:rank4 pre-filled 31012800 positions in 90.3s
+phrase_prefill:rank4 pre-filled 31012800 positions in 14.8s
+ngram_prefill:rank5 pre-filled 38765504 positions in 112.6s
+ngram_prefill:rank6 pre-filled 46518208 positions in 122.0s
+phrase_prefill:rank5 pre-filled 38765504 positions in 18.1s
+phrase_prefill:rank6 pre-filled 46518208 positions in 21.7s
+ngram_prefill:rank7 pre-filled 54270912 positions in 152.4s
+phrase_prefill:rank7 pre-filled 54270912 positions in 24.7s
+final_int6_sliding_window val_loss:0.1951 val_bpb:0.1156 exact:0.11556435 stride:64 eval_time:370090ms
diff --git a/records/track_10min_16mb/2026-03-27_Order20_Dirichlet_Ngram_Phrase/README.md b/records/track_10min_16mb/2026-03-27_Order20_Dirichlet_Ngram_Phrase/README.md
@@ -0,0 +1,45 @@
+# Order-20 Dirichlet Posterior + Phrase Cache
+
+**val_bpb: 0.11545** (3-seed mean, std 0.0000010) | **~15.1 MB** | 8xH100 SXM
+
+Extends n-gram backoff from order 15 to order 20, improving over our PR #948 (0.11556 BPB).
+
+## Results (8xH100 80GB SXM, Montréal CA, 747 TFLOPS)
+
+| Seed | Val BPB | Eval Time |
+|------|---------|-----------|
+| 1337 | 0.11544435 | 459s |
+| 42 | 0.11546433 | 435s |
+| 2025 | 0.11544736 | 438s |
+| **Mean** | **0.11545 (std 0.0000010)** | |
+
+## What changed from PR #948
+
+- `NGRAM_ORDER=20` (was 15)
+- Added 5 more per-order concentrations: all 1.86 (matching the pattern for high-order matches)
+- Everything else identical
+
+## Ablation (1xH100, 200 steps, Kansas City MO)
+
+| Config | BPB | Delta |
+|--------|-----|-------|
+| Order 15 (baseline) | 0.11906 | — |
+| **Order 20** | **0.11873** | **-0.00033** |
+| Two-pass rescore | 0.11906 | 0 |
+| Int5 quantization | 0.11906 | 0 |
+| Comp alpha=0.30 | 0.11906 | 0 |
+
+Order 20 was the only ablation that showed improvement.
+
+## Credits
+
+Same as PR #948. Built on @Robby955 (PR #900), @signalrush (PR #414), @himanshudongre (PR #846), @deanbrr (PR #659), @newjordan (PR #674), @pentxayc (PR #803).
+
+## Run Command
+
+```bash
+NGRAM_ORDER=20 \
+NGRAM_PER_ORDER_CONC="50.0,50.0,6.95,2.98,2.05,2.05,2.05,1.86,1.86,1.86,1.86,1.86,1.86,1.86,1.86,1.86,1.86,1.86,1.86" \
+# ... all other params same as PR #948
+torchrun --standalone --nproc_per_node=8 train_gpt.py
+```