
Record: 0.2873 BPB — Fine-Grained N-gram Cache (65K chunks)#840

Open
quietsmile wants to merge 1 commit into openai:main from quietsmile:submission/ngram-chunk65k-0.287

Conversation

@quietsmile

Summary

val_bpb: 0.2873 (3-seed mean, std 0.0001) | ~13.4 MB | 8xH100 SXM | 600s train + ~405s eval
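For readers unfamiliar with the metric: bits-per-byte is the summed cross-entropy over the eval set, converted from nats to bits and normalized by the raw byte count. A minimal sketch (the function name is illustrative, not the harness's actual API):

```python
import math

def bits_per_byte(total_loss_nats, total_bytes):
    """Total cross-entropy in nats, converted to bits (divide by ln 2)
    and normalized by the uncompressed byte count of the eval data."""
    return total_loss_nats / (total_bytes * math.log(2))

# e.g. an average loss of ln(2) nats per byte is exactly 1.0 BPB:
assert abs(bits_per_byte(math.log(2) * 800, 800) - 1.0) < 1e-9
```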

Key Innovation: Fine-Grained N-gram Chunk Updates

The single most impactful change: reducing NGRAM_EVAL_CHUNK_TOKENS from 1,000,000 to 65,536.

The N-gram backoff cache only updates after each chunk is fully scored. With 1M-token chunks, the first million validation tokens see an empty cache — losing enormous predictive power. With 65K-token chunks, the cache refreshes 15x more frequently, giving each subsequent chunk a much richer set of n-gram statistics to draw from.

  Chunk Size    BPB      Delta
  1,000,000     0.4572   baseline
  65,536        0.2872   -0.170

This is purely an eval-time optimization — no training changes, no TTT, no additional compute.
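The score-first loop can be sketched as follows. Everything here is an illustrative stand-in (a `Counter` for the real backoff cache, a hit count for the real scoring), not the submission's actual code; it only demonstrates why smaller chunks leave the cache cold for fewer tokens:

```python
from collections import Counter

def chunked_eval(tokens, score_fn, chunk_tokens):
    """Score tokens chunk by chunk. The cache sees a chunk's tokens
    only AFTER that chunk has been scored (strictly backward-looking)."""
    cache = Counter()  # toy stand-in for the n-gram backoff cache
    total = 0
    for start in range(0, len(tokens), chunk_tokens):
        chunk = tokens[start:start + chunk_tokens]
        total += score_fn(chunk, cache)  # cache is read-only while scoring
        cache.update(chunk)              # refreshed only afterwards
    return total

# Score = number of tokens already present in the cache when scored:
hits = lambda chunk, cache: sum(1 for t in chunk if cache[t] > 0)
stream = list("abcabcabc" * 100)  # 900 highly repetitive tokens

# Smaller chunks -> only the first 3 tokens see an empty cache,
# instead of the first 300:
assert chunked_eval(stream, hits, 3) > chunked_eval(stream, hits, 300)
```

The same logic explains the 1M-to-65K win: with 1M-token chunks the entire first million validation tokens are scored against an empty cache.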

3-Seed Results

  Seed   BPB       Artifact bytes
  1337   0.28725   ~13.4 MB
  42     0.28720   ~13.4 MB
  2024   0.28744   ~13.4 MB
  Mean   0.2873    (std 0.0001)

Architecture

11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)², BigramHash(4096), GPTQ int5 + LZMA.
EMA(0.997) + SWA. Parallel Muon optimizer. Perplexity-sorted shard ordering.
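Read as a config, the stated architecture amounts to roughly the following. Field names are hypothetical; only the values come from the description above:

```python
# Hypothetical config mirroring the architecture summary above;
# key names are illustrative, not the submission's actual code.
config = dict(
    n_layers=11,
    d_model=512,
    n_heads=8, n_kv_heads=4,          # GQA 8/4
    mlp_ratio=3.0,                    # MLP hidden = 3.0 x d_model
    activation="LeakyReLU(0.9)^2",    # squared LeakyReLU, slope 0.9
    bigram_hash_buckets=4096,         # BigramHash(4096)
    quantization="GPTQ int5 + LZMA",
    ema_decay=0.997,                  # EMA + SWA weight averaging
    optimizer="Parallel Muon",
)
```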

N-gram Cache Details

  • Order 2-9 backoff with 4M hash buckets
  • Entropy-adaptive alpha: α varies by model confidence and n-gram order
  • Per-order multipliers: low orders (2-3) suppressed at 0.3x, high orders (5-9) boosted at 2.0x
  • Score-first: cache updated ONLY after scoring each 65K-token chunk
  • All GPU ranks share identical cache state
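A toy sketch of how the entropy-adaptive, per-order blending described above could fit together. The function names, the base alpha, and the order-4 multiplier are assumptions (the source only specifies the 0.3x and 2.0x multipliers):

```python
import math

# Per-order multipliers from the list above; order 4 is an assumed 1.0.
ORDER_MULT = {o: 0.3 if o <= 3 else (2.0 if o >= 5 else 1.0)
              for o in range(2, 10)}

def adaptive_alpha(model_probs, order, base_alpha=0.1):
    """Mixing weight for the n-gram cache: grows with model entropy
    (i.e. low model confidence) and with the per-order multiplier."""
    h = -sum(p * math.log(p) for p in model_probs if p > 0)
    h_max = math.log(len(model_probs))          # entropy of uniform dist
    return min(1.0, base_alpha * (h / h_max) * ORDER_MULT[order])

def blend(p_model, p_ngram, alpha):
    # Final next-token probability: linear interpolation of the two.
    return (1 - alpha) * p_model + alpha * p_ngram
```

Under this sketch, a confident (peaked) model distribution pulls alpha toward zero, while an uncertain (near-uniform) one hands more weight to the high-order n-gram statistics.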

Compliance

  • Training: 600s on 8xH100 SXM (within 600s)
  • Eval: ~405s on 8xH100 SXM (within 600s)
  • Artifacts under 16,000,000 bytes
  • No TTT — purely N-gram cache at eval time
  • Cache strictly backward-looking — updated only after scoring
  • No oracle, no training data at eval time

Credits

This builds on prior community work. Our novel contribution is the fine-grained chunk-update schedule for the N-gram cache (65K vs. 1M tokens), demonstrating that cache update frequency is the dominant factor in N-gram BPB.

🤖 Generated with Claude Code

Key innovation: reduce NGRAM_EVAL_CHUNK_TOKENS from 1M to 65K.
The N-gram cache updates after each chunk, so smaller chunks mean
more frequent cache refreshes and richer n-gram statistics.

Results (3-seed mean): 0.2873 BPB (std 0.0001)
Fully legal: no pre-eval TTT, score-first N-gram only.
11L 512d GQA 8/4, MLP 3.0x, XSA-4, LeakyReLU(0.9)²,
BigramHash(4096), GPTQ int5, LZMA. 600s train + 405s eval.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
