
10L + Multi-Order N-gram Backoff (0.9123 BPB) #802

Open

Bortlesboat wants to merge 4 commits into openai:main from Bortlesboat:submission/v6-ngram-backoff

Conversation

@Bortlesboat

Record submission

val_bpb: 0.9123 (mean of 3 seeds, post int5/int6+zstd quantization roundtrip)

Seed   val_bpb   artifact_bytes
42     0.9128    15,320,000
1337   0.9121    15,630,000
2024   0.9121    15,330,000

Architecture

  • 10 layers, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2
  • Partial RoPE (16/64), LN Scale, XSA last 4, Value Residual
  • BigramHash(4096, dim=128), SmearGate, U-Net skips
  • Mixed int5 MLP / int6 attention + zstd-22
  • EMA(0.997), Muon WD=0.04, warmdown=3500
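The "LeakyReLU(0.5)^2" entry reads most naturally as a leaky ReLU with negative slope 0.5 followed by an elementwise square. A minimal sketch under that assumption (the PR does not show the actual MLP code, so the function name and exact composition here are illustrative):

```python
def leaky_relu_squared(x: float, slope: float = 0.5) -> float:
    """Hypothetical reading of "LeakyReLU(0.5)^2": apply a leaky ReLU
    with negative slope 0.5, then square the result elementwise.
    Note the square makes the negative branch contribute slope^2 * x^2,
    so the activation is non-negative everywhere."""
    y = x if x >= 0 else slope * x
    return y * y
```

In a real model this would be applied elementwise to the MLP hidden tensor; squared activations of this family are common in speedrun-style recipes.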

Eval: Multi-Order N-gram Backoff + Entropy-Adaptive Alpha

  • Hashed n-gram cache, orders 2 through 7 with backoff
  • Highest matching order wins (7-gram preferred, falls back to lower)
  • Entropy-adaptive alpha: alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0))
  • Score-first: cache updated only AFTER scoring each segment
  • 4M hash buckets per order, min_count=2
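The bullets above can be sketched as follows. This is a hypothetical reconstruction from the description only (class and function names, the min_count interpretation, and the mixing convention are assumptions, not the submission's code): contexts are hashed into per-order bucket tables, prediction tries the highest order first and backs off, updates happen only after scoring, and the mixing weight alpha grows with the model's entropy H.

```python
import math
from collections import defaultdict

class NgramBackoffCache:
    """Hashed n-gram cache, orders 2-7, highest matching order wins."""

    def __init__(self, orders=range(2, 8), buckets=4_000_000, min_count=2):
        self.orders = sorted(orders, reverse=True)  # try 7-gram first
        self.buckets = buckets
        self.min_count = min_count  # assumed: minimum context count to trust a table hit
        # per order: context bucket -> {next_token: count}
        self.tables = {n: defaultdict(lambda: defaultdict(int)) for n in self.orders}

    def _bucket(self, ctx):
        return hash(ctx) % self.buckets

    def predict(self, history):
        """Return next-token probs from the highest matching order, else None."""
        for n in self.orders:
            if len(history) < n - 1:
                continue
            ctx = tuple(history[-(n - 1):])
            counts = self.tables[n].get(self._bucket(ctx))
            if counts:
                total = sum(counts.values())
                if total >= self.min_count:
                    return {t: c / total for t, c in counts.items()}
            # no reliable hit at this order: back off to the next lower one
        return None

    def update(self, history, token):
        """Score-first discipline: call only AFTER `token` has been scored."""
        for n in self.orders:
            if len(history) >= n - 1:
                ctx = tuple(history[-(n - 1):])
                self.tables[n][self._bucket(ctx)][token] += 1

def entropy_adaptive_alpha(model_probs):
    """alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)), H in bits."""
    H = -sum(p * math.log2(p) for p in model_probs if p > 0)
    return 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (H - 4.0)))
```

At eval time the cache distribution would presumably be mixed as `(1 - alpha) * p_model + alpha * p_cache`, so the cache gets more weight exactly where the model is uncertain (high H); that mixing convention is an assumption here.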

Timing (8xH100 SXM)

  • Training: 600s (~6020 steps at 99ms/step)
  • Eval: ~163s (sliding window stride=64, batch_seqs=64)
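The stride-64 sliding-window eval can be sketched like this (hypothetical names; the PR does not show the eval loop). Each window re-scores a full context but only the newest `stride` tokens count toward the total, so every token is scored exactly once with near-maximal left context:

```python
import math

def sliding_window_bits_per_token(token_nll_fn, tokens, window=1024, stride=64):
    """Sketch of sliding-window eval with stride.

    token_nll_fn(context) -> list of per-token negative log-likelihoods
    (in nats) for every position in `context`. Window length 1024 is an
    assumed value; the PR only states stride=64.
    """
    total_nats = 0.0
    n_scored = 0
    pos = 0
    while pos < len(tokens):
        end = min(pos + stride, len(tokens))
        start = max(0, end - window)          # slide the context window
        nlls = token_nll_fn(tokens[start:end])
        new = end - pos                       # tokens not yet scored
        total_nats += sum(nlls[-new:])        # score only the new tail
        n_scored += new
        pos = end
    return total_nats / n_scored / math.log(2)  # bits per token
```

BPB would then divide total bits by the byte count of the underlying text rather than the token count; batching `batch_seqs=64` windows together is what keeps the wall-clock time low.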

Based on

  • Explores stacking eval-time techniques (neural cache, LoRA TTT) and quantization-aware training on top of the openai#1 recipe. QAT has an export mismatch bug resulting in a high quantization penalty; submitted as non-record to document the approach for iteration.
  • Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6 quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4). Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.
  • 10L, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual, EMA(0.997). Mixed int5/int6 + zstd-22. Eval: multi-order hashed n-gram backoff (orders 2-7) with entropy-adaptive alpha. Mean of 3 seeds: 0.9123 +/- 0.0003 BPB.
  • Renamed to reflect the actual technique (n-gram backoff + entropy alpha). Removed old 1.1507 BPB seed logs. Added an explicit compliance/legality section per competition conventions.
bigbag pushed a commit to bigbag/parameter-golf that referenced this pull request Mar 26, 2026
Single change from PR openai#802: MATRIX_LR=0.03 (was 0.02).
Discovered through systematic screening (74 experiments, steps 10-12).

- 10L, 512d, GQA 8/4, LeakyReLU(0.5)², BigramHash 4096
- Multi-order n-gram backoff eval cache (orders 2-7)
- Entropy-adaptive alpha mixing (score-first, legal)
- 8xH100 SXM, 600s training, 138s eval
- Artifact: 15.32 MB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
