10L + Two-Pass Order-11 N-gram Backoff (0.5863 BPB) by Bortlesboat · Pull Request #876 · openai/parameter-golf

Bortlesboat · 2026-03-26T17:47:45Z

Record submission

val_bpb: 0.5863 (mean of 3 seeds, std 0.0002)

Seed	val_bpb	artifact_bytes
42	0.5864	15,420,000
1337	0.5864	15,570,000
2024	0.5860	15,370,000
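
The headline figures can be re-derived from the table above. This is a quick sanity check only: the per-seed values are copied from the table, and the 16,000,000-byte limit is an assumption (decimal megabytes) about how the "under 16MB" cap is counted.

```python
import statistics

# Per-seed results copied from the table above.
val_bpb = {42: 0.5864, 1337: 0.5864, 2024: 0.5860}
artifact_bytes = {42: 15_420_000, 1337: 15_570_000, 2024: 15_370_000}

# Assumption: the 16MB cap is counted in decimal bytes.
LIMIT = 16_000_000

scores = list(val_bpb.values())
mean_bpb = statistics.mean(scores)
# Sample standard deviation (n - 1); the population form rounds to the same value here.
std_bpb = statistics.stdev(scores)

print(f"mean={mean_bpb:.4f} std={std_bpb:.4f}")
print("largest artifact:", max(artifact_bytes.values()), "bytes, limit", LIMIT)
```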

Method

10L d=512 GQA transformer with two-pass eval:

Pass 1 (189s): score-first sliding window with a hashed n-gram cache over orders 2-11. Order-adaptive entropy gating: the higher the matched order, the lower the model entropy at which the n-gram prediction is trusted. The cache is updated only AFTER a token is scored.
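
A minimal sketch of the score-first backoff idea, not the submission's actual code. The class name `HashedNgramCache`, the bucket count, and the fixed mixing weight `alpha` are assumptions made for illustration; entropy gating and sliding windows are left out, and `model_p[i]` is a hypothetical stand-in for the probability the transformer assigns to the true token at position `i`.

```python
import math
from collections import defaultdict


class HashedNgramCache:
    """Next-token counts keyed by a hashed context, one table per order."""

    def __init__(self, min_order=2, max_order=11, buckets=1 << 20):
        self.min_order = min_order
        self.max_order = max_order
        self.buckets = buckets  # illustrative size, not taken from the PR
        self.tables = {n: defaultdict(dict) for n in range(min_order, max_order + 1)}

    def _bucket(self, context):
        # Hashing a tuple of ints is deterministic in CPython.
        return hash(context) % self.buckets

    def lookup(self, history):
        """Back off from the highest order; return (order, counts) or (0, {})."""
        for n in range(self.max_order, self.min_order - 1, -1):
            ctx_len = n - 1  # an order-n entry has n - 1 context tokens
            if len(history) < ctx_len:
                continue
            counts = self.tables[n].get(self._bucket(tuple(history[-ctx_len:])))
            if counts:
                return n, counts
        return 0, {}

    def update(self, history, token):
        """Record `token` as a continuation of `history` at every order."""
        for n in range(self.min_order, self.max_order + 1):
            ctx_len = n - 1
            if len(history) < ctx_len:
                break
            counts = self.tables[n][self._bucket(tuple(history[-ctx_len:]))]
            counts[token] = counts.get(token, 0) + 1


def score_then_update(tokens, model_p, cache, alpha=0.5):
    """Return total bits; each token is scored before it enters the cache."""
    total_bits = 0.0
    for i, tok in enumerate(tokens):
        history = tokens[:i]
        order, counts = cache.lookup(history)
        p = model_p[i]
        if order:
            p_ngram = counts.get(tok, 0) / sum(counts.values())
            p = (1 - alpha) * p + alpha * p_ngram
        total_bits += -math.log2(p)
        cache.update(history, tok)  # only after the score is final
    return total_bits


tokens = [1, 2, 3] * 4
bits = score_then_update(tokens, [0.25] * len(tokens), HashedNgramCache())
print(f"bits with cache: {bits:.2f} (model only: {2.0 * len(tokens):.2f})")
```

On a repeating sequence the first period is scored by the model alone, and later periods benefit from matches against tokens that were already scored.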

Pass 2 (142s): rescore early cold-cache windows using the now-complete cache (frozen, no updates). All rescored tokens were already evaluated in pass 1. Total eval: 331s.
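
The two-pass flow can be sketched as follows. To stay short this uses a bigram cache and per-token positions rather than orders 2-11 and windows; `two_pass_bits`, `mix_prob`, `rescore_upto`, and the constant `p_model` are all hypothetical names and simplifications, not the submission's code.

```python
import math
from collections import defaultdict


def mix_prob(p_model, counts, tok, alpha=0.5):
    """Blend the model probability with a count-based estimate, if any."""
    total = sum(counts.values())
    if total == 0:
        return p_model
    return (1 - alpha) * p_model + alpha * counts.get(tok, 0) / total


def two_pass_bits(tokens, p_model, rescore_upto):
    """Return (pass-1 total bits, total bits after pass-2 rescoring)."""
    cache = defaultdict(dict)  # previous token -> {next token: count}
    bits = [0.0] * len(tokens)

    # Pass 1: score each token, then add it to the cache.
    for i, tok in enumerate(tokens):
        counts = cache.get(tokens[i - 1], {}) if i > 0 else {}
        bits[i] = -math.log2(mix_prob(p_model, counts, tok))
        if i > 0:
            prev = cache[tokens[i - 1]]
            prev[tok] = prev.get(tok, 0) + 1
    pass1_total = sum(bits)

    # Pass 2: the cache is frozen (no updates). Only early positions are
    # rescored. Note that the frozen cache now holds counts from the whole
    # sequence, including the rescored tokens themselves.
    for i in range(1, min(rescore_upto, len(tokens))):
        counts = cache.get(tokens[i - 1], {})
        bits[i] = -math.log2(mix_prob(p_model, counts, tokens[i]))

    return pass1_total, sum(bits)


first, second = two_pass_bits([1, 2, 3] * 4, p_model=0.25, rescore_upto=3)
print(f"pass 1: {first:.2f} bits, after pass 2: {second:.2f} bits")
```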

Architecture

10L, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual
Mixed int5 MLP / int6 attention + zstd-22, EMA(0.997), matrix_lr=0.03
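
As an illustration of what "mixed int5 MLP / int6 attention" could look like, here is a sketch assuming symmetric per-row quantization with one float scale per row. The scheme, the helper names, and the tensor-name convention in `bits_for` are assumptions; the PR text does not specify them. The zstd-22 compression step is not shown.

```python
def bits_for(tensor_name):
    """Hypothetical rule: MLP weights get 5 bits, everything else 6."""
    return 5 if "mlp" in tensor_name else 6


def quantize_row(row, bits):
    """Symmetric quantization of one row to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 15 for int5, 31 for int6
    max_abs = max(abs(x) for x in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(x / scale))) for x in row]
    return q, scale


def dequantize_row(q, scale):
    """Map integer codes back to floats."""
    return [v * scale for v in q]


row = [0.5, -1.0, 0.25, 0.03]
for name in ("blocks.0.mlp.fc.weight", "blocks.0.attn.qkv.weight"):
    q, scale = quantize_row(row, bits_for(name))
    err = max(abs(a - b) for a, b in zip(row, dequantize_row(q, scale)))
    print(name, "bits:", bits_for(name), "codes:", q, f"max error: {err:.4f}")
```

With this scheme the reconstruction error per weight is at most half a quantization step, so the int6 rows are reconstructed roughly twice as finely as the int5 rows.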

Compliance

Score-first: BPB finalized before cache update
Backward-looking: only previously scored tokens in cache
No target-aware gating: alpha from model entropy + matched order only
Pass 2: rescores already-evaluated tokens with frozen cache
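
The "no target-aware gating" point can be made concrete with a gate whose inputs are only the model's predicted distribution and the matched order. This is a sketch: the sigmoid form and every constant (`base_threshold`, `step`, `sharpness`, `alpha_max`) are assumptions chosen for illustration, not values from the submission.

```python
import math


def entropy_bits(probs):
    """Shannon entropy of a probability distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def ngram_alpha(probs, matched_order, base_threshold=4.0, step=0.3,
                sharpness=2.0, alpha_max=0.9):
    """Mixing weight for the n-gram prediction.

    Depends only on the model distribution and the matched order; the
    target token is never an argument. Higher orders lower the entropy
    threshold, so they are trusted at lower model uncertainty.
    """
    if matched_order == 0:
        return 0.0  # no n-gram match: use the model alone
    threshold = base_threshold - step * (matched_order - 2)
    gate = 1.0 / (1.0 + math.exp(-sharpness * (entropy_bits(probs) - threshold)))
    return alpha_max * gate


uncertain = [1 / 16] * 16              # 4 bits of entropy
confident = [0.97, 0.01, 0.01, 0.01]   # well under 1 bit
for order in (2, 6, 11):
    print(order, round(ngram_alpha(uncertain, order), 3),
          round(ngram_alpha(confident, order), 3))
```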

Timing (8xH100 SXM)

Training: 600s (~6004 steps)
Eval: 331s (pass 1: 189s + pass 2: 142s)
Artifact: 15.4-15.6 MB

Based on

thwu1: base architecture
PR #727 (Record: First Legal Sub-1.0 BPB - Multi-order N-gram Backoff + Entropy-Adaptive Alpha, val_bpb=0.9674, 3-seed) and PR #788 (Record: 11L + order-adaptive 9-gram backoff, mean val_bpb=0.9059): n-gram backoff
PR #846 (Record: Two-Pass N-gram Rescoring, val_bpb 0.1434): two-pass
PR #828 (Record: 0.9076 BPB - 10L + N-gram Backoff + Matrix LR 0.03): matrix_lr

Explores stacking eval-time techniques (neural cache, LoRA TTT) and quantization-aware training on top of the openai#1 recipe. QAT has an export mismatch bug resulting in high quantization penalty — submitting as non-record to document the approach for iteration.

Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6 quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4). Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.

10L d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual, EMA(0.997). Mixed int5/int6 + zstd-22. Eval: multi-order hashed n-gram backoff (orders 2-7) with entropy-adaptive alpha. Mean of 3 seeds: 0.9123 +/- 0.0003 BPB.

Renamed to reflect actual technique (n-gram backoff + entropy alpha). Removed old 1.1507 BPB seed logs. Added explicit compliance/legality section per competition conventions.

Two-pass eval: pass 1 builds order 2-11 n-gram cache with order-adaptive entropy gating, pass 2 rescores cold-cache early windows with full cache. Mean of 3 seeds: 0.5863 +/- 0.0002 BPB. All artifacts under 16MB. Total eval: 331s on 8xH100.

Bortlesboat added 5 commits March 20, 2026 23:10

records: 10L Int5-MLP + BigramHash(4096) + SWA (1.1507 BPB)

345f145

cleanup: rename folder, remove stale logs, add compliance section

e5a2377

Bortlesboat wants to merge 5 commits into openai:main from Bortlesboat:submission/v7-twopass-order11
