
10L + Multi-Order N-gram Backoff (0.9123 BPB) #802

Open

Bortlesboat wants to merge 4 commits into openai:main from Bortlesboat:submission/v6-ngram-backoff

Conversation

@Bortlesboat

Record submission

val_bpb: 0.9123 (mean of 3 seeds, post int5/int6+zstd quantization roundtrip)

Seed   val_bpb   artifact_bytes
42     0.9128    15,320,000
1337   0.9121    15,630,000
2024   0.9121    15,330,000

Architecture

  • 10 layers, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2
  • Partial RoPE (16/64), LN Scale, XSA last 4, Value Residual
  • BigramHash(4096, dim=128), SmearGate, U-Net skips
  • Mixed int5 MLP / int6 attention + zstd-22
  • EMA(0.997), Muon WD=0.04, warmdown=3500
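The "LeakyReLU(0.5)^2" entry reads most naturally as a leaky ReLU with negative slope 0.5 followed by an elementwise square. A minimal sketch under that assumption (the PR does not show the actual MLP code, so the function name and exact composition here are illustrative):

```python
def leaky_relu_squared(x: float, slope: float = 0.5) -> float:
    """Hypothetical reading of "LeakyReLU(0.5)^2": apply a leaky ReLU
    with negative slope 0.5, then square the result elementwise.
    Note the square makes the negative branch contribute slope^2 * x^2,
    so the activation is non-negative everywhere."""
    y = x if x >= 0 else slope * x
    return y * y
```

In a real model this would be applied elementwise to the MLP hidden tensor; squared activations of this family are common in speedrun-style recipes.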

Eval: Multi-Order N-gram Backoff + Entropy-Adaptive Alpha

  • Hashed n-gram cache, orders 2 through 7 with backoff
  • Highest matching order wins (7-gram preferred, falls back to lower)
  • Entropy-adaptive alpha: alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0))
  • Score-first: cache updated only AFTER scoring each segment
  • 4M hash buckets per order, min_count=2
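The bullets above can be sketched as follows. This is a hypothetical reconstruction from the description only (class and function names, the min_count interpretation, and the mixing convention are assumptions, not the submission's code): contexts are hashed into per-order bucket tables, prediction tries the highest order first and backs off, updates happen only after scoring, and the mixing weight alpha grows with the model's entropy H.

```python
import math
from collections import defaultdict

class NgramBackoffCache:
    """Hashed n-gram cache, orders 2-7, highest matching order wins."""

    def __init__(self, orders=range(2, 8), buckets=4_000_000, min_count=2):
        self.orders = sorted(orders, reverse=True)  # try 7-gram first
        self.buckets = buckets
        self.min_count = min_count  # assumed: minimum context count to trust a table hit
        # per order: context bucket -> {next_token: count}
        self.tables = {n: defaultdict(lambda: defaultdict(int)) for n in self.orders}

    def _bucket(self, ctx):
        return hash(ctx) % self.buckets

    def predict(self, history):
        """Return next-token probs from the highest matching order, else None."""
        for n in self.orders:
            if len(history) < n - 1:
                continue
            ctx = tuple(history[-(n - 1):])
            counts = self.tables[n].get(self._bucket(ctx))
            if counts:
                total = sum(counts.values())
                if total >= self.min_count:
                    return {t: c / total for t, c in counts.items()}
            # no reliable hit at this order: back off to the next lower one
        return None

    def update(self, history, token):
        """Score-first discipline: call only AFTER `token` has been scored."""
        for n in self.orders:
            if len(history) >= n - 1:
                ctx = tuple(history[-(n - 1):])
                self.tables[n][self._bucket(ctx)][token] += 1

def entropy_adaptive_alpha(model_probs):
    """alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)), H in bits."""
    H = -sum(p * math.log2(p) for p in model_probs if p > 0)
    return 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (H - 4.0)))
```

At eval time the cache distribution would presumably be mixed as `(1 - alpha) * p_model + alpha * p_cache`, so the cache gets more weight exactly where the model is uncertain (high H); that mixing convention is an assumption here.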

Timing (8xH100 SXM)

  • Training: 600s (~6020 steps at 99ms/step)
  • Eval: ~163s (sliding window stride=64, batch_seqs=64)
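The stride-64 sliding-window eval can be sketched like this (hypothetical names; the PR does not show the eval loop). Each window re-scores a full context but only the newest `stride` tokens count toward the total, so every token is scored exactly once with near-maximal left context:

```python
import math

def sliding_window_bits_per_token(token_nll_fn, tokens, window=1024, stride=64):
    """Sketch of sliding-window eval with stride.

    token_nll_fn(context) -> list of per-token negative log-likelihoods
    (in nats) for every position in `context`. Window length 1024 is an
    assumed value; the PR only states stride=64.
    """
    total_nats = 0.0
    n_scored = 0
    pos = 0
    while pos < len(tokens):
        end = min(pos + stride, len(tokens))
        start = max(0, end - window)          # slide the context window
        nlls = token_nll_fn(tokens[start:end])
        new = end - pos                       # tokens not yet scored
        total_nats += sum(nlls[-new:])        # score only the new tail
        n_scored += new
        pos = end
    return total_nats / n_scored / math.log(2)  # bits per token
```

BPB would then divide total bits by the byte count of the underlying text rather than the token count; batching `batch_seqs=64` windows together is what keeps the wall-clock time low.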

Based on

  • Explores stacking eval-time techniques (neural cache, LoRA TTT) and quantization-aware training on top of the openai#1 recipe. QAT has an export mismatch bug resulting in a high quantization penalty; submitted as non-record to document the approach for iteration.
  • Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6 quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4). Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.
  • 10L, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual, EMA(0.997). Mixed int5/int6 + zstd-22. Eval: multi-order hashed n-gram backoff (orders 2-7) with entropy-adaptive alpha. Mean of 3 seeds: 0.9123 +/- 0.0003 BPB.
  • Renamed to reflect the actual technique (n-gram backoff + entropy alpha). Removed old 1.1507 BPB seed logs. Added an explicit compliance/legality section per competition conventions.
bigbag pushed a commit to bigbag/parameter-golf that referenced this pull request Mar 26, 2026
Single change from PR openai#802: MATRIX_LR=0.03 (was 0.02).
Discovered through systematic screening (74 experiments, steps 10-12).

- 10L, 512d, GQA 8/4, LeakyReLU(0.5)², BigramHash 4096
- Multi-order n-gram backoff eval cache (orders 2-7)
- Entropy-adaptive alpha mixing (score-first, legal)
- 8xH100 SXM, 600s training, 138s eval
- Artifact: 15.32 MB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
