Skip to content

10L + Two-Pass Order-11 N-gram Backoff (0.5863 BPB)#876

Open
Bortlesboat wants to merge 5 commits intoopenai:mainfrom
Bortlesboat:submission/v7-twopass-order11
Open

10L + Two-Pass Order-11 N-gram Backoff (0.5863 BPB)#876
Bortlesboat wants to merge 5 commits intoopenai:mainfrom
Bortlesboat:submission/v7-twopass-order11

Conversation

@Bortlesboat
Copy link

Record submission

val_bpb: 0.5863 (mean of 3 seeds, std 0.0002)

Seed val_bpb artifact_bytes
42 0.5864 15,420,000
1337 0.5864 15,570,000
2024 0.5860 15,370,000

Method

10L d=512 GQA transformer with two-pass eval:

Pass 1 (189s): score-first sliding window with orders 2-11 hashed n-gram cache. Order-adaptive entropy gating — higher-order matches trust n-gram at lower model uncertainty. Cache updated only AFTER scoring.

Pass 2 (140s): rescore early cold-cache windows using the now-complete cache (frozen, no updates). All rescored tokens were already evaluated in pass 1. Total eval: 331s.

Architecture

  • 10L, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual
  • Mixed int5 MLP / int6 attention + zstd-22, EMA(0.997), matrix_lr=0.03

Compliance

  • Score-first: BPB finalized before cache update
  • Backward-looking: only previously scored tokens in cache
  • No target-aware gating: alpha from model entropy + matched order only
  • Pass 2: rescores already-evaluated tokens with frozen cache

Timing (8xH100 SXM)

  • Training: 600s (~6004 steps)
  • Eval: 331s (pass 1: 189s + pass 2: 142s)
  • Artifact: 15.4-15.6 MB

Based on

Explores stacking eval-time techniques (neural cache, LoRA TTT) and
quantization-aware training on top of the openai#1 recipe. QAT has an export
mismatch bug resulting in high quantization penalty — submitting as
non-record to document the approach for iteration.
Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6
quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4).
Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.
10L d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale,
XSA last 4, Value Residual, EMA(0.997). Mixed int5/int6 + zstd-22.
Eval: multi-order hashed n-gram backoff (orders 2-7) with entropy-
adaptive alpha. Mean of 3 seeds: 0.9123 +/- 0.0003 BPB.
Renamed to reflect actual technique (n-gram backoff + entropy alpha).
Removed old 1.1507 BPB seed logs. Added explicit compliance/legality
section per competition conventions.
Two-pass eval: pass 1 builds order 2-11 n-gram cache with order-adaptive
entropy gating, pass 2 rescores cold-cache early windows with full cache.
Mean of 3 seeds: 0.5863 +/- 0.0002 BPB. All artifacts under 16MB.
Total eval: 331s on 8xH100.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant