Record: X-WING 3D Cubric + Complementary Training (val_bpb=0.4820) #814
Open
newjordan wants to merge 8 commits into openai:main from
Conversation
3D cubric pattern recognizer (54 warm-started adaptive multipliers) + complementary training. Seeds: 1337=0.4818, 300=0.4821, 58=0.4821. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three variants targeting the 0.187 BPB gap to openai#1:
- bwing_alpha: clip 0.95, alpha 0.05-0.60 (isolate the alpha curve)
- bwing_entropy_shift: per-order entropy center shift (isolate)
- bwing_full_port: all openai#809 techniques + fixed order mults (fire first)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Cubric 3D back online (CADENCE=32, warm-start)
- Per-order entropy center shift from openai#809
- Alpha 0.05-0.60, clip 0.95
- Our sliding-window TTT spliced in (1 epoch, SGD, freeze 2 blocks)
- TTT runs BEFORE n-gram eval → the adapted model feeds the n-gram mixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Port openai#809 LoRA TTT: rank-8 adapters on Q/V/LM head, AdamW, Polyak
- Add LoRA injection to CausalSelfAttention, Block, and GPT forward paths
- 53s vs our old 410s TTT, 6x better BPB gain
- Cubric 3D ON + entropy shift + alpha 0.05-0.60, clip 0.95
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
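A minimal numpy sketch of the rank-8 LoRA adapter idea this commit ports; the PR attaches PyTorch adapters to Q/V and the LM head, while the class name, init scales, and alpha below are illustrative assumptions:

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (B @ A).

    At test-time training only A and B would be optimized; B starts at
    zero so the adapter is a no-op until its first update.
    """
    def __init__(self, W, rank=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                  # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, (rank, W.shape[1]))
        self.B = np.zeros((W.shape[0], rank))       # zero-init: adapter starts inert
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus scaled low-rank correction.
        return x @ self.W.T + (x @ self.A.T) @ self.B.T * self.scale
```

Because B is zero-initialized, wrapping an existing projection changes nothing until TTT updates the adapter, which is what makes injection into existing forward paths safe.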
Fixed mults + entropy shift + alpha 0.05-0.60 clip 0.95 (no cubric). Base sliding: 1.1194, n-gram9: 0.4512. Delta from X-WING: -0.031. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Deleted LoRA TTT abomination. bwing_III is now a clean copy of our best scoring variant for further iteration. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bwing_IV: Prime fix only — adds primes 283721, 347237 to eliminate XOR hash collisions for orders 8-9 (the 2.0x-multiplier orders). With 7 primes, prime[7] wrapped to prime[0], causing context tokens at positions j-8 and j-1 to cancel when equal.

bwing_V: Prime fix + cubric 3D stacked on top of fixed mults. Cubric warm-starts at 1.0 (neutral) and refines per (order × entropy × count) cell on top of the fixed order-multiplier scaling.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
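A small, hedged reproduction of the collision mode this commit fixes (the 7 base primes here are placeholders; only 283721 and 347237 come from the commit message):

```python
# Placeholder base primes; the real table's values are not shown in the PR.
PRIMES_7 = [1000003, 1000033, 1000037, 1000039, 1000081, 1000099, 1000117]

def context_hash(tokens, primes):
    """XOR-combine position-scaled tokens into one n-gram table key."""
    h = 0
    for j, t in enumerate(tokens):
        # With only 7 primes, index 7 wraps to primes[0]: the first and
        # last token of an order-8 context share a multiplier, so equal
        # tokens there cancel under XOR and distinct contexts collide.
        h ^= t * primes[j % len(primes)]
    return h

# Order-8 contexts differing only at the two wrapped positions collide:
a = context_hash([5, 1, 2, 3, 4, 6, 7, 5], PRIMES_7)
b = context_hash([9, 1, 2, 3, 4, 6, 7, 9], PRIMES_7)
assert a == b  # collision: positions 0 and 7 cancel pairwise

# Extending the table with the commit's two primes removes the wraparound:
PRIMES_9 = PRIMES_7 + [283721, 347237]
assert context_hash([5, 1, 2, 3, 4, 6, 7, 5], PRIMES_9) != \
       context_hash([9, 1, 2, 3, 4, 6, 7, 9], PRIMES_9)
```

This is why only orders 8-9 were affected: shorter contexts never index past primes[6], so they never hit the wraparound.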
Adapted from old setup.sh. Fixes FA3 detection (old one skipped FA3 when FA2 was present), uses sp1024 dataset, adds zstandard install. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Are you calibrating GPTQ after the wallclock cap fires?
Summary
Key Innovation: 3D Cubric Pattern Recognizer + Complementary Training
Two novel techniques stacked on chunk-based shared n-gram tables:
1. 3D Cubric (original)
54 adaptive multipliers across (order × entropy_bin × count_bin). Each cell independently tracks n-gram beat rates and adjusts its alpha multiplier. Captures patterns invisible to 1D scaling — e.g. "order 7 at mid-entropy with high count → trust fully (2.0x)" vs "order 3 at any entropy → suppress (0.30x)".
Warm-start: multipliers initialize at proven converged values instead of 1.0. Full cubric power from chunk 1 instead of wasting ~30 of 60 chunks converging.
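A sketch of how a 54-cell table like this could operate; the 6 × 3 × 3 cell layout, binning, update rule, learning rate, and alpha schedule are all assumptions for illustration, not the PR's exact code:

```python
import numpy as np

# Assumed layout: 6 n-gram orders x 3 entropy bins x 3 count bins = 54 cells.
N_ORDERS, N_ENT, N_CNT = 6, 3, 3
LR = 0.05

# Warm-start would load proven converged values; neutral 1.0 shown here.
mults = np.full((N_ORDERS, N_ENT, N_CNT), 1.0)

def bins(entropy, count):
    """Map continuous model entropy and n-gram count to cell indices."""
    ent_i = min(int(entropy), N_ENT - 1)
    cnt_i = min(int(np.log2(count + 1)), N_CNT - 1)
    return ent_i, cnt_i

def update_cell(order_i, entropy, count, ngram_beat_neural):
    """Nudge one cell toward trusting (2.0x) or suppressing (0.3x) n-grams
    based on whether the n-gram prediction beat the neural one."""
    ent_i, cnt_i = bins(entropy, count)
    target = 2.0 if ngram_beat_neural else 0.3
    mults[order_i, ent_i, cnt_i] += LR * (target - mults[order_i, ent_i, cnt_i])

def mixed_probs(p_neural, p_ngram, order_i, count, alpha_base=0.4):
    """Blend (1 - a) * P_neural + a * P_ngram, with a scaled by the cell."""
    entropy = -float(np.sum(p_neural * np.log(p_neural + 1e-12)))
    ent_i, cnt_i = bins(entropy, count)
    a = min(alpha_base * mults[order_i, ent_i, cnt_i], 0.95)  # clip 0.95
    return (1.0 - a) * p_neural + a * p_ngram
```

Each cell tracks its own beat rate, so "order 7, mid entropy, high count" can drift toward 2.0x while "order 3" cells sink toward 0.3x, independently.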
2. Complementary Training (adapted from PR #803)
During training, tokens predictable by bigram statistics receive a lower loss weight (COMPLEMENT_ALPHA=0.5). The model specializes on tokens n-grams can't predict. This enables a higher eval-time alpha (20-75% vs 5-70%) because the model is deliberately weak where n-grams are strong.
3. Shared N-gram Tables
All 8 GPU ranks update tables with the same chunk tokens → every rank sees the full 62M-token picture (vs 1/8 with rank-local). Insight from @deanbrr (PR #779).
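Item 2's loss weighting can be sketched as follows (a minimal numpy version; the real loop is PyTorch and the per-token bigram probabilities would come from the shared tables, both assumptions here):

```python
import numpy as np

COMPLEMENT_ALPHA = 0.5  # matches the value quoted above

def complementary_weights(bigram_prob):
    """Per-token loss weights in [1 - alpha, 1]: a token the bigram table
    predicts with probability ~1 gets weight 0.5; an unpredictable one, 1.0."""
    return 1.0 - COMPLEMENT_ALPHA * np.asarray(bigram_prob)

def complementary_loss(neural_nll, bigram_prob):
    """Weighted mean NLL: the model specializes on tokens n-grams miss."""
    weights = complementary_weights(bigram_prob)
    return float(np.mean(weights * np.asarray(neural_nll)))
```

The weights never reach zero, so the model still sees every token; it is only nudged away from competing with the tables on easy continuations.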
Ablation
Legality
(1-α)·P_neural + α·P_ngram, where α is a fixed function of model entropy × cubric multipliers: target-independent, committed before scoring.
Credits
Reproduce
8xH100 SXM, 600s training + ~203s eval.
Test plan