
11L LeakyReLU² + XSA-all + Full GPTQ + 5-gram Backoff (1.0340 BPB)#792

Open
xexyz wants to merge 2 commits into openai:main from xexyz:xexyz/ngram-backoff-1.0340

Conversation


@xexyz xexyz commented Mar 26, 2026

Summary

  • val_bpb: 1.0340 (3-seed mean)
  • 11L/512d transformer with LeakyReLU(0.5)², XSA on all layers, Hessian-based GPTQ, and 5-gram multi-order backoff with entropy-adaptive alpha
  • Artifact: 15,903,061 bytes
  • Training: ~600s on 8xH100

3-Seed Validation

Seed   Sliding BPB   N-gram BPB
1337   1.1273        1.0342
42     1.1272        1.0340
7      1.1269        1.0338
Mean   1.1271        1.0340

Key Techniques

  1. LeakyReLU(0.5)²: Replaces relu² with a leaky variant (negative slope 0.5) for better gradient flow through negative pre-activations
  2. XSA-all: Cross-sequence attention extended from last 4 layers to all 11
  3. Full GPTQ: Hessian-based int6 quantization with actorder + Cholesky error compensation, calibrated on training data (32 batches on EMA model, within training budget)
  4. 5-gram multi-order backoff: Score-first cascade 5→4→3→2-gram with separate hash tables per order (4M buckets each)
  5. Entropy-adaptive alpha: alpha = 0.05 + 0.35 * sigmoid(2*(H-4.0)) — trusts n-gram more when model is uncertain
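The squared leaky activation in item 1 can be sketched in a few lines. This is a minimal NumPy sketch, assuming the activation simply squares the LeakyReLU output (the PR does not specify whether sign is preserved after squaring; the function name is mine):

```python
import numpy as np

def leaky_relu_sq(x, slope=0.5):
    # LeakyReLU with negative slope 0.5, then elementwise square.
    # Unlike plain relu², negative inputs still carry gradient (slope 0.5).
    y = np.where(x >= 0, x, slope * x)
    return y * y
```

For example, an input of 2 maps to 4, while an input of -2 maps to (-1)² = 1 rather than 0 as under relu².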
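The GPTQ step in item 3 follows the usual column-by-column error-compensation scheme. Below is a toy NumPy sketch, not the PR's implementation: it uses a direct inverse of the damped Hessian in place of the PR's Cholesky formulation, a single per-tensor scale, and hypothetical function names; the real code quantizes to int6 within the training budget on 32 calibration batches:

```python
import numpy as np

def quant_int6(w, scale):
    # Round to the int6 grid [-32, 31] and dequantize back.
    return np.clip(np.round(w / scale), -32, 31) * scale

def gptq_int6(W, X, damp=0.01):
    # W: (out, in) weights; X: (n, in) calibration activations.
    n_in = W.shape[1]
    H = X.T @ X + damp * np.eye(n_in)        # damped Hessian from calibration data
    order = np.argsort(-np.diag(H))          # actorder: most salient columns first
    Hinv = np.linalg.inv(H[np.ix_(order, order)])
    Wp = W[:, order].astype(np.float64).copy()
    scale = max(np.abs(Wp).max() / 31.0, 1e-8)
    for j in range(n_in):
        q = quant_int6(Wp[:, j], scale)
        err = (Wp[:, j] - q) / Hinv[j, j]    # quantization error on column j
        Wp[:, j] = q
        if j + 1 < n_in:
            # Propagate the error to not-yet-quantized columns.
            Wp[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    inv = np.empty(n_in, dtype=int)
    inv[order] = np.arange(n_in)
    return Wp[:, inv]                        # undo the actorder permutation
```

The error-propagation step is what distinguishes GPTQ from naive rounding: each column's rounding error is absorbed by the columns quantized after it, weighted by the inverse Hessian.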
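Items 4 and 5 can be sketched together: a score-first cascade over per-order tables, plus the stated alpha schedule. This sketch uses exact Python dicts where the PR uses 4M-bucket hash tables per order, maximum-likelihood counts, and hypothetical function names:

```python
import math
from collections import defaultdict

def build_tables(tokens, max_order=5):
    # One count table per order 2..5 (the PR hashes contexts into 4M buckets;
    # exact dicts are used here for clarity).
    tables = {n: defaultdict(lambda: defaultdict(int)) for n in range(2, max_order + 1)}
    for i in range(len(tokens)):
        for n in range(2, max_order + 1):
            if i >= n - 1:
                ctx = tuple(tokens[i - n + 1:i])
                tables[n][ctx][tokens[i]] += 1
    return tables

def ngram_prob(tables, context, token, max_order=5):
    # Score-first cascade 5 -> 4 -> 3 -> 2: use the highest order whose
    # context has been seen; back off otherwise.
    for n in range(max_order, 1, -1):
        ctx = tuple(context[-(n - 1):])
        bucket = tables[n].get(ctx)
        if bucket:
            return bucket.get(token, 0) / sum(bucket.values())
    return None  # no order matched; caller falls back to the model alone

def adaptive_alpha(entropy, lo=0.05, hi=0.40, thresh=4.0):
    # alpha = 0.05 + 0.35 * sigmoid(2 * (H - 4.0)), as in the PR:
    # high model entropy (uncertainty) shifts weight toward the n-gram.
    return lo + (hi - lo) / (1.0 + math.exp(-2.0 * (entropy - thresh)))
```

At the threshold entropy H = 4.0 the sigmoid is 0.5, giving alpha = 0.225; alpha saturates at 0.40 for very uncertain predictions, matching NGRAM_ALPHA_HIGH in the reproduction command.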

Reproduction

SEED=1337 GPTQ_CALIB_BATCHES=32 \
NGRAM_EVAL_ORDER=5 NGRAM_BACKOFF=1 NGRAM_ENTROPY_ADAPTIVE=1 \
NGRAM_ALPHA_LOW=0.05 NGRAM_ALPHA_HIGH=0.40 NGRAM_ENTROPY_THRESH=4.0 \
torchrun --nproc_per_node=8 train_gpt.py

Supersedes #691.
