
11L LeakyReLU² + XSA-all + Full GPTQ + 5-gram Backoff (1.0340 BPB)#792

Open
xexyz wants to merge 2 commits into openai:main from xexyz:xexyz/ngram-backoff-1.0340

Conversation


@xexyz xexyz commented Mar 26, 2026

Summary

  • val_bpb: 1.0340 (3-seed mean)
  • 11L/512d transformer with LeakyReLU(0.5)², XSA on all layers, Hessian-based GPTQ, and 5-gram multi-order backoff with entropy-adaptive alpha
  • Artifact: 15,903,061 bytes
  • Training: ~600s on 8xH100

3-Seed Validation

Seed   Sliding BPB   N-gram BPB
1337   1.1273        1.0342
42     1.1272        1.0340
7      1.1269        1.0338
Mean   1.1271        1.0340

Key Techniques

  1. LeakyReLU(0.5)²: Replaces relu² with a leaky variant (negative slope 0.5) for better gradient flow through negative pre-activations
  2. XSA-all: Cross-sequence attention extended from last 4 layers to all 11
  3. Full GPTQ: Hessian-based int6 quantization with actorder + Cholesky error compensation, calibrated on training data (32 batches on EMA model, within training budget)
  4. 5-gram multi-order backoff: Score-first cascade 5→4→3→2-gram with separate hash tables per order (4M buckets each)
  5. Entropy-adaptive alpha: alpha = 0.05 + 0.35 * sigmoid(2*(H-4.0)) — trusts n-gram more when model is uncertain
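The squared leaky activation in item 1 can be sketched in a few lines. This is a minimal NumPy sketch, assuming the activation simply squares the LeakyReLU output (the PR does not specify whether sign is preserved after squaring; the function name is mine):

```python
import numpy as np

def leaky_relu_sq(x, slope=0.5):
    # LeakyReLU with negative slope 0.5, then elementwise square.
    # Unlike plain relu², negative inputs still carry gradient (slope 0.5).
    y = np.where(x >= 0, x, slope * x)
    return y * y
```

For example, an input of 2 maps to 4, while an input of -2 maps to (-1)² = 1 rather than 0 as under relu².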
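The GPTQ step in item 3 follows the usual column-by-column error-compensation scheme. Below is a toy NumPy sketch, not the PR's implementation: it uses a direct inverse of the damped Hessian in place of the PR's Cholesky formulation, a single per-tensor scale, and hypothetical function names; the real code quantizes to int6 within the training budget on 32 calibration batches:

```python
import numpy as np

def quant_int6(w, scale):
    # Round to the int6 grid [-32, 31] and dequantize back.
    return np.clip(np.round(w / scale), -32, 31) * scale

def gptq_int6(W, X, damp=0.01):
    # W: (out, in) weights; X: (n, in) calibration activations.
    n_in = W.shape[1]
    H = X.T @ X + damp * np.eye(n_in)        # damped Hessian from calibration data
    order = np.argsort(-np.diag(H))          # actorder: most salient columns first
    Hinv = np.linalg.inv(H[np.ix_(order, order)])
    Wp = W[:, order].astype(np.float64).copy()
    scale = max(np.abs(Wp).max() / 31.0, 1e-8)
    for j in range(n_in):
        q = quant_int6(Wp[:, j], scale)
        err = (Wp[:, j] - q) / Hinv[j, j]    # quantization error on column j
        Wp[:, j] = q
        if j + 1 < n_in:
            # Propagate the error to not-yet-quantized columns.
            Wp[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])
    inv = np.empty(n_in, dtype=int)
    inv[order] = np.arange(n_in)
    return Wp[:, inv]                        # undo the actorder permutation
```

The error-propagation step is what distinguishes GPTQ from naive rounding: each column's rounding error is absorbed by the columns quantized after it, weighted by the inverse Hessian.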
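Items 4 and 5 can be sketched together: a score-first cascade over per-order tables, plus the stated alpha schedule. This sketch uses exact Python dicts where the PR uses 4M-bucket hash tables per order, maximum-likelihood counts, and hypothetical function names:

```python
import math
from collections import defaultdict

def build_tables(tokens, max_order=5):
    # One count table per order 2..5 (the PR hashes contexts into 4M buckets;
    # exact dicts are used here for clarity).
    tables = {n: defaultdict(lambda: defaultdict(int)) for n in range(2, max_order + 1)}
    for i in range(len(tokens)):
        for n in range(2, max_order + 1):
            if i >= n - 1:
                ctx = tuple(tokens[i - n + 1:i])
                tables[n][ctx][tokens[i]] += 1
    return tables

def ngram_prob(tables, context, token, max_order=5):
    # Score-first cascade 5 -> 4 -> 3 -> 2: use the highest order whose
    # context has been seen; back off otherwise.
    for n in range(max_order, 1, -1):
        ctx = tuple(context[-(n - 1):])
        bucket = tables[n].get(ctx)
        if bucket:
            return bucket.get(token, 0) / sum(bucket.values())
    return None  # no order matched; caller falls back to the model alone

def adaptive_alpha(entropy, lo=0.05, hi=0.40, thresh=4.0):
    # alpha = 0.05 + 0.35 * sigmoid(2 * (H - 4.0)), as in the PR:
    # high model entropy (uncertainty) shifts weight toward the n-gram.
    return lo + (hi - lo) / (1.0 + math.exp(-2.0 * (entropy - thresh)))
```

At the threshold entropy H = 4.0 the sigmoid is 0.5, giving alpha = 0.225; alpha saturates at 0.40 for very uncertain predictions, matching NGRAM_ALPHA_HIGH in the reproduction command.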

Reproduction

SEED=1337 GPTQ_CALIB_BATCHES=32 \
NGRAM_EVAL_ORDER=5 NGRAM_BACKOFF=1 NGRAM_ENTROPY_ADAPTIVE=1 \
NGRAM_ALPHA_LOW=0.05 NGRAM_ALPHA_HIGH=0.40 NGRAM_ENTROPY_THRESH=4.0 \
torchrun --nproc_per_node=8 train_gpt.py

Supersedes #691.
