Record: 0.6364 BPB - Depth Recurrence + Multi-Order N-gram Backoff#808

Open
Naazimsnh02 wants to merge 2 commits into openai:main from Naazimsnh02:ngram-depth-recurrence-0.6364

Conversation

@Naazimsnh02

Summary

val_bpb: 0.6364 (seed 1337) | ~15.95 MB | 8×H100 SXM | 2 seeds

Adds multi-order n-gram backoff (orders 2-7) with entropy-adaptive alpha to the depth recurrence stack, achieving a new record.

Key contributions

  • Multi-order n-gram backoff (orders 2-7): hash-table n-gram counting at eval time. The highest-order match is tried first, cascading down one order on each miss. Zero training cost; the mechanism is purely eval-time.
  • Entropy-adaptive alpha: alpha = 0.05 + 0.55 * sigmoid(2 * (H − 4.0)), where H is the neural model's predictive entropy. The blend trusts the n-gram more when the model is uncertain and the model more when it is confident.
  • Multi-GPU n-gram prefill: each rank pre-populates its hash tables with all tokens scored by earlier ranks, fixing the table-fragmentation problem on multi-GPU setups (without this, the 8-GPU run gets 0.87 BPB instead of 0.64).
  • Depth recurrence: layers 4 and 5 are repeated, giving 13 virtual layers from 11 physical layers at zero parameter cost (carried over from the previous submission).
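The backoff-plus-blend mechanism in the first two bullets can be sketched as follows. This is a minimal sketch, not the PR's implementation: Python dicts stand in for the hash tables, and `build_ngram_tables`, `ngram_prob`, `adaptive_alpha`, and `blend` are hypothetical names. The alpha formula is taken verbatim from the description above, with H assumed to be in nats.

```python
import math
from collections import defaultdict

def build_ngram_tables(tokens, max_order=7):
    # Count n-grams of orders 2..max_order at eval time.
    # Plain dicts stand in for the PR's hash tables (sketch only).
    tables = {n: defaultdict(lambda: defaultdict(int))
              for n in range(2, max_order + 1)}
    for i, tok in enumerate(tokens):
        for n in range(2, max_order + 1):
            if i >= n - 1:
                ctx = tuple(tokens[i - n + 1:i])
                tables[n][ctx][tok] += 1
    return tables

def ngram_prob(tables, context, token, max_order=7):
    # Highest-order match first; cascade down one order on each miss.
    for n in range(max_order, 1, -1):
        ctx = tuple(context[-(n - 1):])
        counts = tables[n].get(ctx)
        if counts:
            return counts.get(token, 0) / sum(counts.values())
    return None  # no context seen at any order

def adaptive_alpha(entropy):
    # alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)): more n-gram
    # weight when the neural model's entropy H is high.
    return 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (entropy - 4.0)))

def blend(p_neural, p_ngram, entropy):
    # Mix neural and n-gram probabilities with the adaptive weight.
    a = adaptive_alpha(entropy)
    return (1.0 - a) * p_neural + a * p_ngram
```

Note how alpha is bounded in (0.05, 0.6): even a fully confident model keeps a small n-gram floor, and even a maximally uncertain one retains 40% neural weight.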

Results

Seed   BPB
1337   0.6364
42     0.6382
Mean   0.6373

Built on PR #549 stack (LeakyReLU(0.5)², BigramHash(2048), XSA4, Partial RoPE, LN Scale, VE128, EMA+SWA, Parameter Banking + Parallel Muon, int6 GPTQ-lite + lzma).

Credits

