
Record: Backoff N-gram Cache + LeakyReLU(0.9)² (val_bpb=0.6678) #806

Open

ibarrajo wants to merge 1 commit into openai:main from ibarrajo:submission/ngram-cache-0.6678

Conversation

@ibarrajo

Summary

  • val_bpb: 0.6678 (seed 1337, additional seeds pending)
  • Multi-order backoff n-gram eval cache (orders 2-7) with entropy-adaptive alpha mixing
  • Distributed cache pre-fill for multi-GPU coherence (rank 7 pre-fills 54M tokens in 68s)
  • LeakyReLU(0.9)² activation (~0.013 BPB improvement over relu²)
  • Neural base: 1.1371 BPB (sliding window), n-gram cache: 0.6678 BPB
  • Artifact: 8.6MB (well under 16MB limit)
  • 8xH100 SXM, 7189 steps in 600s, eval in 200s
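The LeakyReLU(0.9)² activation is named but not spelled out here; a minimal sketch of one plausible reading (LeakyReLU with negative slope 0.9 followed by an elementwise square, by analogy with the relu² it replaces) is:

```python
import numpy as np

def leaky_relu_sq(x: np.ndarray, neg_slope: float = 0.9) -> np.ndarray:
    """LeakyReLU with the given negative slope, then an elementwise square.

    Reduces to the relu^2 activation when neg_slope = 0; a slope of 0.9
    keeps most of the gradient signal alive on the negative side.
    """
    y = np.where(x >= 0, x, neg_slope * x)
    return y * y
```

Note that squaring makes the negative branch non-monotone; a sign-preserving variant (`y * np.abs(y)`) is equally consistent with the title alone.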

Key implementation details

  • Score-first legality: Every token scored under inference_mode() BEFORE cache update
  • Entropy-adaptive alpha: 0.05 + 0.55 * sigmoid(2*(H-4)) — no oracle/hindsight selection
  • Pre-fill: Each GPU rank pre-populates cache with all preceding tokens (pure numpy, no NCCL)
  • No pre-eval TTT — removed illegal pre-eval adaptation entirely
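The bullets above can be sketched roughly as follows. This is a hedged reconstruction, not the submitted code: the count structures, the smoothing, and which side of the mixture receives alpha are assumptions; only the alpha formula, the order range, and the score-before-update ordering come from the description.

```python
import math
from collections import defaultdict

ORDERS = range(2, 8)  # n-gram orders 2-7 (context lengths 1-6)

class NgramCache:
    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        # counts[n][context_tuple][token] for each order n
        self.counts = {n: defaultdict(lambda: defaultdict(int)) for n in ORDERS}

    def ngram_prob(self, history, token):
        # Backoff: use the highest order whose context has been seen.
        for n in sorted(ORDERS, reverse=True):
            ctx = tuple(history[-(n - 1):])
            if len(ctx) == n - 1 and ctx in self.counts[n]:
                bucket = self.counts[n][ctx]
                total = sum(bucket.values())
                # add-1 smoothing inside the matched bucket (an assumption)
                return (bucket[token] + 1) / (total + self.vocab_size)
        return 1.0 / self.vocab_size  # nothing cached yet: uniform

    def update(self, history, token):
        for n in ORDERS:
            if len(history) >= n - 1:
                self.counts[n][tuple(history[-(n - 1):])][token] += 1

def adaptive_alpha(neural_probs):
    # Entropy-adaptive mixing weight: 0.05 + 0.55 * sigmoid(2 * (H - 4)),
    # with H the entropy of the neural distribution in bits.
    H = -sum(p * math.log2(p) for p in neural_probs if p > 0)
    return 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (H - 4.0)))

def score_token(cache, history, token, neural_probs):
    # Score-first legality: the token is scored BEFORE the cache sees it.
    alpha = adaptive_alpha(neural_probs)
    p = (1 - alpha) * neural_probs[token] + alpha * cache.ngram_prob(history, token)
    cache.update(history, token)
    return -math.log2(p)  # bits contributed by this token
```

Here alpha ranges over (0.05, 0.60) and grows with the neural model's entropy H, so the cache gets more weight exactly where the network is least confident; since alpha depends only on the neural distribution, no oracle or hindsight selection is involved.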

Results

| Eval method | val_bpb |
| --- | --- |
| Non-overlapping (post-quant) | 1.1594 |
| Sliding window (stride=64) | 1.1371 |
| N-gram cache (orders 2-7) | 0.6678 |
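The gap between the first two rows comes from the evaluation scheme, not the model: non-overlapping eval scores the early tokens of each chunk with almost no context. A minimal sketch of sliding-window scoring with stride 64 (the context length and interface are illustrative, not from the diff):

```python
def sliding_windows(num_tokens: int, ctx_len: int, stride: int = 64):
    """Yield (window_start, score_from, score_to) triples.

    Feed tokens [window_start, score_to) to the model but keep the loss
    only on [score_from, score_to). Every token is scored exactly once,
    and after the first window each scored token sees at least
    ctx_len - stride tokens of preceding context.
    """
    yield 0, 0, min(ctx_len, num_tokens)  # first window scores everything
    pos = ctx_len
    while pos < num_tokens:
        end = min(pos + stride, num_tokens)
        yield pos - (ctx_len - stride), pos, end
        pos = end
```

The cost is roughly ctx_len / stride forward passes per token instead of one, which is why the cheaper non-overlapping number is also reported.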

Test plan

  • Validated on 1xH100 (0.8556 BPB with undertrained model)
  • Full run on 8xH100 SXM (0.6678 BPB)
  • 2 additional seeds for statistical significance (pending)
  • Verify reproducibility from records/ folder

🤖 Generated with Claude Code

Multi-order backoff n-gram eval cache (orders 2-7) with entropy-adaptive
alpha mixing and distributed cache pre-fill for multi-GPU coherence.
Neural base 1.1371 BPB, n-gram cache drops to 0.6678. 8xH100 SXM,
7189 steps in 600s. Single seed (1337), additional seeds pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>