submission/2026-03-25_WSD_CosineDecay_Schedule#791

Open
ShihChunHao wants to merge 1 commit into openai:main from ShihChunHao:submission/wsd-cosine-v2
Conversation

@ShihChunHao
Summary

  • Replace linear warmdown LR schedule with Warmup-Stable-Decay (WSD) cosine schedule
    • 5% warmup → 75% stable at peak LR → 20% cosine decay
    • Prevents premature LR decay, especially critical under step-limited training budgets
  • Built on SOTA base (10L, MLP3x, SmearGate, BigramHash 10240, int5/int6, SWA 0.4, zstd-22)
  • No existing submission uses cosine decay or WSD schedule
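The three-phase split above can be sketched as a step-indexed LR function. This is an illustrative sketch, not the PR's actual implementation: the function name `wsd_cosine_lr` and its parameters are hypothetical, with the 5%/75%/20% phase fractions taken from the summary.

```python
import math

def wsd_cosine_lr(step, total_steps, peak_lr,
                  warmup_frac=0.05, stable_frac=0.75):
    """Hypothetical WSD schedule: linear warmup to peak_lr, hold at
    peak for the stable phase, then cosine-decay to zero over the
    remaining steps (5% / 75% / 20% by default, per the PR summary)."""
    warmup_steps = int(warmup_frac * total_steps)
    stable_steps = int(stable_frac * total_steps)
    decay_steps = max(1, total_steps - warmup_steps - stable_steps)
    if step < warmup_steps:
        # Linear warmup from ~0 to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold at peak LR.
        return peak_lr
    # Cosine decay from peak_lr toward 0 over the final phase.
    t = (step - warmup_steps - stable_steps) / decay_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * t))
```

Because the LR stays at peak through 80% of training, shortening the run (as under a step-limited budget) never starts the decay early; a linear warmdown, by contrast, begins reducing the LR immediately after warmup.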

Preliminary Results (1 GPU, seed=42)

| Config | val_bpb | artifact_bytes |
| --- | --- | --- |
| 1 GPU, 600s, ~877 steps | 1.2824 | 15,767,236 |

8xH100 3-seed results pending.

Run Command

torchrun --standalone --nproc_per_node=8 train_gpt.py

Replace linear warmdown LR schedule with Warmup-Stable-Decay (WSD):
5% warmup, 75% stable at peak LR, 20% cosine decay.
Built on SOTA base (10L, MLP3x, SmearGate, BigramHash, int5/int6, SWA, zstd-22).
