-
Notifications
You must be signed in to change notification settings - Fork 16
Open
Description
https://github.com/microsoft/mup
mup is a method of automatically scaling learning rate (per layer), over any size transformer.
Because our training and ablation runs are so expensive, this is naturally quite useful.
Metadata
Metadata
Assignees
Labels
No labels