```
pip install novograd
```
When using NovoGrad, the learning rate scheduler plays an important role. Do not forget to set one, as in the sketch below.
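A minimal sketch of pairing NovoGrad with a cosine scheduler, assuming the package exposes a `NovoGrad` class with the usual `torch.optim` constructor signature (`params`, `lr`, `betas`, `weight_decay`); check the package's actual API before use.

```python
import torch
import torch.nn as nn
from novograd import NovoGrad  # import path is an assumption

model = nn.Linear(784, 10)
# Hyperparameters match the NovoGrad row of the benchmark table below.
optimizer = NovoGrad(model.parameters(), lr=0.01,
                     betas=(0.95, 0.98), weight_decay=0.001)
# Cosine annealing over the 3 training epochs.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3)

for epoch in range(3):
    # Dummy batch for illustration; replace with a real DataLoader.
    for x, y in [(torch.randn(32, 784), torch.randint(0, 10, (32,)))]:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # step the scheduler once per epoch
```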
All optimizers were trained for 3 epochs with the same neural network architecture.
| Optimizer | Test Acc (%) | lr | lr scheduler | beta1 | beta2 | weight decay |
|---|---|---|---|---|---|---|
| Momentum SGD | 96.92 | 0.01 | None | 0.9 | N/A | 0.001 |
| Adam | 96.72 | 0.001 | None | 0.9 | 0.999 | 0.001 |
| AdamW | 97.34 | 0.001 | None | 0.9 | 0.999 | 0.001 |
| NovoGrad | 97.55 | 0.01 | cosine | 0.95 | 0.98 | 0.001 |
Boris Ginsburg, Patrice Castonguay, Oleksii Hrinchuk, Oleksii Kuchaiev, Vitaly Lavrukhin, Ryan Leary, Jason Li, Huyen Nguyen, Jonathan M. Cohen. "Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks." arXiv:1905.11286 [cs.LG]. https://arxiv.org/pdf/1905.11286.pdf