Do the authors have any insight into whether nGPT quantizes better (or to lower precision) than a standard GPT?
The faster convergence in FP16, together with all weights and activations being normalized, seems to suggest that it would.
Did the authors try this with any of their trained models?
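
To make the intuition behind the question concrete, here is a minimal sketch in plain Python. It is not the authors' code or method: the helper names (`l2_normalize`, `quantize_int8`, `max_abs_error`), the fixed scale of 1/127, and the random test vector are all assumptions chosen for illustration. It only shows that a unit-norm vector has every entry in [-1, 1], so one fixed symmetric int8 scale never clips it, while an unnormalized vector with the same direction does clip.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def quantize_int8(vec, scale):
    """Symmetric int8 quantization: round to the nearest step, clamp to [-127, 127]."""
    return [max(-127, min(127, round(x / scale))) for x in vec]


def dequantize(q, scale):
    """Map int8 codes back to floats."""
    return [v * scale for v in q]


def max_abs_error(vec, scale):
    """Largest absolute round-trip error over the vector."""
    restored = dequantize(quantize_int8(vec, scale), scale)
    return max(abs(a - b) for a, b in zip(vec, restored))


random.seed(0)
# Illustrative data only: an unnormalized vector with a wide range.
raw = [random.gauss(0.0, 5.0) for _ in range(256)]
unit = l2_normalize(raw)

# Assumed fixed scale that covers exactly [-1, 1] with 127 steps per side.
FIXED_SCALE = 1.0 / 127.0

unit_err = max_abs_error(unit, FIXED_SCALE)  # bounded by half a quantization step
raw_err = max_abs_error(raw, FIXED_SCALE)    # dominated by clipping of large entries

print(f"unit-norm max error: {unit_err:.6f}")
print(f"raw max error:       {raw_err:.6f}")
```

This only demonstrates that the value range is bounded and known ahead of time. Whether that leads to better post-quantization quality for trained nGPT models is exactly what the question asks.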