Hi, thanks for sharing the code. I have a question about reproducing the experiment in the paper, specifically this figure.
I followed the code in this repo and set the parameters of the 1B model according to the paper:
I also set the global batch size to 512, the same as in the paper.
I am using 8 H800 GPUs with 80 GB each. The validation loss I got during training is as follows:
This looks very different from the figure in the paper. Am I doing something wrong, or is there another reason for the difference? I'd appreciate any help. Thanks a lot!
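One setting worth double-checking when matching a global batch size of 512 on 8 GPUs is that the per-GPU micro-batch size, the number of GPUs, and the gradient-accumulation steps multiply out to exactly 512. The snippet below is a minimal sketch of that arithmetic only. The function name `grad_accum_steps` and the micro-batch size of 16 are hypothetical and not taken from this repo's config.

```python
def grad_accum_steps(global_batch: int, micro_batch: int, num_gpus: int) -> int:
    """Return the gradient-accumulation steps needed so that
    micro_batch * num_gpus * steps == global_batch (hypothetical helper)."""
    per_step = micro_batch * num_gpus  # sequences processed per forward/backward pass across all GPUs
    if global_batch % per_step != 0:
        raise ValueError(
            f"global batch {global_batch} is not divisible by "
            f"micro_batch * num_gpus = {per_step}"
        )
    return global_batch // per_step


# Hypothetical example: 16 sequences per GPU on 8 GPUs.
steps = grad_accum_steps(global_batch=512, micro_batch=16, num_gpus=8)
print(steps)  # 4
```

If the effective batch differs from 512 (for example, because accumulation is not applied or the micro-batch is counted per node rather than per GPU), the learning-rate schedule and loss curve can differ noticeably from the paper's.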