Thanks for your fantastic work! When I tried to reproduce the training results by following the quick-start, I found that the reward did not increase with the 7B model. The 14B model looked good, but reward collapse still occurred.