Description
I am trying to reproduce the Mistral-7B-SPPO Iter1 model. However, after my first iteration, the results of the model I trained on a benchmark dataset diverged significantly from those of the published Mistral-7B-SPPO Iter1 model.
To help me diagnose the issue and improve my training, could you kindly provide the training dataset and the loss log so that I can compare my run against them?
I was using this prompt dataset, but it doesn't contain the columns chosen, rejected, chosen_probs, chosen_probs_win, and chosen_probs_lose. Although running the generate.sh script can produce these columns, it would be incredibly helpful if the actual training dataset could be released. That would make it much easier for me to debug my training process and identify where things went wrong.
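A minimal sketch of how the missing columns could be checked before training. It only looks at a list of column names, so it does not depend on how the dataset is loaded. The names `REQUIRED_COLUMNS` and `missing_columns` are hypothetical helpers introduced here for illustration, not part of the SPPO repository, and the `prompt` column in the example is an assumed placeholder.

```python
# Columns named in this issue as required for training (taken from the text above).
REQUIRED_COLUMNS = (
    "chosen",
    "rejected",
    "chosen_probs",
    "chosen_probs_win",
    "chosen_probs_lose",
)


def missing_columns(column_names):
    """Return the required columns that are absent from column_names, in order."""
    present = set(column_names)
    return [name for name in REQUIRED_COLUMNS if name not in present]


if __name__ == "__main__":
    # Hypothetical column list for a prompt-only dataset.
    prompt_only = ["prompt"]
    print("Missing columns:", missing_columns(prompt_only))
```

With a dataset object that exposes its column names as a list, the same function could be called on that list to confirm whether the output of generate.sh contains everything the training step expects.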