Release training dataset and loss log to help reproduce results

I am trying to reproduce the Mistral-7B-SPPO Iter1 model. However, after the first iteration, my trained model's results on a benchmark dataset diverge significantly from those of the published Mistral-7B-SPPO Iter1 model.
To help diagnose the issue and improve my training, could you kindly provide the training dataset and the loss log so that I can compare my run against them?

I was using this prompt [dataset](https://huggingface.co/datasets/UCLA-AGI/data-mistral-7b-instruct-sppo-iter1?row=3), but it doesn't contain the columns `chosen`, `rejected`, `chosen_probs`, `chosen_probs_win`, and `chosen_probs_lose`. Running the generate.sh script can produce these columns, but it would be incredibly helpful if the actual training dataset could be released. That would make it much easier for me to debug my training process and identify where things went wrong.
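
For anyone who wants to confirm the same gap, here is a minimal sketch of the column check. The helper names, the `train` split, and the `prompt` column in the demo call are assumptions for illustration, not something taken from the SPPO repository. The Hub lookup lives in its own function so the pure check runs without network access.

```python
# Columns the training step expects, as listed in this issue.
REQUIRED_COLUMNS = [
    "chosen",
    "rejected",
    "chosen_probs",
    "chosen_probs_win",
    "chosen_probs_lose",
]


def missing_columns(column_names, required=REQUIRED_COLUMNS):
    """Return the required columns that are absent, in their listed order."""
    present = set(column_names)
    return [name for name in required if name not in present]


def check_hub_dataset(
    repo_id="UCLA-AGI/data-mistral-7b-instruct-sppo-iter1",
    split="train",  # assumption: the split name is not confirmed here
):
    """Download the dataset from the Hub and report its missing columns."""
    # Imported lazily so missing_columns() works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id, split=split)
    return missing_columns(dataset.column_names)


if __name__ == "__main__":
    # Hypothetical column list, used only to show the shape of the result.
    print(missing_columns(["prompt", "chosen", "rejected"]))
```

Calling `check_hub_dataset()` requires the `datasets` package and network access. An empty list would mean every expected column is present.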

Release training dataset and loss log to help reproduce results #26

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Release training dataset and loss log to help reproduce results #26

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions