Hello,
Thank you for your great work on this project and for open-sourcing the model and the benchmark dataset.
I have a question regarding the training data. According to the paper, the model was trained in multiple stages, including reinforcement learning (RL) on specialist tasks and a final all-task RL phase.
Do you have any plans to also open-source the data used for these RL training stages? Any information on this would be greatly appreciated.
Thank you!