I am trying to reproduce the Mimo-Embodied 7B results on several of the tasks listed in the Mimo-Embodied technical report. Dozens of the results there are marked with *, meaning they were obtained with your own evaluation framework. I observe significant performance gaps between the numbers from my own evaluation scripts and the reported ones. I checked the appendix for the prompts, but the examples there do not show the complete prompt, such as the system prompt and the full answer format. Would it be possible to release your evaluation code, or to share more details such as the prompts, hyper-parameter settings, and data preprocessing methods?