Performance gap in reproduction: request for evaluation code or details #8

I am trying to reproduce the Mimo-Embodied 7B results on a number of the tasks reported in the Mimo-Embodied technical report. Dozens of the reported results are marked with *, meaning they were obtained with your evaluation framework. I observed significant performance gaps between the results from my own evaluation scripts and the reported numbers. I checked the appendix for the prompts, but the examples there do not show the complete prompt, such as the system prompt and the full answer format. Would it be possible to share your evaluation code, or more details such as the prompts, hyperparameter settings, and data preprocessing methods?
