
Performance gap in reproduction; request for evaluation code or details #8

@Yuanjiang-Cao

Description

I am trying to reproduce Mimo-Embodied 7B on several of the tasks mentioned in the Mimo-Embodied technical report. Dozens of the reported results are marked with *, indicating they were obtained with your evaluation framework, and I observe significant performance gaps between my own evaluation scripts and the reported numbers. I checked the appendix for the prompts, but the examples do not show the complete prompt (e.g., the system prompt) or the full answer format. Would it be possible to release your evaluation code, or to provide more details such as the prompts, hyper-parameter settings, and data preprocessing methods?
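To make the request concrete, below is a sketch of the kind of evaluation settings I am currently guessing at. Every name and value here is hypothetical (my own placeholder, not taken from the report); it just illustrates which pieces would let others match the reported scores.

```python
# Hypothetical sketch of the evaluation details in question; all names
# and values below are placeholders, NOT taken from the Mimo-Embodied report.
eval_config = {
    "system_prompt": "You are a helpful assistant.",        # exact text unknown
    "answer_format": "Answer with the option letter only.", # exact format unknown
    "temperature": 0.0,          # greedy decoding? not stated in the report
    "max_new_tokens": 512,       # not stated
    "image_preprocessing": {     # resize/crop strategy not stated
        "resize": 448,
    },
}

def extract_answer(model_output: str) -> str:
    """Hypothetical answer parser: return the first option letter found.

    The actual matching rule can move benchmark scores by several points,
    which is part of why the evaluation code matters.
    """
    for ch in model_output.strip().upper():
        if ch in "ABCD":
            return ch
    return ""
```

Even small differences in any of these (system prompt wording, decoding temperature, or the answer-extraction rule) could plausibly explain the gap I am seeing.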
