
What are the exact settings/arguments for reproducing the results of table 5 in the paper? #45

@lakshya-skyfall

Description

I recently ran the evaluation on some of the models released since the paper came out, using dom_reward with 4o as the reward text model. I wasn't able to get performance anywhere near that of the older 4o in Table 5. How were the Table 5 numbers obtained? Could you share the arguments to evaluate.py and the settings needed to reproduce them?
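
For context, a rough sketch of the kind of run I'm describing; the flag names below are placeholders, not evaluate.py's actual arguments (the exact arguments are what I'm asking for):

```python
# Hypothetical sketch only: --reward_type, --reward_text_model, and --model are
# placeholder flag names, not confirmed arguments of evaluate.py.
import subprocess

cmd = [
    "python", "evaluate.py",
    "--reward_type", "dom_reward",     # DOM-based reward, as described above
    "--reward_text_model", "gpt-4o",   # reward text model set to 4o
    "--model", "gpt-4o",               # model being evaluated
]
subprocess.run(cmd, check=True)
```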
