Simplify CLI parameters for run-infer / run-eval. Avoid double defaults #344

@simonrosenberg

Description

There are default values in the evaluation repo for DATASET / DATASET_SPLIT / GAIA_LEVEL, etc., and those same values have defaults again in the benchmarks repo.
This duplication causes errors, since updating a parameter requires changing it in two very different places.

  • the evaluation repository should not define any of those values; it should rely on the default values of the benchmarks repo for run-infer / run-eval
  • all such values should live in a {benchmarks}/config.py
  • all such values should be saved in the artifacts for traceability

Also, SWTBench should have

  • DATASET_INFER = eth-sri/SWT-bench_Verified_bm25_27k_zsp
  • DATASET_EVAL = princeton-nlp/SWE-bench_Verified
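A minimal sketch of what a centralized `{benchmarks}/config.py` could look like, combining the points above: one frozen dataclass as the single source of defaults, plus a method that writes the resolved values into the run's artifacts. The class name, `dataset_split` default, and `save_to_artifacts` helper are illustrative assumptions, not the actual benchmarks API; only the two dataset identifiers come from this issue.

```python
from dataclasses import dataclass, asdict
import json
from pathlib import Path


@dataclass(frozen=True)
class SWTBenchConfig:
    # Single source of truth for defaults; the evaluation repo would
    # pass overrides only when it actually needs a non-default value.
    dataset_infer: str = "eth-sri/SWT-bench_Verified_bm25_27k_zsp"
    dataset_eval: str = "princeton-nlp/SWE-bench_Verified"
    dataset_split: str = "test"  # assumed default, for illustration only

    def save_to_artifacts(self, artifacts_dir: str) -> Path:
        """Persist the resolved config alongside run outputs for traceability."""
        path = Path(artifacts_dir) / "config.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(asdict(self), indent=2))
        return path
```

With this shape, run-infer / run-eval would instantiate the config (applying any CLI overrides) and call `save_to_artifacts` once per run, so the values used are always recorded with the results.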
