Regroup all multiswebench hyperparameters in constants.py #369

simonrosenberg · 2026-01-27T18:10:31Z

Summary

This PR creates a single source of truth for all Multi-SWE-Bench constant values and hyperparameters by introducing benchmarks/multiswebench/constants.py.

Fixes #366

Changes

1. Created `benchmarks/multiswebench/constants.py`

A new module containing all constant values organized into logical categories:

- Dataset Configuration: DEFAULT_DATASET, DEFAULT_SPLIT, DEFAULT_LANGUAGE, DEFAULT_MODEL_NAME, DEFAULT_VERSION
- Docker/Image Configuration: DEFAULT_DOCKER_IMAGE_PREFIX, DEFAULT_BUILD_TARGET, and environment variable names
- Runtime Configuration: DEFAULT_RUNTIME_API_URL, DEFAULT_STARTUP_TIMEOUT, boolean defaults, and environment variable names
- Evaluation Configuration: DEFAULT_EVAL_MODE, DEFAULT_MAX_WORKERS, DEFAULT_LOG_LEVEL, FIX_PATCH_RUN_CMD, etc.
- Path Configuration: DATASET_CACHE_DIR, DEFAULT_WORKING_DIR
- Workspace Configuration: DEFAULT_ENV_SETUP_COMMANDS
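
For reviewers who want a picture of the layout, here is a minimal sketch of how such a module could be organized. The constant names DEFAULT_DATASET, DEFAULT_SPLIT, DEFAULT_LANGUAGE, DEFAULT_STARTUP_TIMEOUT, and DEFAULT_MAX_WORKERS come from the list above. Every value, the `STARTUP_TIMEOUT_ENV_VAR` name, and the `get_startup_timeout` helper are placeholders invented for illustration, not the contents of the actual file.

```python
"""Illustrative layout only: all values below are placeholders."""
import os

# --- Dataset configuration ---
DEFAULT_DATASET = "example-org/example-dataset"  # placeholder value
DEFAULT_SPLIT = "train"  # placeholder value
DEFAULT_LANGUAGE = "java"  # placeholder value

# --- Runtime configuration ---
DEFAULT_STARTUP_TIMEOUT = 600  # placeholder value, in seconds
STARTUP_TIMEOUT_ENV_VAR = "EXAMPLE_STARTUP_TIMEOUT"  # hypothetical variable name

# --- Evaluation configuration ---
DEFAULT_MAX_WORKERS = 4  # placeholder value


def get_startup_timeout(environ=None):
    """Return the environment override if set, otherwise the default.

    Hypothetical helper showing how a consumer module might combine an
    environment variable name and a default from the same constants module.
    """
    env = os.environ if environ is None else environ
    raw = env.get(STARTUP_TIMEOUT_ENV_VAR)
    return int(raw) if raw else DEFAULT_STARTUP_TIMEOUT
```

Keeping the environment variable name next to its default means a consumer such as run_infer.py would need only one import to resolve a setting.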

2. Updated all multiswebench modules to import from constants.py

- build_images.py
- download_dataset.py
- eval_infer.py
- run_infer.py
- scripts/data/data_change.py
- scripts/eval/update_multi_swe_bench_config.py

3. Added comprehensive tests

Created tests/test_multiswebench_constants.py with 28 tests covering:

- All constant values and their expected types
- Verification that all modules properly import from constants.py
- All constants are exportable from the module
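
A rough sketch of what a type and exportability check of this kind might look like. It uses a stand-in module object so the snippet runs on its own; the values, the `EXPECTED_TYPES` table, and the `check_constants` helper are assumptions for illustration and are not taken from tests/test_multiswebench_constants.py.

```python
import types

# Stand-in for benchmarks/multiswebench/constants.py (placeholder values).
constants = types.ModuleType("constants")
constants.DEFAULT_SPLIT = "train"
constants.DEFAULT_MAX_WORKERS = 4
constants.DEFAULT_ENV_SETUP_COMMANDS = ["export EXAMPLE=1"]

# Hypothetical table mapping each constant name to its expected type.
EXPECTED_TYPES = {
    "DEFAULT_SPLIT": str,
    "DEFAULT_MAX_WORKERS": int,
    "DEFAULT_ENV_SETUP_COMMANDS": list,
}


def check_constants(module, expected_types):
    """Return a list of problems: missing names or values of the wrong type."""
    problems = []
    for name, expected in expected_types.items():
        if not hasattr(module, name):
            problems.append(f"{name}: missing")
        elif not isinstance(getattr(module, name), expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems


def test_all_constants_importable():
    # pytest-style test: passes when every constant exists with the right type.
    assert check_constants(constants, EXPECTED_TYPES) == []
```

Driving the check from a table keeps the test short as constants are added, at the cost of less specific test names than one test per constant.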

Testing

All 28 new tests pass:

tests/test_multiswebench_constants.py::TestDatasetConstants::test_default_dataset PASSED
tests/test_multiswebench_constants.py::TestDatasetConstants::test_default_split PASSED
tests/test_multiswebench_constants.py::TestDatasetConstants::test_default_language PASSED
...
tests/test_multiswebench_constants.py::TestAllConstantsExported::test_all_constants_importable PASSED
======================== 28 passed ========================

Existing tests continue to pass:

tests/test_metrics.py::test_benchmark_metrics_collection[multiswebench-MultiSWEBenchEvaluation] PASSED

Benefits

- Single source of truth: All hyperparameters are now defined in one place
- Easy to audit: Reviewers can quickly verify all constant values are correctly set
- Maintainability: Changes to default values only need to be made in one location
- Documentation: Constants are organized with clear comments explaining their purpose


This commit creates a single source of truth for all Multi-SWE-Bench constant values and hyperparameters by:

1. Creating benchmarks/multiswebench/constants.py with all constants:
   - Dataset configuration (DEFAULT_DATASET, DEFAULT_SPLIT, DEFAULT_LANGUAGE, etc.)
   - Docker/Image configuration (DEFAULT_DOCKER_IMAGE_PREFIX, DEFAULT_BUILD_TARGET, etc.)
   - Runtime configuration (DEFAULT_RUNTIME_API_URL, DEFAULT_STARTUP_TIMEOUT, etc.)
   - Evaluation configuration (DEFAULT_EVAL_MODE, DEFAULT_MAX_WORKERS, etc.)
   - Path configuration (DATASET_CACHE_DIR, DEFAULT_WORKING_DIR, etc.)
   - Environment variable names for all configurable values

2. Updating all multiswebench modules to import from constants.py:
   - build_images.py
   - download_dataset.py
   - eval_infer.py
   - run_infer.py
   - scripts/data/data_change.py
   - scripts/eval/update_multi_swe_bench_config.py

3. Adding comprehensive tests in tests/test_multiswebench_constants.py

Fixes #366

Co-authored-by: openhands <openhands@all-hands.dev>

…taset.py

- Removed tests/test_multiswebench_constants.py
- Reverted benchmarks/multiswebench/scripts/data/data_change.py to original
- Reverted benchmarks/multiswebench/download_dataset.py to original
- Removed DEFAULT_VERSION and DATASET_CACHE_DIR from constants.py

Co-authored-by: openhands <openhands@all-hands.dev>

Replace individual constants (DEFAULT_EVAL_MODE, DEFAULT_FORCE_BUILD, etc.) with a single DEFAULT_EVAL_HARNESS_CONFIG dictionary that serves as a template for the Multi-SWE-Bench evaluation harness configuration.

This is cleaner because:

- All config values are used together in one place
- The complete config structure is visible at a glance
- Single import instead of 10 individual imports
- Easier to maintain

Co-authored-by: openhands <openhands@all-hands.dev>
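
To illustrate the template-dictionary approach, here is a minimal sketch of how a caller might derive a per-run config from DEFAULT_EVAL_HARNESS_CONFIG. Only the dictionary name comes from the commit message; the keys, values, and the `build_eval_config` helper are hypothetical and may not match the real harness configuration.

```python
import copy

# Template dictionary; keys and values here are placeholders for illustration.
DEFAULT_EVAL_HARNESS_CONFIG = {
    "mode": "evaluation",
    "force_build": False,
    "max_workers": 4,
    "log_level": "INFO",
    "specifics": [],
}


def build_eval_config(**overrides):
    """Return a fresh config: a deep copy of the template with overrides applied.

    Hypothetical helper. Unknown keys are rejected so a typo does not
    silently add a setting the harness would ignore.
    """
    config = copy.deepcopy(DEFAULT_EVAL_HARNESS_CONFIG)
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    config.update(overrides)
    return config


run_config = build_eval_config(max_workers=8)
```

The deep copy matters because the template contains a mutable value (the list); a shallow copy would let one run's edits leak into the shared default.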

openhands-ai bot mentioned this pull request Jan 27, 2026

Regroup all multiswebench hyper parameters in a single source of truth benchmarks/multiswebench/constants.py #366

Open

openhands-agent added 2 commits January 28, 2026 10:34
