Conversation


@simonrosenberg simonrosenberg commented Jan 27, 2026

Summary

This PR creates a new constants.py module in benchmarks/swebench that serves as the single source of truth for all constant values used in the SWE-Bench evaluation workflow.

Fixes #348

Changes

New File: benchmarks/swebench/constants.py

Created a centralized constants module containing:

  • Dataset Configuration: DEFAULT_DATASET
  • Docker Image Configuration: DOCKER_IMAGE_PREFIX, DOCKER_IMAGE_TAG, WRAPPED_REPOS
  • Build Configuration: BUILD_TARGET_SOURCE_MINIMAL, BUILD_TARGET_BINARY, DEFAULT_BUILD_TARGET
  • Runtime Configuration: DEFAULT_RUNTIME_API_URL, DEFAULT_REMOTE_RUNTIME_STARTUP_TIMEOUT
  • Evaluation Configuration: DEFAULT_EVAL_WORKERS
  • Model Configuration: DEFAULT_MODEL_NAME, DEFAULT_CLI_MODEL_NAME
  • Git Configuration: GIT_USER_EMAIL, GIT_USER_NAME, GIT_COMMIT_MESSAGE
  • Patch Processing: SETUP_FILES_TO_REMOVE
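Pieced together from the groups listed above and the values quoted later in this PR, the new module plausibly looks like the following sketch. The `WRAPPED_REPOS` contents and the runtime API URL are placeholders, since their actual values are not spelled out in this conversation:

```python
# Sketch of benchmarks/swebench/constants.py; values are taken from this
# PR's description, except where marked as placeholders.
from typing import Final

# Dataset configuration
DEFAULT_DATASET: Final[str] = "princeton-nlp/SWE-bench_Verified"

# Docker image configuration
DOCKER_IMAGE_PREFIX: Final[str] = "docker.io/swebench/"
DOCKER_IMAGE_TAG: Final[str] = "latest"
WRAPPED_REPOS: Final[frozenset[str]] = frozenset()  # actual repo names not listed in the PR

# Runtime configuration
DEFAULT_RUNTIME_API_URL: Final[str] = "https://runtime.example.invalid"  # placeholder
DEFAULT_REMOTE_RUNTIME_STARTUP_TIMEOUT: Final[int] = 600

# Evaluation configuration
DEFAULT_EVAL_WORKERS: Final[int] = 12

# Model configuration
DEFAULT_MODEL_NAME: Final[str] = "OpenHands"      # original function default
DEFAULT_CLI_MODEL_NAME: Final[str] = "openhands"  # original CLI argparse default
```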

Updated Files

  • run_infer.py: Updated to import and use constants from constants.py
  • eval_infer.py: Updated to import and use constants from constants.py
  • build_images.py: Updated to import and use constants from constants.py

Behavior Preservation

All original default values are preserved exactly:

  • DEFAULT_MODEL_NAME = "OpenHands" (matches original function default)
  • DEFAULT_CLI_MODEL_NAME = "openhands" (matches original CLI argparse default)
  • DEFAULT_DATASET = "princeton-nlp/SWE-bench_Verified"
  • DOCKER_IMAGE_PREFIX = "docker.io/swebench/"
  • DOCKER_IMAGE_TAG = "latest"
  • DEFAULT_REMOTE_RUNTIME_STARTUP_TIMEOUT = 600 (int type for proper semantics)
  • DEFAULT_EVAL_WORKERS = 12 (int type, converted to str at argparse usage)
  • All git configuration and patch processing values unchanged
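The int-vs-string detail for `DEFAULT_EVAL_WORKERS` can be illustrated with standard argparse behavior: once the argument declares `type=int` (as this PR adds for `--workers`), an int default is accepted directly and CLI strings are converted automatically.

```python
import argparse

DEFAULT_EVAL_WORKERS = 12  # int in constants.py

parser = argparse.ArgumentParser()
# With type=int, argparse converts CLI strings to int and the int default
# passes through unchanged, so no str() conversion is needed at the call site.
parser.add_argument("--workers", type=int, default=DEFAULT_EVAL_WORKERS)

default_args = parser.parse_args([])               # no --workers given
explicit_args = parser.parse_args(["--workers", "4"])  # "4" converted to int
```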

Code Quality Improvements

  • Type Safety: All constants use typing.Final annotations
  • Immutability: WRAPPED_REPOS uses frozenset, SETUP_FILES_TO_REMOVE uses tuple
  • Proper Types: DEFAULT_EVAL_WORKERS is now int (with type=int in argparse)
  • Clean Headers: Simplified section comments for better readability

Benefits

  • Single Source of Truth: All hyperparameters are now defined in one place, making it easy to review and modify them
  • Improved Maintainability: Changes to constant values only need to be made in one location
  • Better Discoverability: Developers can easily find all configurable parameters in one file
  • Type Safety: Final annotations prevent accidental reassignment
  • Immutable Collections: Using frozenset and tuple prevents accidental mutation

…tants.py

This commit creates a new constants.py module in benchmarks/swebench that
serves as the single source of truth for all constant values used in the
SWE-Bench evaluation workflow.

Changes:
- Create benchmarks/swebench/constants.py with all constant values:
  - Dataset configuration (DEFAULT_DATASET, DEFAULT_SPLIT)
  - Docker image configuration (DOCKER_IMAGE_PREFIX, DOCKER_IMAGE_TAG)
  - Build configuration (BUILD_TARGET_*, DEFAULT_BUILD_TARGET)
  - Runtime configuration (DEFAULT_RUNTIME_API_URL, DEFAULT_REMOTE_RUNTIME_STARTUP_TIMEOUT)
  - Evaluation configuration (DEFAULT_MAX_ITERATIONS, DEFAULT_NUM_WORKERS, etc.)
  - Model configuration (DEFAULT_MODEL_NAME)
  - Git configuration (GIT_USER_EMAIL, GIT_USER_NAME, GIT_COMMIT_MESSAGE)
  - Patch processing (SETUP_FILES_TO_REMOVE)

- Update run_infer.py, eval_infer.py, and build_images.py to import and use
  constants from the constants module

Fixes #348

Co-authored-by: openhands <openhands@all-hands.dev>
@simonrosenberg simonrosenberg force-pushed the openhands/swebench-constants-348 branch from b0ab705 to 545be81 on January 27, 2026 19:01
@simonrosenberg simonrosenberg changed the title Regroup all swebench hyperparameters in a single source of truth constants.py refactor(swebench): regroup all hyperparameters in constants.py Jan 28, 2026
- Fix DEFAULT_MODEL_NAME case: 'openhands' -> 'OpenHands' to match original function default
- Move WRAPPED_REPOS from build_images.py to constants.py
- Remove unused constants: DEFAULT_SPLIT, DEFAULT_MAX_ITERATIONS, DEFAULT_NUM_WORKERS,
  DEFAULT_MAX_ATTEMPTS, DEFAULT_MAX_RETRIES, DEFAULT_N_LIMIT, DEFAULT_NOTE, DEFAULT_OUTPUT_DIR
- Fix type: DEFAULT_REMOTE_RUNTIME_STARTUP_TIMEOUT is now int (600) instead of string

Co-authored-by: openhands <openhands@all-hands.dev>
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 28, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Jan 28, 2026
@simonrosenberg

@OpenHands
/codereview-roasted
Make sure this PR doesn't alter any constant or behavior
Suggest next step refactoring for cleaner code

openhands-ai bot commented Jan 28, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

openhands-ai bot commented Jan 28, 2026

Code Review Summary for PR #373

I performed a critical code review of the PR "refactor(swebench): regroup all hyperparameters in constants.py" as requested with the /codereview-roasted command.

🔴 Verdict: Needs Rework

Critical Finding: Behavior-Altering Change Detected

The PR claims "Behavior Preservation" but actually changes CLI behavior:

| Location | Original Value | PR Value |
| --- | --- | --- |
| eval_infer.py argparse --model-name default | "openhands" (lowercase) | "OpenHands" (mixed case) |

This silently changes behavior for users running swebench-eval output.jsonl without specifying --model-name.

Additional Issues Identified:

  1. Mutable constants - WRAPPED_REPOS (set) and SETUP_FILES_TO_REMOVE (list) should be immutable (frozenset and tuple)
  2. Type inconsistency - DEFAULT_EVAL_WORKERS = "12" is a string but semantically an integer
  3. Verbose formatting - Section headers with ===== are excessive for 54 lines

Suggested Next Steps for Cleaner Code:

  1. Fix the breaking change - Either add DEFAULT_CLI_MODEL_NAME = "openhands" or change DEFAULT_MODEL_NAME to lowercase
  2. Use immutable types - frozenset for sets, tuple for lists
  3. Consider typing.Final for explicit constant declaration
  4. Group related constants - Consider NamedTuple or @dataclass(frozen=True) for related groups like git config

No code changes were made - this was a review-only request. The feedback has been provided above for the PR author to address.


…tion

- Fix breaking change: add DEFAULT_CLI_MODEL_NAME to preserve original CLI default ('openhands')
- Use typing.Final for all constants to indicate immutability
- Use frozenset for WRAPPED_REPOS (immutable)
- Use tuple for SETUP_FILES_TO_REMOVE (immutable)
- Change DEFAULT_EVAL_WORKERS to int type with proper conversion at usage sites
- Simplify section headers for cleaner code
- Add type=int to --workers argparse argument for proper type handling

Co-authored-by: openhands <openhands@all-hands.dev>
openhands-ai bot commented Jan 28, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #373 at branch `openhands/swebench-constants-348`

Feel free to include any additional details that might help me get this PR into a better state.


@simonrosenberg

@OpenHands fix the precommit errors

openhands-ai bot commented Jan 28, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

The pyright type checker was failing because DEFAULT_BUILD_TARGET was
typed as Final[str] but build_image() expects a TargetType which is
Literal['binary', 'binary-minimal', 'source', 'source-minimal'].

This fix adds a local TargetType alias and properly types the build
target constants to match the expected type signature.

openhands-ai bot commented Jan 28, 2026

I've fixed the pre-commit errors in PR #373. The issue was a pyright type checking failure.

Problem: The DEFAULT_BUILD_TARGET constant in constants.py was typed as Final[str], but the build_image() function in build_utils.py expects a TargetType parameter which is Literal['binary', 'binary-minimal', 'source', 'source-minimal']. This caused pyright to fail with:

Argument of type "str" cannot be assigned to parameter "target" of type "TargetType"

Fix: Updated constants.py to:

  1. Import Literal from typing
  2. Define a local TargetType alias matching the expected literal type
  3. Type the build target constants (BUILD_TARGET_SOURCE_MINIMAL, BUILD_TARGET_BINARY, DEFAULT_BUILD_TARGET) as Final[TargetType] instead of Final[str]
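The fix described in the three steps above can be sketched as follows. The four literal values come from the pyright error quoted earlier; which target serves as the default is an assumption here:

```python
from typing import Final, Literal

# Local alias matching the TargetType expected by build_utils.build_image()
TargetType = Literal["binary", "binary-minimal", "source", "source-minimal"]

# Typing these as Final[TargetType] (not Final[str]) lets pyright verify
# they are valid arguments for build_image()'s target parameter.
BUILD_TARGET_SOURCE_MINIMAL: Final[TargetType] = "source-minimal"
BUILD_TARGET_BINARY: Final[TargetType] = "binary"
DEFAULT_BUILD_TARGET: Final[TargetType] = BUILD_TARGET_SOURCE_MINIMAL  # assumed default
```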

Changes pushed: The fix has been committed and pushed to the openhands/swebench-constants-348 branch. All pre-commit checks now pass:

  • ✅ Ruff format
  • ✅ Ruff lint
  • ✅ PEP8 style check (pycodestyle)
  • ✅ Type check with Pyright (strict)


@simonrosenberg

@OpenHands
/codereview-roasted

The goal of this PR is to centralize all hyperparameters and constant values that are necessary to run a benchmark in a single, clear constants.py file.

Problem: get_parser is used across benchmarks. All its arguments are shared across benchmarks except dataset, which is benchmark-specific.
Question: how to structure the code so that

  1. the value "princeton-nlp/SWE-bench_Verified" is only written in a single swebench/constants.py (and same for other benchmark dataset values)
  2. Each benchmark should have its own default dataset value so that when running it from CLI there is no need to input the correct dataset value

Perhaps we could not have any defaults on the utils/args_parser.py but then in each specific benchmark run-infer, we input the correct dataset?

openhands-ai bot commented Jan 28, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

openhands-ai bot commented Jan 28, 2026

Code Review for PR #373: "refactor(swebench): regroup all hyperparameters in constants.py"

🟡 Taste Rating: Acceptable - Works but has an architectural gap


[CRITICAL ISSUES] (Must address to achieve stated goal)

[benchmarks/utils/args_parser.py, Line 26-27] Data Duplication: The PR claims to centralize constants, but "princeton-nlp/SWE-bench_Verified" is still hardcoded in the shared args_parser.py:

default="princeton-nlp/SWE-bench_Verified",

This violates the stated goal of having dataset values "only written in a single constants.py file."


[IMPROVEMENT OPPORTUNITIES] (Recommended architecture)

Your intuition is correct. Here's the recommended approach:

1. Remove the default from benchmarks/utils/args_parser.py:

parser.add_argument(
    "--dataset",
    type=str,
    default=None,  # No default - each benchmark sets its own
    help="Dataset name (required unless benchmark provides default)",
)

2. Each benchmark sets its own default in run_infer.py:

# In benchmarks/swebench/run_infer.py
from benchmarks.swebench import constants
parser = get_parser()
parser.set_defaults(dataset=constants.DEFAULT_DATASET)
# In benchmarks/swebenchmultimodal/run_infer.py
from benchmarks.swebenchmultimodal import constants  # create this
parser = get_parser()
parser.set_defaults(dataset=constants.DEFAULT_DATASET)

3. Create constants.py for each benchmark that needs one:

  • benchmarks/swebench/constants.py ✅ (already done)
  • benchmarks/swebenchmultimodal/constants.py: DEFAULT_DATASET = "princeton-nlp/SWE-bench_Multimodal"
  • benchmarks/commit0/constants.py: DEFAULT_DATASET = "wentingzhao/commit0_combined"
  • etc.

This pattern is already partially used in swebenchmultimodal/run_infer.py and commit0/run_infer.py with parser.set_defaults(), but they hardcode the value instead of importing from constants.
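The reason this pattern works is standard argparse behavior, independent of this repo: parser.set_defaults() overrides the default of an already-declared argument, while an explicit CLI value still wins. A minimal self-contained demonstration:

```python
import argparse

# Shared parser (analogue of benchmarks/utils/args_parser.py): no dataset default
parser = argparse.ArgumentParser()
parser.add_argument("--dataset", type=str, default=None)

# Benchmark-specific override, as each benchmark's run_infer.py would do
parser.set_defaults(dataset="princeton-nlp/SWE-bench_Verified")

default_args = parser.parse_args([])  # no --dataset on the command line
explicit_args = parser.parse_args(["--dataset", "other/dataset"])
```

Here default_args.dataset picks up the benchmark-specific default, and explicit_args.dataset keeps the user-supplied value.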


[STYLE NOTES]

[benchmarks/swebench/constants.py] Good use of Final type hints and frozenset/tuple for immutability. The TargetType literal definition is clean.


VERDICT:

Worth merging as-is for swebench-specific improvements, but...

Incomplete for stated goal - The PR doesn't fully achieve "dataset value only written once" because args_parser.py still has the hardcoded default.


KEY INSIGHT:

The current PR is a good first step for swebench, but to truly centralize constants across all benchmarks, you need to:

  1. Remove the default from the shared parser
  2. Have each benchmark's run_infer.py call parser.set_defaults(dataset=<benchmark>.constants.DEFAULT_DATASET)

This matches your suggested approach: "Perhaps we could not have any defaults on the utils/args_parser.py but then in each specific benchmark run-infer, we input the correct dataset?"

Recommended Next Steps:

  1. Merge this PR as-is (swebench constants are properly centralized)
  2. Create a follow-up PR that:
    • Removes the default from args_parser.py
    • Updates each benchmark's run_infer.py to set its own default
    • Creates constants.py files for other benchmarks as needed


- Remove hardcoded dataset default from args_parser.py (now None)
- Each benchmark sets its own dataset default via parser.set_defaults()
- Add shared constants to utils/constants.py:
  - DEFAULT_WORKSPACE='remote' (behavior change from 'docker')
  - DEFAULT_SPLIT, DEFAULT_MAX_ITERATIONS, DEFAULT_NUM_EVAL_WORKERS
  - DEFAULT_OUTPUT_DIR, DEFAULT_MAX_ATTEMPTS, DEFAULT_MAX_RETRIES
  - DEFAULT_NOTE, DEFAULT_N_LIMIT, DEFAULT_CRITIC
- Update args_parser.py to use constants for all defaults
- Update critics.py to use DEFAULT_CRITIC constant
- Update swebench/run_infer.py to set its own dataset default

Co-authored-by: openhands <openhands@all-hands.dev>
@simonrosenberg simonrosenberg merged commit 2e8638d into main Jan 28, 2026
2 checks passed
@simonrosenberg simonrosenberg deleted the openhands/swebench-constants-348 branch January 28, 2026 17:15
Successfully merging this pull request may close these issues.

Regroup all swebench hyper parameters in a single source of truth benchmarks/swebench/constants.py
