Summary

This PR creates a single source of truth for all constant values used in the SWE-Bench Multimodal benchmark implementation by consolidating them into benchmarks/swebenchmultimodal/constants.py.

Fixes #367

Changes

New File

  • benchmarks/swebenchmultimodal/constants.py: Contains all hyperparameters and constant values for the SWE-Bench Multimodal benchmark

Updated Files

  • benchmarks/swebenchmultimodal/run_infer.py: Updated to import constants from constants.py (see the import sketch after this list)
  • benchmarks/swebenchmultimodal/eval_infer.py: Updated to import constants from constants.py
  • benchmarks/swebenchmultimodal/build_images.py: Updated to import constants from constants.py
  • tests/test_swebenchmultimodal.py: Added comprehensive tests for all constants
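
For illustration, the import pattern in the updated scripts looks roughly like the sketch below. This is a minimal sketch rather than the actual diff: the specific constants shown and the helper functions are assumptions based on the categories listed under "Constants Consolidated".

    # Hypothetical excerpt from run_infer.py; the exact imports are assumptions.
    from benchmarks.swebenchmultimodal.constants import (
        DEFAULT_DATASET,
        DEFAULT_SPLIT,
        DOCKER_IMAGE_PREFIX,
    )

    def resolve_dataset(dataset: str | None = None, split: str | None = None) -> tuple[str, str]:
        """Fall back to the shared defaults when no overrides are given."""
        return dataset or DEFAULT_DATASET, split or DEFAULT_SPLIT

    def image_name(instance_id: str) -> str:
        """Build a Docker image name from the shared prefix (illustrative only)."""
        return f"{DOCKER_IMAGE_PREFIX}{instance_id}"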

Constants Consolidated

  • Dataset configuration: DEFAULT_DATASET, DEFAULT_SPLIT
  • Docker image configuration: DOCKER_IMAGE_PREFIX
  • Build configuration: BUILD_TARGET
  • Workspace configuration: WORKSPACE_DIR
  • Environment variable names: ENV_SKIP_BUILD, ENV_RUNTIME_API_KEY, ENV_SDK_SHORT_SHA, ENV_REMOTE_RUNTIME_STARTUP_TIMEOUT, ENV_RUNTIME_API_URL
  • Default environment values: DEFAULT_SKIP_BUILD, DEFAULT_REMOTE_RUNTIME_STARTUP_TIMEOUT, DEFAULT_RUNTIME_API_URL
  • Git configuration: GIT_USER_EMAIL, GIT_USER_NAME, GIT_COMMIT_MESSAGE
  • Environment setup: ENV_SETUP_COMMANDS
  • Image validation: ALLOWED_IMAGE_TYPES
  • Evaluation configuration: DEFAULT_EVAL_WORKERS, DEFAULT_MODEL_NAME
  • Annotation keywords: SOLVEABLE_KEYWORD
  • Patch processing: SETUP_FILES_TO_REMOVE
  • File paths: ANNOTATIONS_FILENAME
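
A minimal sketch of how constants.py could be laid out is shown below. The groupings mirror the categories above; the concrete values are placeholders, not the benchmark's actual configuration.

    """Single source of truth for SWE-Bench Multimodal benchmark constants.

    Sketch only: the values below are placeholders illustrating the layout,
    not the real configuration.
    """

    # Dataset configuration
    DEFAULT_DATASET = "<huggingface-dataset-id>"  # placeholder
    DEFAULT_SPLIT = "<split-name>"  # placeholder

    # Docker image configuration
    DOCKER_IMAGE_PREFIX = "<registry>/<namespace>/"  # placeholder

    # Environment variable names and their defaults
    ENV_SKIP_BUILD = "SKIP_BUILD"  # placeholder
    DEFAULT_SKIP_BUILD = "false"  # placeholder

    # Evaluation configuration
    DEFAULT_EVAL_WORKERS = 4  # placeholder
    DEFAULT_MODEL_NAME = "<model-name>"  # placeholder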

Testing

All 43 tests pass, including 15 new tests specifically for the constants module:

  • Tests verify that all constants have the expected types and formats
  • Tests ensure constants are non-empty and valid for their intended use (see the sketch below)
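
The new tests are along the lines of the sketch below (illustrative only; the actual test names and assertions in tests/test_swebenchmultimodal.py may differ):

    # Illustrative sketch of the constants tests; not the actual test code.
    from benchmarks.swebenchmultimodal import constants

    def test_dataset_constants_are_nonempty_strings():
        # Type and non-emptiness checks of the kind described above.
        assert isinstance(constants.DEFAULT_DATASET, str) and constants.DEFAULT_DATASET
        assert isinstance(constants.DEFAULT_SPLIT, str) and constants.DEFAULT_SPLIT

    def test_eval_workers_is_a_positive_int():
        assert isinstance(constants.DEFAULT_EVAL_WORKERS, int)
        assert constants.DEFAULT_EVAL_WORKERS > 0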

Benefits

  1. Single source of truth: All hyperparameters are now in one place, making it easy to review and modify benchmark configuration
  2. Improved maintainability: Changes to constants only need to be made in one location
  3. Better documentation: The constants file serves as documentation for all configurable values
  4. Easier testing: Constants can be tested independently to ensure they have valid values
