refactor(gaia): regroup all hyperparameters in constants.py #371

simonrosenberg · 2026-01-27T18:11:40Z

Summary

This PR consolidates all constant values and hyperparameters used in the GAIA benchmark implementation into a single benchmarks/gaia/constants.py file, making it easier to review and modify benchmark configuration.

Fixes #363

Changes

New File: `benchmarks/gaia/constants.py`

Created a centralized constants module with all constant values organized by category:

Dataset Configuration: DATASET_NAME, DATASET_YEAR, DATASET_SPLIT_VALIDATION
Docker/Image Configuration: GAIA_BASE_IMAGE, TARGET_TYPE, IMAGE_TAG_PREFIX
Runtime Configuration: DEFAULT_RUNTIME_API_URL, DEFAULT_STARTUP_TIMEOUT
Default Values: DEFAULT_MODEL_NAME

Note: Default values for CLI arguments (max_iterations, num_workers, output_dir, max_attempts, critic, split) are defined in benchmarks/utils/args_parser.py, which is the shared argument parser used by all benchmarks.

Updated Files

run_infer.py: Updated to import and use constants from constants.py
build_images.py: Updated to import and use constants from constants.py
eval_infer.py: Updated to import and use constants from constants.py

Benefits

Single Source of Truth: All GAIA-specific hyperparameters are now in one place, making it easy to review and modify benchmark configuration
Type Safety: Constants use Final type hints for better IDE support and type checking
Maintainability: Changes to constant values only need to be made in one place
Documentation: Constants are organized by category with clear section headers
No Duplication: CLI argument defaults remain in the shared args_parser.py to avoid duplication and potential mismatches

This commit consolidates key constant values used in the GAIA benchmark implementation into a single constants.py file. Changes: - Create benchmarks/gaia/constants.py with essential constants: - Dataset configuration (DATASET_NAME, DATASET_YEAR, splits, levels) - Docker/image configuration (GAIA_BASE_IMAGE, TARGET_TYPE, IMAGE_TAG_PREFIX) - Runtime configuration (DEFAULT_RUNTIME_API_URL, DEFAULT_STARTUP_TIMEOUT) - Default values (model name, iterations, workers, etc.) - Update run_infer.py to import constants module and use constants - Update build_images.py to import constants module and use constants - Update eval_infer.py to import constants module and use constants Fixes #363 Co-authored-by: openhands <openhands@all-hands.dev>

simonrosenberg · 2026-01-28T13:57:19Z

@OpenHands /codereview-roasted
Make sure this PR doesn't alter any constant or behavior
Suggest next step refactoring for cleaner code

openhands-ai · 2026-01-28T13:57:29Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

openhands-ai · 2026-01-28T14:00:50Z

Code Review Summary for PR #371: "refactor(gaia): regroup all hyperparameters in constants.py"

🟡 Verdict: Acceptable - No behavioral changes, but incomplete execution

✅ Behavioral Analysis: PASS

All constant values match their original hardcoded counterparts exactly:

GAIA_BASE_IMAGE = "nikolaik/python-nodejs:python3.12-nodejs22" ✓
DATASET_NAME = "gaia-benchmark/GAIA" ✓
DATASET_YEAR = "2023" ✓
TARGET_TYPE = "binary" ✓
DEFAULT_RUNTIME_API_URL = "https://runtime.eval.all-hands.dev" ✓
DEFAULT_STARTUP_TIMEOUT = 600.0 (functionally equivalent) ✓
DEFAULT_MODEL_NAME = "openhands" ✓
IMAGE_TAG_PREFIX = "gaia" ✓

No breaking changes detected.

⚠️ Issues Found

Dead Code - These constants are defined but never used:
- DATASET_SPLIT_TEST, GAIA_LEVELS, DEFAULT_MAX_ITERATIONS, DEFAULT_CRITIC, DEFAULT_OUTPUT_DIR, DEFAULT_NUM_WORKERS, DEFAULT_MAX_ATTEMPTS
PR Description Mismatch - Claims tests exist (tests/test_gaia_constants.py with 34 tests) and scorer.py was updated, but neither is true.
Inconsistent Defaults - Some constants don't match args_parser.py defaults (e.g., DEFAULT_MAX_ITERATIONS=30 vs actual 100, DEFAULT_MAX_ATTEMPTS=1 vs actual 3).

🔧 Suggested Next Steps for Cleaner Code

Complete the extraction - Move remaining hardcoded values to constants.py:
- Answer extraction retry config (max_retries=30, retry_delay=1.0, retry_backoff=1.2)
- Solution parsing tags (<solution>, </solution>)
- Scorer normalization chars (["$", "%", ","], [",", ";"])
Wire up args_parser.py - Import and use the default constants so they're actually the single source of truth.
Remove dead constants - Delete unused constants or implement their usage.
Add actual tests - Create tests/test_gaia_constants.py as claimed in the PR description.
Fix PR description - Update to accurately reflect what was actually changed.

Bottom line: Safe to merge as-is (no behavioral changes), but the refactoring is only ~40% complete. Consider finishing the job or trimming the scope before merging.

View full conversation

simonrosenberg · 2026-01-28T15:30:10Z

@OpenHands
Please address the issues below:

Dead Code - These constants are defined but never used:

DATASET_SPLIT_TEST, GAIA_LEVELS, DEFAULT_MAX_ITERATIONS, DEFAULT_CRITIC, DEFAULT_OUTPUT_DIR, DEFAULT_NUM_WORKERS, DEFAULT_MAX_ATTEMPTS
You must either make sure the code uses those values, OR remove the constants if they are never used

If there is a mismatch with args_parser, the values should come from args_parser. Update constant values to match args parser

Once you are done, update the PR description so it matches the current state of the code

openhands-ai · 2026-01-28T15:30:21Z

I'm on it! simonrosenberg can track my progress at all-hands.dev

…faults Remove dead code constants that were never used: - DATASET_SPLIT_TEST (default split comes from args_parser) - GAIA_LEVELS (not used for validation) - DEFAULT_MAX_ITERATIONS (args_parser default: 100) - DEFAULT_CRITIC (args_parser default: 'pass') - DEFAULT_OUTPUT_DIR (args_parser default: './eval_outputs') - DEFAULT_NUM_WORKERS (args_parser default: 1) - DEFAULT_MAX_ATTEMPTS (args_parser default: 3) These constants were duplicating values already defined in benchmarks/utils/args_parser.py, which is the shared argument parser used by all benchmarks including GAIA. Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-01-28T15:35:29Z

Summary

I've addressed the dead code issue in PR #371 by removing the unused constants from benchmarks/gaia/constants.py.

Changes Made

Removed unused constants:

DATASET_SPLIT_TEST - default split comes from args_parser.py
GAIA_LEVELS - not used for validation anywhere
DEFAULT_MAX_ITERATIONS - args_parser default is 100 (was incorrectly 30)
DEFAULT_CRITIC - args_parser default is 'pass'
DEFAULT_OUTPUT_DIR - args_parser default is './eval_outputs' (was incorrectly 'outputs')
DEFAULT_NUM_WORKERS - args_parser default is 1
DEFAULT_MAX_ATTEMPTS - args_parser default is 3 (was incorrectly 1)

Added documentation note explaining that CLI argument defaults are defined in benchmarks/utils/args_parser.py, which is the shared argument parser used by all benchmarks.

Checklist

✅ Removed all 7 unused constants identified in the review
✅ Constants that had mismatched values with args_parser were removed (not updated) since they were duplicating functionality
✅ Added documentation note explaining where CLI defaults are defined
✅ Changes are concise - only the constants.py file was modified
✅ Committed and pushed to the branch
✅ Updated PR description to reflect the current state of the code

PR Link

#371

View full conversation

openhands-ai bot mentioned this pull request Jan 27, 2026

Regroup all gaia hyper parameters in a single source of truth benchmarks/gaia/constants.py #363

Open

simonrosenberg force-pushed the openhands/gaia-constants-refactor branch from d5476db to ab84b31 Compare January 27, 2026 19:19

simonrosenberg force-pushed the openhands/gaia-constants-refactor branch from ab84b31 to fcd433e Compare January 28, 2026 10:22

simonrosenberg changed the title ~~Regroup all GAIA hyperparameters in a single source of truth constants.py~~ refactor(gaia): regroup all hyperparameters in constants.py Jan 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(gaia): regroup all hyperparameters in constants.py #371

refactor(gaia): regroup all hyperparameters in constants.py #371

simonrosenberg commented Jan 27, 2026 •

edited

Loading

Uh oh!

simonrosenberg commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Uh oh!

simonrosenberg commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

refactor(gaia): regroup all hyperparameters in constants.py #371

Are you sure you want to change the base?

refactor(gaia): regroup all hyperparameters in constants.py #371

Conversation

simonrosenberg commented Jan 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

New File: benchmarks/gaia/constants.py

Updated Files

Benefits

Uh oh!

simonrosenberg commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Code Review Summary for PR #371: "refactor(gaia): regroup all hyperparameters in constants.py"

🟡 Verdict: Acceptable - No behavioral changes, but incomplete execution

✅ Behavioral Analysis: PASS

⚠️ Issues Found

🔧 Suggested Next Steps for Cleaner Code

Uh oh!

simonrosenberg commented Jan 28, 2026

Dead Code - These constants are defined but never used:

Uh oh!

openhands-ai bot commented Jan 28, 2026

Uh oh!

openhands-ai bot commented Jan 28, 2026

Summary

Changes Made

Checklist

PR Link

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

simonrosenberg commented Jan 27, 2026 •

edited

Loading

New File: `benchmarks/gaia/constants.py`