-
Notifications
You must be signed in to change notification settings - Fork 31
refactor(gaia): regroup all hyperparameters in constants.py #371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
d5476db to
ab84b31
Compare
This commit consolidates key constant values used in the GAIA benchmark implementation into a single constants.py file. Changes: - Create benchmarks/gaia/constants.py with essential constants: - Dataset configuration (DATASET_NAME, DATASET_YEAR, splits, levels) - Docker/image configuration (GAIA_BASE_IMAGE, TARGET_TYPE, IMAGE_TAG_PREFIX) - Runtime configuration (DEFAULT_RUNTIME_API_URL, DEFAULT_STARTUP_TIMEOUT) - Default values (model name, iterations, workers, etc.) - Update run_infer.py to import constants module and use constants - Update build_images.py to import constants module and use constants - Update eval_infer.py to import constants module and use constants Fixes #363 Co-authored-by: openhands <openhands@all-hands.dev>
ab84b31 to
fcd433e
Compare
|
@OpenHands /codereview-roasted |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
Code Review Summary for PR #371: "refactor(gaia): regroup all hyperparameters in constants.py"🟡 Verdict: Acceptable - No behavioral changes, but incomplete execution✅ Behavioral Analysis: PASSAll constant values match their original hardcoded counterparts exactly:
No breaking changes detected.
|
|
@OpenHands Dead Code - These constants are defined but never used:DATASET_SPLIT_TEST, GAIA_LEVELS, DEFAULT_MAX_ITERATIONS, DEFAULT_CRITIC, DEFAULT_OUTPUT_DIR, DEFAULT_NUM_WORKERS, DEFAULT_MAX_ATTEMPTS If there is a mismatch with args_parser, the values should come from args_parser. Update constant values to match args parser Once you are done, update the PR description so it matches the current state of the code |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
…faults Remove dead code constants that were never used: - DATASET_SPLIT_TEST (default split comes from args_parser) - GAIA_LEVELS (not used for validation) - DEFAULT_MAX_ITERATIONS (args_parser default: 100) - DEFAULT_CRITIC (args_parser default: 'pass') - DEFAULT_OUTPUT_DIR (args_parser default: './eval_outputs') - DEFAULT_NUM_WORKERS (args_parser default: 1) - DEFAULT_MAX_ATTEMPTS (args_parser default: 3) These constants were duplicating values already defined in benchmarks/utils/args_parser.py, which is the shared argument parser used by all benchmarks including GAIA. Co-authored-by: openhands <openhands@all-hands.dev>
SummaryI've addressed the dead code issue in PR #371 by removing the unused constants from Changes MadeRemoved unused constants:
Added documentation note explaining that CLI argument defaults are defined in Checklist
PR Link |
Summary
This PR consolidates all constant values and hyperparameters used in the GAIA benchmark implementation into a single
benchmarks/gaia/constants.pyfile, making it easier to review and modify benchmark configuration.Fixes #363
Changes
New File:
benchmarks/gaia/constants.pyCreated a centralized constants module with all constant values organized by category:
DATASET_NAME,DATASET_YEAR,DATASET_SPLIT_VALIDATIONGAIA_BASE_IMAGE,TARGET_TYPE,IMAGE_TAG_PREFIXDEFAULT_RUNTIME_API_URL,DEFAULT_STARTUP_TIMEOUTDEFAULT_MODEL_NAMENote: Default values for CLI arguments (max_iterations, num_workers, output_dir, max_attempts, critic, split) are defined in
benchmarks/utils/args_parser.py, which is the shared argument parser used by all benchmarks.Updated Files
run_infer.py: Updated to import and use constants fromconstants.pybuild_images.py: Updated to import and use constants fromconstants.pyeval_infer.py: Updated to import and use constants fromconstants.pyBenefits
Finaltype hints for better IDE support and type checkingargs_parser.pyto avoid duplication and potential mismatches