Pull requests: OpenHands/benchmarks
Align default argument values with evaluation repository
#377
opened Jan 28, 2026 by
simonrosenberg
Draft
Require explicit model_name for SWE-Bench converters
#376
opened Jan 28, 2026 by
simonrosenberg
refactor(swtbench): regroup all hyperparameters in constants.py
#372
opened Jan 27, 2026 by
simonrosenberg
refactor(gaia): regroup all hyperparameters in constants.py
#371
opened Jan 27, 2026 by
simonrosenberg
refactor(commit0): consolidate hyperparameters in constants.py
#370
opened Jan 27, 2026 by
simonrosenberg
Regroup all multiswebench hyperparameters in constants.py
#369
opened Jan 27, 2026 by
simonrosenberg
Regroup all swebenchmultimodal hyperparameters in constants.py
openhands
#368
opened Jan 27, 2026 by
simonrosenberg
Add SDK commit hash to workflow run titles for index benchmarks
#351
opened Jan 21, 2026 by
juanmichelini
build(deps): bump the version-all group across 1 directory with 15 updates
dependencies
Pull requests that update a dependency file
python:uv
Pull requests that update python:uv code
#336
opened Jan 17, 2026 by
dependabot
bot
fix(swebench-multimodal): create output.report.json for consistency
#331
opened Jan 16, 2026 by
juanmichelini
BREAKING: Rename --max-attempts to --n-critic-runs
#325
opened Jan 16, 2026 by
juanmichelini
Draft
Fix dataset loading schema validation issue in CI
#304
opened Jan 13, 2026 by
juanmichelini
build(deps): bump actions/github-script from 7 to 8 in the version-all group
dependencies
Pull requests that update a dependency file
github_actions
Pull requests that update GitHub Actions code
#292
opened Jan 12, 2026 by
dependabot
bot
Add add_resolve_rate_to_predictions function to output_utils
#199
opened Dec 23, 2025 by
juanmichelini
Draft
API-based Critic implementation
build-swebench-200
Build 200 SWE-Bench Verified Image based on SDK version on this PR.