Add SDK commit hash to workflow run titles for index benchmarks #351

juanmichelini · 2026-01-21T19:49:20Z

Summary

This PR adds a dynamic run-name to the build images workflows for index benchmarks, so that the SDK commit hash is displayed in the workflow run title when provided.

Changes

Added a run-name property to the following workflow files:

build-swebench-images.yml
build-swebenchmultimodal-images.yml
build-swtbench-images.yml
build-commit0-images.yml
build-gaia-image.yml

When triggered via workflow_dispatch with an sdk-commit input, the workflow run title will now include the SDK commit hash. For example:

Build SWE-Bench Images (SDK: abc1234)
Build GAIA Images (SDK: main)

This makes it easier to identify which SDK version was used for each build at a glance, as mentioned in the issue.
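The intended title format can be sketched in Python. This only illustrates the behavior described above and is not the GitHub Actions expression used in the workflow files. The helper name run_title is hypothetical. Falling back to the bare workflow name when no sdk-commit is given is also an assumption, since the summary only says the hash is shown "when provided".

```python
from typing import Optional


def run_title(workflow_name: str, sdk_commit: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the run-name format described above.

    Appends "(SDK: <ref>)" when an sdk-commit value is provided; otherwise
    returns the plain workflow name (assumed fallback behavior).
    """
    if sdk_commit:
        return f"{workflow_name} (SDK: {sdk_commit})"
    return workflow_name


if __name__ == "__main__":
    print(run_title("Build SWE-Bench Images", "abc1234"))
    print(run_title("Build GAIA Images", "main"))
    print(run_title("Build Commit0 Images"))
```

In the workflow files, this logic lives in the top-level run-name key. GitHub Actions evaluates that key as an expression and can read the workflow_dispatch inputs through the inputs context.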

Related Issue

Fixes #350


This adds a dynamic run-name to the build images workflows for:
- SWE-Bench
- SWE-Bench Multimodal
- SWT-Bench
- Commit0
- GAIA

When triggered via workflow_dispatch with an sdk-commit input, the workflow run title will now include the SDK commit hash (e.g., 'Build SWE-Bench Images (SDK: abc1234)'). This makes it easier to identify which SDK version was used for each build at a glance.

Fixes #350

Co-authored-by: openhands <openhands@all-hands.dev>

The commit0-lite benchmark contains 16 instances in total, but only 10 are used as reference (gold) instances for accuracy calculation on the official leaderboard.

Issue: PR #351 showed 100.7% accuracy (3652/3628) because we were including all 16 repos instead of just the 10 reference repos. This led to incorrect test totals and an impossible >100% accuracy.

The 10 reference repos are:
- simpy
- tinydb
- marshmallow
- wcwidth
- imapclient
- voluptuous
- jinja
- deprecated
- cookiecutter
- cachetools

Changes:
- run_infer.py: Filter dataset to only reference repos
- eval_infer.py: Skip non-reference repos when calculating totals
- Updated total_instances from 16 to 10
- Added comments with links to official leaderboard documentation

References:
- Leaderboard: https://commit-0.github.io/analysis/
- Breakdown: https://commit-0.github.io/analysis_commit0-lite-plain_fillin/
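The two changes can be sketched in Python. The sketch assumes each dataset row carries a repo name and each evaluation result reports (passed, total) test counts per repo. The names REFERENCE_REPOS, filter_reference and compute_accuracy, and this data layout, are hypothetical and not taken from run_infer.py or eval_infer.py.

```python
from typing import Dict, Iterable, List, Tuple

# The 10 commit0-lite reference (gold) repos listed above.
REFERENCE_REPOS = frozenset({
    "simpy", "tinydb", "marshmallow", "wcwidth", "imapclient",
    "voluptuous", "jinja", "deprecated", "cookiecutter", "cachetools",
})


def filter_reference(instances: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only rows whose (hypothetical) 'repo' field is a reference repo."""
    return [row for row in instances if row["repo"] in REFERENCE_REPOS]


def compute_accuracy(results: Dict[str, Tuple[int, int]]) -> float:
    """Sum (passed, total) test counts over reference repos only.

    Non-reference repos are skipped, so they contribute to neither the
    numerator nor the denominator of the accuracy.
    """
    passed = total = 0
    for repo, (repo_passed, repo_total) in results.items():
        if repo not in REFERENCE_REPOS:
            continue  # skip non-reference repos
        passed += repo_passed
        total += repo_total
    return passed / total if total else 0.0
```

Because both the instance list and the test totals are restricted to the same 10 repos, the passed count can never exceed the total, which rules out the >100% figure.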

openhands-ai bot mentioned this pull request Jan 21, 2026

Build images action for index benchmarks should contain the sdk commit hash they used in the title #350 (open)
