Add benchmark runner script with retry functionality by AlexCuadron · Pull Request #8 · AlexCuadron/OpenHands

AlexCuadron · 2025-02-26T08:56:40Z

This PR adds a script to run the polyglot_benchmark and aider_bench benchmarks with retry functionality.

The script will:

- Run the polyglot_benchmark with the specified parameters
- Retry the polyglot_benchmark if it fails, up to a maximum number of attempts
- Run the aider_bench with the specified parameters
- Retry the aider_bench if it fails, up to a maximum number of attempts
- Provide a summary of the benchmark runs

The script is configurable with parameters for the model, agent, evaluation limit, number of workers, maximum retry attempts, and retry delay.
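The retry behaviour described above can be sketched as follows. This is a minimal illustration, not the script from this PR: the names `run_with_retry` and `summarize`, the default values, and the injectable `runner` and `sleep` arguments are all assumptions made so the example is self-contained. The actual benchmark commands are not shown in this description, so none are filled in here.

```python
import subprocess
import time
from typing import Callable, Dict, Optional, Sequence, Tuple


def _default_runner(command: Sequence[str]) -> int:
    # Run the command and return its exit code (0 means success).
    return subprocess.run(list(command)).returncode


def run_with_retry(
    command: Sequence[str],
    max_attempts: int = 3,        # hypothetical default
    retry_delay: float = 30.0,    # hypothetical default, in seconds
    runner: Optional[Callable[[Sequence[str]], int]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Run `command` until it exits with 0 or `max_attempts` is reached.

    Returns (succeeded, attempts_used).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    run = runner or _default_runner
    for attempt in range(1, max_attempts + 1):
        if run(command) == 0:
            return True, attempt
        # Wait before the next attempt, but not after the final failure.
        if attempt < max_attempts:
            sleep(retry_delay)
    return False, max_attempts


def summarize(results: Dict[str, Tuple[bool, int]]) -> str:
    """Build a one-line-per-benchmark summary of the runs."""
    lines = []
    for name, (ok, attempts) in results.items():
        status = "succeeded" if ok else "failed"
        lines.append(f"{name}: {status} after {attempts} attempt(s)")
    return "\n".join(lines)
```

A caller would invoke `run_with_retry` once per benchmark with that benchmark's command line, collect the results in a dictionary keyed by benchmark name, and print `summarize(results)` at the end. Passing `runner` and `sleep` explicitly keeps the loop testable without launching real processes.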

- Added update_llm_config_for_completions_logging to imports
- Modified get_config to accept instance parameter
- Updated llm_config to enable completions logging
- Updated process_instance to pass instance to get_config

This change makes aider_bench save llm_completions in the same way as swe_bench, with completions being saved in {eval_output_dir}/llm_completions/{instance_id}/
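To make the directory layout concrete, here is a small sketch of how the per-instance path is composed. The helper name `completions_dir` is hypothetical and is not the OpenHands function mentioned above; it only mirrors the `{eval_output_dir}/llm_completions/{instance_id}/` pattern stated in the commit message.

```python
import os


def completions_dir(eval_output_dir: str, instance_id: object) -> str:
    # Hypothetical helper: joins the evaluation output directory, the fixed
    # "llm_completions" folder, and the instance ID into one path.
    return os.path.join(eval_output_dir, "llm_completions", str(instance_id))
```

Because the instance ID is the last path component, each benchmark instance gets its own folder of completions.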


github-actions · 2025-03-31T02:47:33Z

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

openhands-agent and others added 25 commits February 25, 2025 04:45

Merge pull request #4 from AlexCuadron/feature/aider-bench-llm-comple…

9315619

…tions-fork feat: Enable llm_completions logging in aider_bench

Merge remote-tracking branch 'origin/main'

dc59367

Add polyglot benchmark implementation

bc8f20d

Fix argument parser in polyglot benchmark

37ba696

Improve polyglot benchmark path handling and fix logging error

890377d

Add Docker configuration options and troubleshooting guide

8af6f11

Add local Docker image build support for polyglot benchmark

32335ff

Set Docker image to build automatically by default

5610010

Fix Docker build issues by adding unzip and simplifying Gradle instal…

c9e232e

…lation

Restrict polyglot benchmark to use only the same tools as SWE-Bench (…

97e7ca7

…execute_bash, finish, str_replace_editor)

Fix runtime completion to use Docker runtime for running tests

44bcb39

Add script to test one instance per language in polyglot benchmark

601da45

Add one-per-language testing mode to polyglot benchmark run_infer.sh

84293fd

Update README with one-per-language testing instructions and command-…

87d9e15

…line options

Enable LLM completions logging in aider_bench run_infer.py

8a5dc59

Include tools information in evaluation output directory names

8ffe33e

Add evaluation parameter to run_infer.sh scripts for aider_bench and …

d45b98d

…polyglot_benchmark

Update README files with documentation for the new evaluation parameter

62d2632

Fix output directory detection in evaluation scripts

c8dab2c

Fix LLM completions logging to ensure it's enabled in all benchmarks

fa9a0f8

Improve output directory detection in evaluation scripts with better …

8a4ca1e

…path matching and debugging output

Fix handling of 'eval' parameter to prevent it from being treated as …

a2d7e63

…an instance ID

Merge pull request #6 from AlexCuadron/polyglot-benchmark-clean

013ff2d

Add polyglot benchmark implementation

Add benchmark runner script with retry functionality

880bc10

AlexCuadron force-pushed the main branch from 013ff2d to 205a79b Compare February 28, 2025 23:52

github-actions bot added the Stale label Mar 31, 2025

AlexCuadron force-pushed the main branch from e76a772 to 9adfced Compare April 1, 2025 12:20

AlexCuadron wants to merge 25 commits into main from benchmark-runner-script
