Add polyglot benchmark implementation by AlexCuadron · Pull Request #6 · AlexCuadron/OpenHands

AlexCuadron · 2025-02-26T06:22:14Z

This PR adds the polyglot benchmark implementation based on the Aider-AI/polyglot-benchmark repository. The benchmark evaluates how effectively an agent can translate natural language coding requests into executable code that passes unit tests across multiple programming languages (Python, JavaScript, Rust, Go, C++, Java).

This is a clean version of PR #5 with consistent author information.
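As an illustration of the evaluation described above, here is a minimal sketch of how per-language pass rates could be aggregated. It is not taken from this PR: the record shape `(language, passed)` and the function name `pass_rate_by_language` are assumptions made for the example.

```python
from collections import defaultdict


def pass_rate_by_language(results):
    """Return {language: fraction of instances whose unit tests all passed}.

    `results` is a hypothetical iterable of (language, passed) pairs;
    the actual benchmark output format may differ.
    """
    totals = defaultdict(int)
    passed = defaultdict(int)
    for language, ok in results:
        totals[language] += 1
        if ok:
            passed[language] += 1
    return {lang: passed[lang] / totals[lang] for lang in totals}


# Made-up sample data, for illustration only.
sample = [("python", True), ("python", False), ("rust", True), ("go", False)]
rates = pass_rate_by_language(sample)
print(rates)
```

Reporting one rate per language makes it easier to spot a language where the agent lags, which a single aggregate score would hide.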

AlexCuadron added 20 commits February 26, 2025 06:22

Add polyglot benchmark implementation

bc8f20d

Fix argument parser in polyglot benchmark

37ba696

Improve polyglot benchmark path handling and fix logging error

890377d

Add Docker configuration options and troubleshooting guide

8af6f11

Add local Docker image build support for polyglot benchmark

32335ff

Set Docker image to build automatically by default

5610010

Fix Docker build issues by adding unzip and simplifying Gradle installation

c9e232e

Restrict polyglot benchmark to use only the same tools as SWE-Bench (execute_bash, finish, str_replace_editor)

97e7ca7

Fix runtime completion to use Docker runtime for running tests

44bcb39

Add script to test one instance per language in polyglot benchmark

601da45

Add one-per-language testing mode to polyglot benchmark run_infer.sh

84293fd

Update README with one-per-language testing instructions and command-line options

87d9e15

Enable LLM completions logging in aider_bench run_infer.py

8a5dc59

Include tools information in evaluation output directory names

8ffe33e

Add evaluation parameter to run_infer.sh scripts for aider_bench and polyglot_benchmark

d45b98d

Update README files with documentation for the new evaluation parameter

62d2632

Fix output directory detection in evaluation scripts

c8dab2c

Fix LLM completions logging to ensure it's enabled in all benchmarks

fa9a0f8

Improve output directory detection in evaluation scripts with better path matching and debugging output

8a4ca1e

Fix handling of 'eval' parameter to prevent it from being treated as an instance ID

a2d7e63

AlexCuadron marked this pull request as ready for review February 26, 2025 08:20

AlexCuadron merged commit 013ff2d into main Feb 26, 2025
6 of 7 checks passed

AlexCuadron merged 20 commits into main from polyglot-benchmark-clean