Draft
Conversation
- Added `update_llm_config_for_completions_logging` to the imports
- Modified `get_config` to accept an `instance` parameter
- Updated `llm_config` to enable completions logging
- Updated `process_instance` to pass `instance` to `get_config`

This change makes aider_bench save llm_completions the same way swe_bench does, with completions written to `{eval_output_dir}/llm_completions/{instance_id}/`.
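The real helper lives in OpenHands' evaluation utilities; the standalone sketch below only illustrates the shape of the change. The `LLMConfig` fields and the helper's signature here are assumptions, not the library's actual API:

```python
import os
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    """Minimal stand-in for the evaluation harness's LLM config."""
    model: str
    log_completions: bool = False
    log_completions_folder: Optional[str] = None


def update_llm_config_for_completions_logging(
    llm_config: LLMConfig, eval_output_dir: str, instance_id: str
) -> LLMConfig:
    # Mirror the swe_bench behaviour: raw LLM completions go to
    # {eval_output_dir}/llm_completions/{instance_id}/
    folder = os.path.join(eval_output_dir, 'llm_completions', instance_id)
    return replace(llm_config, log_completions=True, log_completions_folder=folder)


# get_config would call this once per instance, hence the new `instance` parameter
cfg = update_llm_config_for_completions_logging(
    LLMConfig(model='gpt-4o'), '/tmp/eval_out', '3'
)
print(cfg.log_completions_folder)
```

Because the config is copied rather than mutated, each `process_instance` call gets its own per-instance logging folder without the workers interfering with one another.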
Branch: …tions-fork
feat: Enable llm_completions logging in aider_bench
Added a new benchmark based on Aider's polyglot benchmark that supports:
- Multiple programming languages (Python, JS, Rust, Go, C++, Java)
- End-to-end evaluation of code editing capabilities
- Automated test execution and validation
- Parallel evaluation with multiple workers
- Detailed metrics and logging

Key components:
- run_infer.py: Main benchmark implementation
- Dockerfile: Multi-language development environment
- Scripts for running benchmarks and building the Docker image
- Helper modules for prompts and utilities
Modified run_infer.sh to support both argument styles:
- Old style: `<model> <commit> <agent> <max_iters> <num_workers>`
- New style: `--llm-config <config> --agent-cls <agent> [other options]`

Updated the README to document both usage styles with examples. This maintains backward compatibility with existing scripts.
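One way to support both styles is to branch on whether the first argument starts with `--`. This is a hypothetical sketch of the dispatch logic, not the actual run_infer.sh (the variable names and the `--llm-config`/`--agent-cls` flags are taken from the description above; everything else is assumed):

```shell
parse_args() {
  if [[ "$1" == --* ]]; then
    # New style: named flags, order-independent
    while [[ $# -gt 0 ]]; do
      case "$1" in
        --llm-config) LLM_CONFIG="$2"; shift 2 ;;
        --agent-cls)  AGENT_CLS="$2";  shift 2 ;;
        *)            shift ;;  # ignore unknown options in this sketch
      esac
    done
  else
    # Old positional style: <model> <commit> <agent> <max_iters> <num_workers>
    LLM_CONFIG="$1"; COMMIT="$2"; AGENT_CLS="$3"
    MAX_ITERS="$4"; NUM_WORKERS="$5"
  fi
}

parse_args --llm-config eval_gpt4 --agent-cls CodeActAgent
echo "$LLM_CONFIG $AGENT_CLS"

parse_args gpt-4o HEAD CodeActAgent 30 8
echo "$LLM_CONFIG $AGENT_CLS $MAX_ITERS"
```

Keying the branch on the `--` prefix means existing positional invocations keep working unchanged, which is what makes the change backward compatible.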
- Changed imports to use relative paths
- Added `__init__.py` to the helper directory
- This fixes the `ModuleNotFoundError` when running the benchmark
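The fix in this commit (revisited in the next one) relies on standard Python package mechanics: `__init__.py` files make the directories importable as a package, after which relative imports resolve. A self-contained demonstration, using a hypothetical `aider_bench/helper/prompts.py` layout built in a temp directory:

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, 'aider_bench', 'helper')
os.makedirs(pkg)

# __init__.py files turn plain directories into importable packages
open(os.path.join(root, 'aider_bench', '__init__.py'), 'w').close()
open(os.path.join(pkg, '__init__.py'), 'w').close()

with open(os.path.join(pkg, 'prompts.py'), 'w') as f:
    f.write("INSTRUCTION = 'solve the task'\n")

# A relative import works because run_infer is loaded as part of the package,
# not as a top-level script
with open(os.path.join(root, 'aider_bench', 'run_infer.py'), 'w') as f:
    f.write('from .helper.prompts import INSTRUCTION\n')

sys.path.insert(0, root)
mod = importlib.import_module('aider_bench.run_infer')
print(mod.INSTRUCTION)
```

Running `python aider_bench/run_infer.py` directly would still fail with "no known parent package", since a script executed by path is not part of any package, which is what the next commit addresses.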
- Added the OpenHands root to PYTHONPATH in run_infer.sh
- Changed back to absolute imports in run_infer.py
- This fixes the "no known parent package" error
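With the repo root on `PYTHONPATH`, absolute imports like `from evaluation.aider_bench.helper import ...` resolve even when run_infer.py is executed as a script. A minimal sketch of the shell side, assuming the script is invoked from the repo root (the real run_infer.sh may derive the root differently):

```shell
# Assumed: the script is run from the OpenHands repo root
OPENHANDS_ROOT="$(pwd)"

# Prepend the repo root, preserving any PYTHONPATH already set
export PYTHONPATH="${OPENHANDS_ROOT}${PYTHONPATH:+:${PYTHONPATH}}"

echo "$PYTHONPATH"
```

The `${PYTHONPATH:+:${PYTHONPATH}}` expansion only adds the `:` separator when `PYTHONPATH` was already non-empty, avoiding a stray leading or trailing colon.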
This PR is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
This PR adds the polyglot benchmark implementation based on the Aider-AI/polyglot-benchmark repository. The benchmark evaluates how effectively an agent can translate natural language coding requests into executable code that passes unit tests across multiple programming languages (Python, JavaScript, Rust, Go, C++, Java).