
EcoClaw-Bench

Benchmarking and reproducibility suite for EcoClaw.

This repository is the official place to:

  • Install and run EcoClaw end-to-end
  • Reproduce PinchBench baseline vs EcoClaw runs
  • Store raw benchmark outputs and comparison reports
  • Extend evaluation to additional datasets over time

Scope

  • Runtime under test: EcoClaw
  • Primary benchmark (phase 1): PinchBench-compatible skill tasks (recommended fork: Xubqpanda/skill)
  • Evaluation goal: improve token efficiency while maintaining or improving task quality
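The token-efficiency goal above can be summarized as the relative token reduction of an EcoClaw run against the baseline. A minimal helper, as a sketch only (not part of the repo's scripts):

```shell
# Percent of tokens saved by an EcoClaw run relative to the baseline run.
# token_reduction BASELINE_TOKENS ECOCLAW_TOKENS -> e.g. "25.0"
token_reduction() {
  awk -v b="$1" -v e="$2" 'BEGIN { printf "%.1f", (b - e) / b * 100 }'
}
```

A positive number means fewer tokens than baseline; task quality still has to be checked separately, per the evaluation goal.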

Repository Layout

EcoClaw-Bench/
├── docs/
├── experiments/
│   ├── configs/
│   │   └── pinchbench/
│   └── scripts/
├── results/
│   ├── raw/
│   └── reports/
└── assets/

Environment Setup

  1. Copy .env.example to .env:

cp .env.example .env

  2. Fill in your API key and base URL.

Detailed variable reference: docs/env.md
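A minimal .env sketch is shown below. The key and base-URL variable names are hypothetical placeholders (check docs/env.md for the actual names); the two model ids are the ones recommended in the Compatibility Notes section:

```shell
# .env (sketch; see docs/env.md for the authoritative variable list)
OPENCLAW_API_KEY=sk-...        # hypothetical variable name: your provider API key
OPENCLAW_BASE_URL=https://...  # hypothetical variable name: your provider base URL

# Explicit full model ids, per the Compatibility Notes:
ECOCLAW_MODEL=dica/gpt-5-mini
ECOCLAW_JUDGE=dica/gpt-5-nano
```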

Dataset Assets (Google Drive)

Large dataset assets are not stored in git. After cloning, download the archives from Google Drive and extract them to:

  • experiments/dataset/claw_eval/assets/
  • experiments/dataset/pinchbench/assets/

Recommended release structure on Drive:

  • claw_eval_assets_YYYYMMDD.zip
  • pinchbench_assets_YYYYMMDD.zip

Download links (maintained in this section):

  • Claw Eval assets: https://drive.google.com/drive/folders/1JXKLgfQ4Q3qSXEeOP5a3XjjmYS9t9pyc?usp=sharing
  • PinchBench assets: https://drive.google.com/drive/folders/1JXKLgfQ4Q3qSXEeOP5a3XjjmYS9t9pyc?usp=sharing

After extraction, verify:

ls experiments/dataset/claw_eval/assets
ls experiments/dataset/pinchbench/assets
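The two ls checks above can be hardened into a small helper that fails loudly when an assets directory is missing or empty (a sketch; the paths are the ones listed above):

```shell
# Verify that a dataset assets directory exists and is non-empty.
assert_nonempty_dir() {
  dir="$1"
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "missing or empty: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}

# Run from the repo root after extracting the archives:
# assert_nonempty_dir experiments/dataset/claw_eval/assets
# assert_nonempty_dir experiments/dataset/pinchbench/assets
```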

Compatibility Notes

This repo uses a benchmark flow patched relative to the upstream PinchBench scripts:

  • Use the local/forked skill repo (set ECOCLAW_SKILL_DIR if needed).
  • Baseline scripts support isolated parallel execution via --parallel / ECOCLAW_PARALLEL.
  • Model aliases in experiment scripts are mapped to dica/* provider ids by default.

If your OpenClaw default model is not dica/*, prefer explicit full model ids in .env:

  • ECOCLAW_MODEL=dica/gpt-5-mini
  • ECOCLAW_JUDGE=dica/gpt-5-nano

This avoids silent fallback to other providers/models in mixed-provider OpenClaw configs.
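One way to catch a missing pin before a long run is a quick grep over .env. This is a sketch: it only checks the two variables named above, nothing else.

```shell
# Return 0 only if .env pins both models to explicit dica/* ids.
check_model_pins() {
  envfile="${1:-.env}"
  grep -Eq '^ECOCLAW_MODEL=dica/' "$envfile" &&
    grep -Eq '^ECOCLAW_JUDGE=dica/' "$envfile"
}
```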

Quick Start (Linux)

  1. Read docs/install.md
  2. Copy .env.example to .env and fill in its fields
  3. Run baseline
  4. Run EcoClaw-enabled
  5. Compare outputs

Example (bash):

./experiments/scripts/run_pinchbench_baseline.sh --suite all --parallel 4

Linux quick guide: docs/linux.md
