1000 JS/TS tasks generated from 30 open-source GitHub repos using SWE-gen.
Each task:
- is a merged GitHub PR with linked Issues
- has 3-10 source files edited
- has Fail-to-Pass unit tests
- passes NOP (baseline fails) and Oracle (fix succeeds) validation
- follows the Harbor format
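The NOP/Oracle criterion above can be sketched as a small check on two test-suite exit codes. This is only an illustration of the rule as stated (baseline fails, fix succeeds); `is_valid_task` and its arguments are hypothetical names, not part of SWE-gen or Harbor.

```shell
# Hypothetical helper: is_valid_task NOP_EXIT ORACLE_EXIT
# NOP_EXIT    = exit code of the tests on the unmodified repo (should fail)
# ORACLE_EXIT = exit code of the tests after the reference fix (should pass)
is_valid_task() {
  [ "$1" -ne 0 ] && [ "$2" -eq 0 ]
}
```

For example, `is_valid_task 1 0` succeeds, while `is_valid_task 0 0` fails because the tests already pass without the fix.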
Install Harbor:
uv tool install harbor
Run the dataset oracle solutions to verify setup:
harbor run --dataset swe-gen-js \
--agent oracle \
--n-concurrent 4
This command automatically downloads the tasks for the benchmark.
Run with Codex:
export OPENAI_API_KEY=<YOUR-KEY>
harbor run --dataset swe-gen-js \
--agent codex \
--model openai/gpt-5.2-codex \
--n-concurrent 4
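Before launching a long run, it can help to confirm that the API key is exported and the CLI is installed. The following is a minimal sketch, assuming a POSIX shell; `preflight` is a hypothetical helper and not something Harbor provides.

```shell
# Hypothetical helper, not part of Harbor: check prerequisites before a run.
# Usage: preflight [COMMAND]   (COMMAND defaults to "harbor")
preflight() {
  # Fail if the API key variable is unset or empty.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    return 1
  fi
  # Fail if the requested CLI is not found on PATH.
  if ! command -v "${1:-harbor}" >/dev/null 2>&1; then
    echo "${1:-harbor} not found on PATH" >&2
    return 1
  fi
  return 0
}
```

Calling `preflight` ahead of the `harbor run` command above stops early with a message on stderr if either prerequisite is missing.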

