Run evals on Harbor tasks in the cloud.
Oddish extends Harbor with:
- Provider-aware queuing and automatic retries for LLM providers
- Real-time monitoring via dashboard or CLI
- Postgres-backed state and S3 storage for logs
Just replace harbor run with oddish run.
uv pip install oddish2. Generate an API key here
- API key generation is restricted during the beta. To request access, contact the maintainer.
export ODDISH_API_KEY="ok_..."# Run a single agent
oddish run -d terminal-bench@2.0 -a codex -m gpt-5.2-codex --n-trials 3# Or sweep multiple agents
oddish run -d terminal-bench@2.0 -c sweep.yamlExample sweep.yaml
agents:
- name: claude-code
model_name: anthropic/claude-opus-4-6
n_trials: 3
- name: codex
model_name: openai/gpt-5.4
n_trials: 3
- name: terminus-2
model_name: gemini/gemini-3-flash-preview
n_trials: 3oddish status