searchbench/cli.py is the CLI entry point (installed as the searchbench command). Core logic lives in searchbench/ (providers/, runner.py, judge.py, queries/, reporter.py). Reports and history are written to results/. Configuration templates live in .env.example, and timeouts in config.toml. Dependencies are pinned in requirements.txt.
- ./.venv/bin/python -m pip install -r requirements.txt - install runtime dependencies.
- ./scripts/searchbench run - full evaluation run using the public query set.
- ./scripts/searchbench run --queries hard - run the hard, evidence-gated benchmark.
- ./scripts/searchbench quick - 10-query smoke check.
- ./scripts/searchbench history - list recent benchmark runs.
- ./scripts/searchbench summary - show the latest run summary table.
- ./scripts/searchbench report - open the latest HTML report.
- ./scripts/searchbench calibrate - suggest timeouts from historical latency data.
- ./scripts/searchbench debug --provider exa --queries hard --count 5 - dump raw provider responses for diagnostics.
- Optional: python3 -m pip install -e . to enable the searchbench CLI.
Use 4-space indentation and standard Python naming: snake_case for functions and variables, CapWords for classes. Keep type hints where present and follow existing async patterns (async def, await). When adding providers, implement Provider from searchbench/providers/base.py and register via @register.
Run tests with ./.venv/bin/python -m unittest discover -s tests. Validate changes with the CLI using ./scripts/searchbench quick before running a full benchmark.
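As a minimal sketch of a test that the discover command above would pick up, assuming it is saved under tests/ with a name matching unittest's default test*.py pattern. The normalize_query helper is hypothetical and exists only to give the test something to check; it is not part of searchbench.

```python
import unittest


async def normalize_query(text: str) -> str:
    """Hypothetical helper under test; not part of searchbench."""
    return " ".join(text.split()).lower()


class NormalizeQueryTest(unittest.IsolatedAsyncioTestCase):
    """Async test case, matching the async def / await style used in the codebase."""

    async def test_collapses_whitespace(self) -> None:
        self.assertEqual(await normalize_query("  Hello   World "), "hello world")
```

IsolatedAsyncioTestCase runs each async test method in its own event loop, so coroutine-based code can be awaited directly inside the test.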
Commit messages are short and imperative; many use a lightweight scope prefix like docs: or cli:. For PRs, include a clear summary, commands run, and any new env vars added to .env.example. Attach a screenshot or pasted output for UI or visualization changes.
Store API keys in .env and never commit it. Keep .env.example and README up to date when adding new providers or settings. Use config.toml for timeout overrides.
When ending a work session, you MUST complete ALL steps below. Work is NOT complete until git push succeeds.
MANDATORY WORKFLOW:
- File issues for remaining work - Create issues for anything that needs follow-up
- Run quality gates (if code changed) - Tests, linters, builds
- Update issue status - Close finished work, update in-progress items
- PUSH TO REMOTE - This is MANDATORY:
    git pull --rebase
    bd sync
    git push
    git status  # MUST show "up to date with origin"
- Clean up - Clear stashes, prune remote branches
- Verify - All changes committed AND pushed
- Hand off - Provide context for next session
CRITICAL RULES:
- Work is NOT complete until git push succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds