Minimize mandatory reading. Expand context only when triggered.
- Read §1 on every task.
- Read §2 only when a trigger matches.
- No trigger? Follow §3 and stop expanding context.
- Greet ckl at conversation start.
- Priority: safety > correctness > maintainability > speed.
- User: senior engineer; values clean architecture, minimal complexity, direct answers.
- Max function body: 20 lines. Extract if exceeded.
- No comments that restate code. Only "why" comments for non-obvious decisions.
- Every abstraction must justify itself: if used <2 places, inline it.
- Delete dead code immediately. No TODOs in committed code.
- Type annotations are documentation. Verbose names > comments.
- Between two correct approaches, pick the one with fewer moving parts.
- Trust caller and type invariants; no unnecessary defensive code.
- No compatibility shims when requirements change; redesign cleanly.
- Modify only files relevant to the task.
- One diff: experiments must test exactly ONE variable between agent_a and agent_b.
- SDK first: prefer SDK-reported metrics over estimates; fall back only when SDK returns zero.
- Isolation: every trial runs in its own
tempfile.TemporaryDirectory. - Reports: every completed experiment gets a
reports/YYYY-MM-DD-<slug>.mdcommitted to git.
- Run lint before delivery:
ruff check src/. - Small, scoped commits. Warn before destructive operations.
- Follow Code Review Guide; fix P0/P1 before commit.
| Trigger | Load |
|---|---|
| Non-trivial multi-step task | docs/guides/engineering-workflow.md; create docs/plans/YYYY-MM-DD-slug.md |
| Memory / history retrieval | docs/guides/memory-management.md; summaries first |
| Code quality / simplification question | docs/guides/code-simplification.md |
| Architecture or module boundary question | §5 Project Snapshot below |
| User correction | Codify preventive rule in docs/memory/user-patterns.md before resuming |
| Error or regression | Write entry to docs/error-experience/entries/ |
| Notable win or insight | Write entry to docs/good-experience/entries/ |
No trigger → do not load.
- Read §1 only.
- Inspect target files and neighboring patterns.
- Implement minimal correct change.
- Proportionate verification.
- Commit and report.
| Type | Entry path | Summary path |
|---|---|---|
| Error | docs/error-experience/entries/YYYY-MM-DD-slug.md |
docs/error-experience/summary/entries/YYYY-MM-DD-slug.md |
| Win | docs/good-experience/entries/YYYY-MM-DD-slug.md |
docs/good-experience/summary/entries/YYYY-MM-DD-slug.md |
Index files stay index-only (no narrative content).
- Purpose: automated A/B testing loop to optimize Claude agent configurations.
- Loop:
ExperimentPlanner→ExperimentRunner→AutoEvaluator→ResultStore→ repeat. - Key dirs:
src/openbench/(core),experiments/(manual experiments),programs/(ResearchProgram configs),results/(raw JSONL data),reports/(human-readable reports). - Entry point:
openbenchCLI (src/openbench/cli.py).
docs/guides/engineering-workflow.md · docs/guides/code-simplification.md · docs/guides/code-review-guide.md · docs/guides/memory-management.md