sweverett
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 11 additions & 4 deletions
diff --git a/README.md b/README.md
Lines changed: 29 additions & 6 deletions
diff --git a/docs/ROADMAP.md b/docs/ROADMAP.md
Lines changed: 8 additions & 3 deletions
@@ -27,15 +27,17 @@ See [VISION.md](docs/VISION.md) for full architecture. See [ROADMAP.md](docs/ROA
 
 ```
 src/parallax/           # Main package
-  cli/                  # Typer CLI commands (init, refine)
-  core/                 # Config, interview, renderer, workflow logic
+  cli/                  # Typer CLI commands (init, refine, config)
+  core/                 # Config, interview, renderer, refiner
     config.py           # ProjectConfig dataclass
     interview.py        # Structured init interview
     renderer.py         # Template rendering + file generation
+    refiner.py          # Auto-refinement via Claude CLI
   db/                   # SQLite models + queries (Layer 2)
   templates/            # string.Template files for parallax init output
+    agents/             # Agent definition templates (hypothesis_explorer, etc.)
     hooks/              # Hook script templates (test_guard, lint_check, stop_check)
-    skills/             # Skill templates (hypothesis, handoff, audit, experiment)
+    skills/             # Skill templates (hypothesis, handoff, audit, experiment, session_start)
 tests/                  # pytest (mirrors src structure)
 docs/                   # VISION.md, ROADMAP.md, plans/
 .claude/hooks/          # Hook enforcement scripts for Parallax development
@@ -105,8 +107,13 @@ pixi run check       # all of the above
 - Never add backward-compat shims — just change the code
 - Never create docs/READMEs unless explicitly requested
 
-## Documentation Maintenance
+## Plan Completion & Verification
 
+- **Archive the plan.** At the start of implementation, copy the plan file to `docs/plans/NNN_short-name.md` (next sequence number). The plan path is in the system message from plan mode.
+- Every plan's verification section is **mandatory**. At the end of implementation:
+  1. **List all verification commands** from the plan so the user can run them independently
+  2. **Execute each one** and report the result (pass/fail, key output) explicitly to the user
+- `pixi run check` is baseline; also run any CLI smoke tests or manual checks the plan specifies
 - At the end of every plan, verify README.md and other markdown docs reflect current state
 - If code changes affect documented behavior, update the relevant docs in the same PR
 - @README.md and other key docs should be reviewed before marking any plan complete
 
@@ -30,18 +30,26 @@ See [VISION.md](docs/VISION.md) for details.
 
 ```
 src/parallax/           # Main package
-  cli/                  # Typer CLI (init, refine)
-  core/                 # Config, interview, renderer
+  cli/                  # Typer CLI (init, refine, config)
+  core/                 # Config, interview, renderer, refiner
   db/                   # SQLite models (Layer 2)
   templates/            # string.Template files for init output
+    agents/             # Agent definition templates
+    skills/             # Skill templates
+    hooks/              # Hook script templates
 tests/                  # pytest (mirrors src structure)
 docs/                   # VISION.md, ROADMAP.md, plans/
 .claude/                # Skills (skill-name/SKILL.md) and hooks for development
 ```
 
+## Prerequisites
+
+- [pixi](https://pixi.sh) -- package/environment management
+- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) -- required for auto-refinement during `parallax init`
+
 ## Installation
 
-Requires [pixi](https://pixi.sh). Install via:
+Install pixi:
 
 ```bash
 # macOS / Linux
@@ -80,29 +88,44 @@ parallax init
 parallax init -t /path/to/project   # target directory
 parallax init -y                     # accept defaults, skip optional
 parallax init -f                     # overwrite existing files
+parallax init --token-tier 5x        # set model tier for agents
+parallax init --skip-refine          # skip auto-refinement
 
 # Post-init refinement
 parallax refine                      # print refinement instructions
 parallax refine --done               # strip refinement comment blocks
+
+# Post-init config changes
+parallax config set token-tier 5x    # update agent model selection
 ```
 
 `parallax init` runs a structured interview generating:
 - **CLAUDE.md** -- project-specific AI agent guide
 - **PARALLAX.md** -- scientific workflow rules
 - **CONSTITUTION.md** -- core scientific principles
-- **.claude/skills/** -- hypothesis, handoff, audit, experiment skills
+- **.claude/skills/** -- hypothesis, handoff, audit, experiment, session-start skills
+- **.claude/agents/** -- hypothesis-explorer, experiment-runner, literature-reviewer, result-validator agents
 - **.claude/hooks/** -- test guard, lint check, stop check enforcement scripts
 - **.claude/settings.json** -- hook configuration referencing scripts above
 
+Token tiers control agent model selection:
+- **pro** (default) -- conservative: haiku exploration, sonnet validation
+- **5x** -- balanced: opus exploration, sonnet runner
+- **20x** -- generous: opus for most tasks
+- **api** -- unconstrained: opus everywhere
+
 ## Current Status
 
 Layer 1 (Convention System) functional. `parallax init`, `parallax refine`, hook enforcement, and skills all implemented.
 
 What exists:
-- `parallax init`: structured interview + template rendering
+- `parallax init`: structured interview + template rendering + auto-refinement
 - `parallax refine`: post-init refinement workflow
+- `parallax config`: post-init configuration changes (token tier)
 - Hook enforcement: test guard (blocks test weakening), lint check (ruff feedback), stop check (uncommitted work reminder)
-- Full skill definitions: /hypothesis, /handoff, /audit, /experiment
+- Full skill definitions: /hypothesis, /handoff, /audit, /experiment, /session-start
+- Custom agent definitions: hypothesis-explorer, experiment-runner, literature-reviewer, result-validator
+- Token tier system: model selection per agent based on usage tier (pro/5x/20x/api)
 - CI pipeline (ruff, mypy --strict, pytest)
 - Integration test suite validating generated output
 
 
@@ -39,14 +39,19 @@ Hooks/skills format could change upstream. No compatibility strategy. Should we
 
 - [x] `parallax init` interview design + implementation
 - [x] Template files: PARALLAX.md, CLAUDE.md, CONSTITUTION.md, settings.json, skills
-- [x] Claude Code skills: `/hypothesis`, `/handoff`, `/audit`, `/experiment`
+- [x] Claude Code skills: `/hypothesis`, `/handoff`, `/audit`, `/experiment`, `/session-start`
 - [x] Hook scripts: test protection, lint check, stop check
+- [x] Custom agent definitions: hypothesis-explorer, experiment-runner, literature-reviewer, result-validator
+- [x] Token tier system: model selection per agent (pro/5x/20x/api)
+- [x] Auto-refinement via Claude CLI (`parallax init` invokes `claude -p`)
+- [x] `parallax config set token-tier` for post-init changes
+- [x] `memory: project` on hypothesis + experiment skills
 - [ ] CI enhancements: semantic version validation, doc staleness check
 
 ## Layer 2 Features (MVP-beta)
 
-- [ ] SQLite schema for hypothesis lifecycle + test results
-- [ ] Git worktree integration for parallel hypotheses
+- [ ] SQLite schema for hypothesis lifecycle + test results (skill `memory: project` as short-term proxy)
+- [ ] Git worktree integration for parallel hypotheses (Claude Code handles plumbing natively; Parallax defines workflows)
 - [ ] Agent handoff summary system
 - [ ] Semantic versioning automation
 - [ ] Conversation/session logging