HMAKT99 · HMAKT99 · Mar 16, 2026 · Mar 16, 2026 · Mar 16, 2026 · Mar 16, 2026
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -200,6 +200,11 @@ Templates contain the workflows, tips, and examples that require human judgment.
 | `{{SNAPSHOT_FLAGS}}` | `snapshot.ts` | Flag reference with examples |
 | `{{PREAMBLE}}` | `gen-skill-docs.ts` | Startup block: update check, session tracking, contributor mode, AskUserQuestion format |
 | `{{BROWSE_SETUP}}` | `gen-skill-docs.ts` | Binary discovery + setup instructions |
+| `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
+| `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
+| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
+| `{{REVIEW_DASHBOARD}}` | `gen-skill-docs.ts` | Review Readiness Dashboard for /ship pre-flight |
+| `{{TEST_BOOTSTRAP}}` | `gen-skill-docs.ts` | Test framework detection, bootstrap, CI/CD setup for /qa, /ship, /qa-design-review |
 
 This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.
 

diff --git a/BROWSER.md b/BROWSER.md
@@ -127,6 +127,18 @@ The `console`, `network`, and `dialog` commands read from the in-memory buffers,
 
 Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.
 
+### JavaScript execution (`js` and `eval`)
+
+`js` runs a single expression, `eval` runs a JS file. Both support `await` — expressions containing `await` are automatically wrapped in an async context:
+
+```bash
+$B js "await fetch('/api/data').then(r => r.json())"  # works
+$B js "document.title"                                  # also works (no wrapping needed)
+$B eval my-script.js                                    # file with await works too
+```
+
+For `eval` files, single-line files return the expression value directly. Multi-line files need explicit `return` when using `await`. Comments containing "await" don't trigger wrapping.
+
 ### Multi-workspace support
 
 Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in `.gstack/` inside the project root (detected via `git rev-parse --show-toplevel`).

diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -5,8 +5,11 @@
 ```bash
 bun install          # install dependencies
 bun test             # run free tests (browse + snapshot + skill validation)
-bun run test:evals   # run paid evals: LLM judge + E2E (~$4/run)
-bun run test:e2e     # run E2E tests only (~$3.85/run)
+bun run test:evals   # run paid evals: LLM judge + E2E (diff-based, ~$4/run max)
+bun run test:evals:all  # run ALL paid evals regardless of diff
+bun run test:e2e     # run E2E tests only (diff-based, ~$3.85/run max)
+bun run test:e2e:all # run ALL E2E tests regardless of diff
+bun run eval:select  # show which tests would run based on current diff
 bun run dev <cmd>    # run CLI in dev mode, e.g. bun run dev goto https://example.com
 bun run build        # gen docs + compile binaries
 bun run gen:skill-docs  # regenerate SKILL.md files from templates
@@ -21,6 +24,12 @@ bun run eval:summary # aggregate stats across all eval runs
 (tool-by-tool via `--output-format stream-json --verbose`). Results are persisted
 to `~/.gstack-dev/evals/` with auto-comparison against the previous run.
 
+**Diff-based test selection:** `test:evals` and `test:e2e` auto-select tests based
+on `git diff` against the base branch. Each test declares its file dependencies in
+`test/helpers/touchfiles.ts`. Changes to global touchfiles (session-runner, eval-store,
+llm-judge, gen-skill-docs) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
+variants to force all tests. Run `eval:select` to preview which tests would run.
+
 ## Project structure
 
 ```
@@ -43,11 +52,14 @@ gstack/
 │   ├── skill-llm-eval.test.ts   # Tier 3: LLM-as-judge (~$0.15/run)
 │   └── skill-e2e.test.ts         # Tier 2: E2E via claude -p (~$3.85/run)
 ├── qa-only/         # /qa-only skill (report-only QA, no fixes)
+├── plan-design-review/  # /plan-design-review skill (report-only design audit)
+├── qa-design-review/    # /qa-design-review skill (design audit + fix loop)
 ├── ship/            # Ship workflow skill
 ├── review/          # PR review skill
 ├── plan-ceo-review/ # /plan-ceo-review skill
 ├── plan-eng-review/ # /plan-eng-review skill
 ├── retro/           # Retrospective skill
+├── document-release/ # /document-release skill (post-ship doc updates)
 ├── setup            # One-time setup: build binary + symlink skills
 ├── SKILL.md         # Generated from SKILL.md.tmpl (don't edit directly)
 ├── SKILL.md.tmpl    # Template: edit this, run gen:skill-docs
@@ -65,6 +77,23 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
 To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
 To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.
 
+## Writing SKILL templates
+
+SKILL.md.tmpl files are **prompt templates read by Claude**, not bash scripts.
+Each bash code block runs in a separate shell — variables do not persist between blocks.
+
+Rules:
+- **Use natural language for logic and state.** Don't use shell variables to pass
+  state between code blocks. Instead, tell Claude what to remember and reference
+  it in prose (e.g., "the base branch detected in Step 0").
+- **Don't hardcode branch names.** Detect `main`/`master`/etc dynamically via
+  `gh pr view` or `gh repo view`. Use `{{BASE_BRANCH_DETECT}}` for PR-targeting
+  skills. Use "the base branch" in prose, `<base>` in code block placeholders.
+- **Keep bash blocks self-contained.** Each code block should work independently.
+  If a block needs context from a previous step, restate it in the prose above.
+- **Express conditionals as English.** Instead of nested `if/elif/else` in bash,
+  write numbered decision steps: "1. If X, do Y. 2. Otherwise, do Z."
+
 ## Browser interaction
 
 When you need to interact with a browser (QA, dogfooding, cookie setup), use the
@@ -101,6 +130,44 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
 - No jargon: say "every question now tells you which project and branch you're in" not
   "AskUserQuestion format standardized across skill templates via preamble resolver."
 
+## AI effort compression
+
+When estimating or discussing effort, always show both human-team and CC+gstack time:
+
+| Task type | Human team | CC+gstack | Compression |
+|-----------|-----------|-----------|-------------|
+| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
+| Test writing | 1 day | 15 min | ~50x |
+| Feature implementation | 1 week | 30 min | ~30x |
+| Bug fix + regression test | 4 hours | 15 min | ~20x |
+| Architecture / design | 2 days | 4 hours | ~5x |
+| Research / exploration | 1 day | 3 hours | ~3x |
+
+Completeness is cheap. Don't recommend shortcuts when the complete implementation
+is a "lake" (achievable) not an "ocean" (multi-quarter migration). See the
+Completeness Principle in the skill preamble for the full philosophy.
+
+## Local plans
+
+Contributors can store long-range vision docs and design documents in `~/.gstack-dev/plans/`.
+These are local-only (not checked in). When reviewing TODOS.md, check `plans/` for candidates
+that may be ready to promote to TODOs or implement.
+
+## E2E eval failure blame protocol
+
+When an E2E eval fails during `/ship` or any other workflow, **never claim "not
+related to our changes" without proving it.** These systems have invisible couplings —
+a preamble text change affects agent behavior, a new helper changes timing, a
+regenerated SKILL.md shifts prompt context.
+
+**Required before attributing a failure to "pre-existing":**
+1. Run the same eval on main (or base branch) and show it fails there too
+2. If it passes on main but fails on the branch — it IS your change. Trace the blame.
+3. If you can't run on main, say "unverified — may or may not be related" and flag it
+   as a risk in the PR body
+
+"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.
+
 ## Deploying to the active skill
 
 The active skill lives at `~/.claude/skills/gstack/`. After making changes:

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -22,9 +22,11 @@ bin/dev-teardown               # deactivate — back to your global install
 
 ## Contributor mode
 
-Contributor mode is for people who want to fix gstack when it annoys them. Enable it
-and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you
-work — what you were doing, what went wrong, repro steps, raw output.
+Contributor mode turns gstack into a self-improving tool. Enable it and Claude Code
+will periodically reflect on its gstack experience — rating it 0-10 at the end of
+each major workflow step. When something isn't a 10, it thinks about why and files
+a report to `~/.gstack/contributor-logs/` with what happened, repro steps, and what
+would make it better.
 
 ```bash
 ~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true
@@ -36,7 +38,7 @@ the issue, fix it, and open a PR.
 
 ### The contributor workflow
 
-1. **Hit friction while using gstack** — contributor mode logs it automatically
+1. **Use gstack normally** — contributor mode reflects and logs issues automatically
 2. **Check your logs:** `ls ~/.gstack/contributor-logs/`
 3. **Fork and clone gstack** (if you haven't already)
 4. **Symlink your fork into the project where you hit the bug:**
@@ -52,6 +54,10 @@ the issue, fix it, and open a PR.
 This is the best way to contribute: fix gstack while doing your real work, in the
 project where you actually felt the pain.
 
+### Session awareness
+
+When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all 13 skills.
+
 ## Working on gstack inside the gstack repo
 
 When you're editing gstack skills and want to test them by actually using gstack
@@ -217,6 +223,8 @@ bun run skill:check
 bun run dev:skill
 ```
 
+For template authoring best practices (natural language over bash-isms, dynamic branch detection, `{{BASE_BRANCH_DETECT}}` usage), see CLAUDE.md's "Writing SKILL templates" section.
+
 To add a browse command, add it to `browse/src/commands.ts`. To add a snapshot flag, add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts`. Then rebuild.
 
 ## Conductor workspaces