22 commits
1e06b6a
fix: dynamic base branch detection across all SKILL templates (v0.3.1…
garrytan Mar 16, 2026
78e519e
feat: await support in browse js/eval + contributor mode v2 (#104)
garrytan Mar 16, 2026
276d0cc
feat: always-on ELI16 + branch detection (v0.4.3) (#108)
garrytan Mar 16, 2026
a68244a
feat: /document-release skill — post-ship doc updates (v0.4.3) (#109)
garrytan Mar 16, 2026
c86faa7
fix: update check cache — 60min UP_TO_DATE TTL + --force flag (v0.4.4…
garrytan Mar 16, 2026
318ffdb
fix: js statement wrapping + click auto-routes option to selectOption…
garrytan Mar 17, 2026
a30f707
feat: Fix-First Review — auto-fix obvious issues, ask about hard ones…
garrytan Mar 17, 2026
4a77cc2
feat: /plan-design-review + /qa-design-review skills (v0.5.0) (#102)
garrytan Mar 17, 2026
c8c2cbb
docs: add /design-consultation skill to README (#127)
garrytan Mar 17, 2026
5f41cd9
feat: show screenshots to user during QA and browse sessions (v0.5.0.…
garrytan Mar 17, 2026
73b00b4
feat: Review Readiness Dashboard + gstack-slug helper (v0.5.1) (#130)
garrytan Mar 17, 2026
c99757b
feat: /design-consultation — risk-taking, visual research, ambitious …
garrytan Mar 17, 2026
5e9f0e7
feat: SELECTIVE EXPANSION + smarter ship gates (v0.5.3) (#134)
garrytan Mar 17, 2026
b65a464
feat: always-full eng review + ship review gate persistence (v0.5.4) …
garrytan Mar 17, 2026
a2d756f
feat: Test Bootstrap + Regression Tests + Coverage Audit (v0.6.0) (#136)
garrytan Mar 17, 2026
1f3b691
feat: /gstack-upgrade detects and syncs stale vendored copies (v0.5.4…
garrytan Mar 17, 2026
9d47619
feat: Completeness Principle — Boil the Lake (v0.6.1) (#140)
garrytan Mar 17, 2026
17c1c06
feat: diff-based test selection for E2E and LLM-judge evals (v0.6.1.0…
garrytan Mar 17, 2026
d8894b7
feat: cognitive patterns for plan-review skills (v0.6.2) (#141)
garrytan Mar 18, 2026
28becb3
feat: design review lite in /review and /ship + gstack-diff-scope (v0…
garrytan Mar 18, 2026
f91222f
docs: restructure README for faster conversion (#146)
garrytan Mar 18, 2026
b087598
feat: add /eval skill — AI output evaluator and grader
HMAKT99 Mar 18, 2026
5 changes: 5 additions & 0 deletions ARCHITECTURE.md
@@ -200,6 +200,11 @@ Templates contain the workflows, tips, and examples that require human judgment.
| `{{SNAPSHOT_FLAGS}}` | `snapshot.ts` | Flag reference with examples |
| `{{PREAMBLE}}` | `gen-skill-docs.ts` | Startup block: update check, session tracking, contributor mode, AskUserQuestion format |
| `{{BROWSE_SETUP}}` | `gen-skill-docs.ts` | Binary discovery + setup instructions |
| `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
| `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
| `{{REVIEW_DASHBOARD}}` | `gen-skill-docs.ts` | Review Readiness Dashboard for /ship pre-flight |
| `{{TEST_BOOTSTRAP}}` | `gen-skill-docs.ts` | Test framework detection, bootstrap, CI/CD setup for /qa, /ship, /qa-design-review |

This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.
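
The placeholder mechanism in the table above can be sketched as follows. This is a minimal illustration, not the actual code in `gen-skill-docs.ts`: the `renderTemplate` function, the `Resolver` type, and the resolver bodies are all hypothetical. It shows why an undefined placeholder cannot slip into generated docs: rendering fails instead of emitting it.

```typescript
// Hypothetical resolver: produces the generated text for one placeholder.
type Resolver = () => string;

// Replace every {{NAME}} with the output of its resolver.
// An unknown name throws, so a missing resolver can never reach the output.
function renderTemplate(
  template: string,
  resolvers: Record<string, Resolver>,
): string {
  return template.replace(
    /\{\{([A-Z_]+)\}\}/g,
    (_match: string, name: string) => {
      const resolver = resolvers[name];
      if (!resolver) {
        throw new Error(`Unknown placeholder: {{${name}}}`);
      }
      return resolver();
    },
  );
}

// Illustrative resolver table; the real generated text is much longer.
const resolvers: Record<string, Resolver> = {
  BROWSE_SETUP: () => "Binary discovery + setup instructions",
  BASE_BRANCH_DETECT: () => "Detect the base branch before targeting a PR.",
};

const rendered = renderTemplate("## Setup\n{{BROWSE_SETUP}}\n", resolvers);
```

Throwing on an unknown name is one possible design choice; it surfaces a typo in a template at build time rather than in a shipped SKILL.md.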

12 changes: 12 additions & 0 deletions BROWSER.md
@@ -127,6 +127,18 @@ The `console`, `network`, and `dialog` commands read from the in-memory buffers,

Dialogs (alert, confirm, prompt) are auto-accepted by default to prevent browser lockup. The `dialog-accept` and `dialog-dismiss` commands control this behavior. For prompts, `dialog-accept <text>` provides the response text. All dialogs are logged to the dialog buffer with type, message, and action taken.

### JavaScript execution (`js` and `eval`)

`js` runs a single expression; `eval` runs a JS file. Both support `await`. Expressions containing `await` are automatically wrapped in an async context:

```bash
$B js "await fetch('/api/data').then(r => r.json())" # works
$B js "document.title" # also works (no wrapping needed)
$B eval my-script.js # file with await works too
```

For `eval` files, a single-line file returns the value of its expression directly. A multi-line file needs an explicit `return` when it uses `await`. Comments containing "await" don't trigger wrapping.
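
The wrapping behavior described above could look roughly like the sketch below. It is an assumption about the approach, not the browse implementation: `stripComments`, `needsAsyncWrap`, and `wrapForAwait` are hypothetical names, and the comment stripping is deliberately naive.

```typescript
// Remove /* block */ and // line comments before looking for `await`.
// Naive: it does not parse string literals, so "//" inside a string
// (for example in a URL) is treated as the start of a comment.
function stripComments(code: string): string {
  return code.replace(/\/\*[\s\S]*?\*\//g, "").replace(/\/\/.*$/gm, "");
}

// True only when `await` appears as a whole word outside comments.
function needsAsyncWrap(code: string): boolean {
  return /\bawait\b/.test(stripComments(code));
}

// Single-line input becomes the expression body of an async arrow
// function, so its value is returned. Multi-line input becomes a
// block body, which is why it needs an explicit `return`.
// Assumes a single-line expression carries no trailing comment.
function wrapForAwait(code: string): string {
  if (!needsAsyncWrap(code)) return code;
  const trimmed = code.trim();
  if (!trimmed.includes("\n")) {
    const expression = trimmed.replace(/;\s*$/, "");
    return `(async () => (${expression}))()`;
  }
  return `(async () => {\n${trimmed}\n})()`;
}
```

Under these assumptions, `document.title` passes through unchanged, while `await fetch('/api/data').then(r => r.json())` is wrapped in an async arrow function that returns the awaited value.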

### Multi-workspace support

Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs. State is stored in `.gstack/` inside the project root (detected via `git rev-parse --show-toplevel`).
209 changes: 209 additions & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

71 changes: 69 additions & 2 deletions CLAUDE.md
@@ -5,8 +5,11 @@
```bash
bun install # install dependencies
bun test # run free tests (browse + snapshot + skill validation)
bun run test:evals # run paid evals: LLM judge + E2E (~$4/run)
bun run test:e2e # run E2E tests only (~$3.85/run)
bun run test:evals # run paid evals: LLM judge + E2E (diff-based, ~$4/run max)
bun run test:evals:all # run ALL paid evals regardless of diff
bun run test:e2e # run E2E tests only (diff-based, ~$3.85/run max)
bun run test:e2e:all # run ALL E2E tests regardless of diff
bun run eval:select # show which tests would run based on current diff
bun run dev <cmd> # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build # gen docs + compile binaries
bun run gen:skill-docs # regenerate SKILL.md files from templates
@@ -21,6 +24,12 @@ bun run eval:summary # aggregate stats across all eval runs
(tool-by-tool via `--output-format stream-json --verbose`). Results are persisted
to `~/.gstack-dev/evals/` with auto-comparison against the previous run.

**Diff-based test selection:** `test:evals` and `test:e2e` auto-select tests based
on `git diff` against the base branch. Each test declares its file dependencies in
`test/helpers/touchfiles.ts`. Changes to global touchfiles (session-runner, eval-store,
llm-judge, gen-skill-docs) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
variants to force all tests. Run `eval:select` to preview which tests would run.
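
The selection rule described above can be sketched as a pure function. The data shapes, the matching rule (exact path, or directory prefix when an entry ends in `/`), and every path in the example are assumptions for illustration; the real `test/helpers/touchfiles.ts` may declare dependencies differently.

```typescript
// Hypothetical shape: test name -> paths the test depends on.
type TouchfileMap = Record<string, string[]>;

// An entry ending in "/" matches any file under that directory;
// any other entry must match the changed path exactly.
function matchesEntry(changedFile: string, entry: string): boolean {
  return entry.endsWith("/")
    ? changedFile.startsWith(entry)
    : changedFile === entry;
}

function touches(changedFiles: string[], entries: string[]): boolean {
  return changedFiles.some((file) =>
    entries.some((entry) => matchesEntry(file, entry)),
  );
}

// forceAll stands in for EVALS_ALL=1 or the `:all` script variants.
function selectTests(
  changedFiles: string[],
  perTest: TouchfileMap,
  globalTouchfiles: string[],
  forceAll = false,
): string[] {
  const allTests = Object.keys(perTest);
  // A change to any global touchfile selects every test.
  if (forceAll || touches(changedFiles, globalTouchfiles)) return allTests;
  return Object.entries(perTest)
    .filter(([, entries]) => touches(changedFiles, entries))
    .map(([name]) => name);
}

// Illustrative data only; these paths are not taken from the repo.
const perTest: TouchfileMap = {
  "qa-skill": ["qa/"],
  "ship-skill": ["ship/", "review/"],
};
const globalTouchfiles = ["scripts/gen-skill-docs.ts"];
```

With this data, a change under `qa/` selects only `qa-skill`, a change to the global touchfile selects both tests, and a change to an unrelated file selects none.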

## Project structure

```
@@ -43,11 +52,14 @@ gstack/
│ ├── skill-llm-eval.test.ts # Tier 3: LLM-as-judge (~$0.15/run)
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── plan-design-review/ # /plan-design-review skill (report-only design audit)
├── qa-design-review/ # /qa-design-review skill (design audit + fix loop)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── retro/ # Retrospective skill
├── document-release/ # /document-release skill (post-ship doc updates)
├── setup # One-time setup: build binary + symlink skills
├── SKILL.md # Generated from SKILL.md.tmpl (don't edit directly)
├── SKILL.md.tmpl # Template: edit this, run gen:skill-docs
@@ -65,6 +77,23 @@ SKILL.md files are **generated** from `.tmpl` templates. To update docs:
To add a new browse command: add it to `browse/src/commands.ts` and rebuild.
To add a snapshot flag: add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts` and rebuild.

## Writing SKILL templates

SKILL.md.tmpl files are **prompt templates read by Claude**, not bash scripts.
Each bash code block runs in a separate shell — variables do not persist between blocks.

Rules:
- **Use natural language for logic and state.** Don't use shell variables to pass
state between code blocks. Instead, tell Claude what to remember and reference
it in prose (e.g., "the base branch detected in Step 0").
- **Don't hardcode branch names.** Detect `main`/`master`/etc. dynamically via
`gh pr view` or `gh repo view`. Use `{{BASE_BRANCH_DETECT}}` for PR-targeting
skills. Use "the base branch" in prose, `<base>` in code block placeholders.
- **Keep bash blocks self-contained.** Each code block should work independently.
If a block needs context from a previous step, restate it in the prose above.
- **Express conditionals as English.** Instead of nested `if/elif/else` in bash,
write numbered decision steps: "1. If X, do Y. 2. Otherwise, do Z."
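
The detection order behind the branch-name rule can be illustrated outside a template. This is a sketch under assumptions: the `pickBaseBranch` helper is hypothetical, the exact `gh` flags in the comments are one plausible way to query the values, and the final fallback to `main` is an assumed last resort rather than something the templates specify. In a SKILL template the same logic would be written as prose steps, per the rules above.

```typescript
// Inputs are the raw stdout of two gh calls (empty string if a call failed).
// Assumed commands, not taken from the templates:
//   gh pr view --json baseRefName -q .baseRefName
//   gh repo view --json defaultBranchRef -q .defaultBranchRef.name
function pickBaseBranch(
  prBaseOutput: string,
  repoDefaultOutput: string,
): string {
  // 1. If the current branch has a PR, use the branch that PR targets.
  const prBase = prBaseOutput.trim();
  if (prBase) return prBase;

  // 2. Otherwise, use the repository's default branch.
  const repoDefault = repoDefaultOutput.trim();
  if (repoDefault) return repoDefault;

  // 3. Assumed last-resort fallback when neither call returned a name.
  return "main";
}
```

Preferring the PR's target over the repository default matters for stacked PRs, where the base is another feature branch.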

## Browser interaction

When you need to interact with a browser (QA, dogfooding, cookie setup), use the
@@ -101,6 +130,44 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."

## AI effort compression

When estimating or discussing effort, always show both human-team and CC+gstack time:

| Task type | Human team | CC+gstack | Compression |
|-----------|-----------|-----------|-------------|
| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix + regression test | 4 hours | 15 min | ~20x |
| Architecture / design | 2 days | 4 hours | ~5x |
| Research / exploration | 1 day | 3 hours | ~3x |

Completeness is cheap. Don't recommend shortcuts when the complete implementation
is a "lake" (achievable), not an "ocean" (multi-quarter migration). See the
Completeness Principle in the skill preamble for the full philosophy.

## Local plans

Contributors can store long-range vision docs and design documents in `~/.gstack-dev/plans/`.
These are local-only (not checked in). When reviewing TODOS.md, check `plans/` for candidates
that may be ready to promote to TODOs or implement.

## E2E eval failure blame protocol

When an E2E eval fails during `/ship` or any other workflow, **never claim "not
related to our changes" without proving it.** These systems have invisible couplings —
a preamble text change affects agent behavior, a new helper changes timing, a
regenerated SKILL.md shifts prompt context.

**Required before attributing a failure to "pre-existing":**
1. Run the same eval on main (or base branch) and show it fails there too
2. If it passes on main but fails on the branch — it IS your change. Trace the blame.
3. If you can't run on main, say "unverified — may or may not be related" and flag it
as a risk in the PR body

"Pre-existing" without receipts is a lazy claim. Prove it or don't say it.
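
The three-way rule above amounts to a small lookup keyed on what happened when the same eval ran on the base branch. The type and function names below are hypothetical; the sketch only restates the protocol and assumes the eval has already failed on the working branch.

```typescript
// Outcome of running the same eval on main (or the base branch).
type BaseResult = "fail" | "pass" | "not-run";

// What may be claimed about a failure seen on the working branch.
type Attribution = "pre-existing" | "caused-by-branch" | "unverified";

const ATTRIBUTION: Record<BaseResult, Attribution> = {
  // Fails on base too: the "pre-existing" claim now has evidence.
  fail: "pre-existing",
  // Passes on base but fails on the branch: the change is responsible.
  pass: "caused-by-branch",
  // Could not run on base: flag it as a risk in the PR body.
  "not-run": "unverified",
};

function attributeFailure(baseResult: BaseResult): Attribution {
  return ATTRIBUTION[baseResult];
}
```

Note that "pre-existing" is reachable only through an actual failing run on the base branch, which is the point of the protocol.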

## Deploying to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After making changes:
16 changes: 12 additions & 4 deletions CONTRIBUTING.md
Expand Up @@ -22,9 +22,11 @@ bin/dev-teardown # deactivate — back to your global install

## Contributor mode

Contributor mode is for people who want to fix gstack when it annoys them. Enable it
and Claude Code will automatically log issues to `~/.gstack/contributor-logs/` as you
work — what you were doing, what went wrong, repro steps, raw output.
Contributor mode turns gstack into a self-improving tool. Enable it and Claude Code
will periodically reflect on its gstack experience — rating it 0-10 at the end of
each major workflow step. When something isn't a 10, it thinks about why and files
a report to `~/.gstack/contributor-logs/` with what happened, repro steps, and what
would make it better.

```bash
~/.claude/skills/gstack/bin/gstack-config set gstack_contributor true
@@ -36,7 +38,7 @@ the issue, fix it, and open a PR.

### The contributor workflow

1. **Hit friction while using gstack** — contributor mode logs it automatically
1. **Use gstack normally** — contributor mode reflects and logs issues automatically
2. **Check your logs:** `ls ~/.gstack/contributor-logs/`
3. **Fork and clone gstack** (if you haven't already)
4. **Symlink your fork into the project where you hit the bug:**
@@ -52,6 +54,10 @@ the issue, fix it, and open a PR.
This is the best way to contribute: fix gstack while doing your real work, in the
project where you actually felt the pain.

### Session awareness

When you have 3+ gstack sessions open simultaneously, every question tells you which project, which branch, and what's happening. No more staring at a question thinking "wait, which window is this?" The format is consistent across all 13 skills.

## Working on gstack inside the gstack repo

When you're editing gstack skills and want to test them by actually using gstack
@@ -217,6 +223,8 @@ bun run skill:check
bun run dev:skill
```

For template authoring best practices (natural language over bash-isms, dynamic branch detection, `{{BASE_BRANCH_DETECT}}` usage), see CLAUDE.md's "Writing SKILL templates" section.

To add a browse command, add it to `browse/src/commands.ts`. To add a snapshot flag, add it to `SNAPSHOT_FLAGS` in `browse/src/snapshot.ts`. Then rebuild.

## Conductor workspaces