Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -202,9 +202,9 @@ Templates contain the workflows, tips, and examples that require human judgment.
| `{{BROWSE_SETUP}}` | `gen-skill-docs.ts` | Binary discovery + setup instructions |
| `{{BASE_BRANCH_DETECT}}` | `gen-skill-docs.ts` | Dynamic base branch detection for PR-targeting skills (ship, review, qa, plan-ceo-review) |
| `{{QA_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared QA methodology block for /qa and /qa-only |
| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /qa-design-review |
| `{{DESIGN_METHODOLOGY}}` | `gen-skill-docs.ts` | Shared design audit methodology for /plan-design-review and /design-review |
| `{{REVIEW_DASHBOARD}}` | `gen-skill-docs.ts` | Review Readiness Dashboard for /ship pre-flight |
| `{{TEST_BOOTSTRAP}}` | `gen-skill-docs.ts` | Test framework detection, bootstrap, CI/CD setup for /qa, /ship, /qa-design-review |
| `{{TEST_BOOTSTRAP}}` | `gen-skill-docs.ts` | Test framework detection, bootstrap, CI/CD setup for /qa, /ship, /design-review |

This is structurally sound — if a command exists in code, it appears in docs. If it doesn't exist, it can't appear.

Expand Down
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,18 @@
# Changelog

## [0.6.4.0] - 2026-03-17

### Added

- **`/plan-design-review` is now interactive — rates 0-10, fixes the plan.** Instead of producing a report with letter grades, the designer now works like CEO and Eng review: rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. One AskUserQuestion per design choice. The output is a better plan, not a document about the plan.
- **CEO review now calls in the designer.** When `/plan-ceo-review` detects UI scope in a plan, it activates a Design & UX section (Section 11) covering information architecture, interaction state coverage, AI slop risk, and responsive intention. For deep design work, it recommends `/plan-design-review`.
- **14 of 15 skills now have full test coverage (E2E + LLM-judge + validation).** Added LLM-judge quality evals for 10 skills that were missing them: ship, retro, qa-only, plan-ceo-review, plan-eng-review, plan-design-review, design-review, design-consultation, document-release, gstack-upgrade. Added real E2E test for gstack-upgrade (was a `.todo`). Added design-consultation to command validation.
- **Bisect commit style.** CLAUDE.md now requires every commit to be a single logical change — renames separate from rewrites, test infrastructure separate from test implementations.

### Changed

- `/qa-design-review` renamed to `/design-review` — the "qa-" prefix was confusing now that `/plan-design-review` is plan-mode. Updated across all 22 files.

## [0.6.3.0] - 2026-03-17

### Added
Expand Down
18 changes: 17 additions & 1 deletion CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,7 +53,7 @@ gstack/
│ └── skill-e2e.test.ts # Tier 2: E2E via claude -p (~$3.85/run)
├── qa-only/ # /qa-only skill (report-only QA, no fixes)
├── plan-design-review/ # /plan-design-review skill (report-only design audit)
├── qa-design-review/ # /qa-design-review skill (design audit + fix loop)
├── design-review/ # /design-review skill (design audit + fix loop)
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
Expand Down Expand Up @@ -119,6 +119,22 @@ symlink or a real copy. If it's a symlink to your working directory, be aware th
gen-skill-docs pipeline, consider whether the changes should be tested in isolation
before going live (especially if the user is actively using gstack in other windows).

## Commit style

**Always bisect commits.** Every commit should be a single logical change. When
you've made multiple changes (e.g., a rename + a rewrite + new tests), split them
into separate commits before pushing. Each commit should be independently
understandable and revertable.

Examples of good bisection:
- Rename/move separate from behavior changes
- Test infrastructure (touchfiles, helpers) separate from test implementations
- Template changes separate from generated file regeneration
- Mechanical refactors separate from new features

When the user says "bisect commit" or "bisect and push," split staged/unstaged
changes into logical commits and push.

## CHANGELOG style

CHANGELOG.md is **for users**, not contributors. Write it like product release notes:
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -120,7 +120,7 @@ One feature. Seven commands. The agent reframed the product, ran an 80-item desi
| `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. |
| `/qa` | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. |
| `/qa-only` | **QA Reporter** | Same methodology as /qa but report only. Use when you want a pure bug report without code changes. |
| `/qa-design-review` | **Designer Who Codes** | Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots. |
| `/design-review` | **Designer Who Codes** | Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots. |
| `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
| `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. |
| `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |
Expand All @@ -129,7 +129,7 @@ One feature. Seven commands. The agent reframed the product, ran an 80-item desi

## What's new and why it matters

**Design is at the heart.** `/design-consultation` doesn't just pick fonts. It researches what's out there in your space, proposes safe choices AND creative risks, generates realistic mockups of your actual product, and writes `DESIGN.md` — and then `/qa-design-review` and `/plan-eng-review` read what you chose. Design decisions flow through the whole system.
**Design is at the heart.** `/design-consultation` doesn't just pick fonts. It researches what's out there in your space, proposes safe choices AND creative risks, generates realistic mockups of your actual product, and writes `DESIGN.md` — and then `/design-review` and `/plan-eng-review` read what you chose. Design decisions flow through the whole system.

**`/qa` was a massive unlock.** It let me go from 6 to 12 parallel workers. Claude Code saying *"I SEE THE ISSUE"* and then actually fixing it, generating a regression test, and verifying the fix — that changed how I work. The agent has eyes now.

Expand Down
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.6.3.0
0.6.4.0
16 changes: 8 additions & 8 deletions qa-design-review/SKILL.md → design-review/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
---
name: qa-design-review
version: 1.0.0
name: design-review
version: 2.0.0
description: |
Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems,
AI slop patterns, and slow interactions — then fixes them. Iteratively fixes issues
in source code, committing each fix atomically and re-verifying with before/after
screenshots. For report-only mode, use /plan-design-review instead.
screenshots. For plan-mode design review (before implementation), use /plan-design-review.
allowed-tools:
- Bash
- Read
Expand Down Expand Up @@ -123,7 +123,7 @@ Hey gstack team — ran into this while using /{skill-name}:

Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"

# /qa-design-review: Design Audit → Fix → Verify
# /design-review: Design Audit → Fix → Verify

You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards — then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.

Expand All @@ -150,7 +150,7 @@ Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found,

```bash
if [ -n "$(git status --porcelain)" ]; then
echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa-design-review."
echo "ERROR: Working tree is dirty. Commit or stash changes before running /design-review."
exit 1
fi
```
Expand Down Expand Up @@ -766,7 +766,7 @@ Design fixes are typically CSS-only. Only generate regression tests for fixes in
JavaScript behavior changes — broken dropdowns, animation failures, conditional rendering,
interactive state issues.

For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /qa-design-review.
For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /design-review.

If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
test patterns, write a regression test encoding the exact bug condition, run it, commit if
Expand Down Expand Up @@ -838,11 +838,11 @@ Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
If the repo has a `TODOS.md`:

1. **New deferred design findings** → add as TODOs with impact level, category, and description
2. **Fixed findings that were in TODOS.md** → annotate with "Fixed by /qa-design-review on {branch}, {date}"
2. **Fixed findings that were in TODOS.md** → annotate with "Fixed by /design-review on {branch}, {date}"

---

## Additional Rules (qa-design-review specific)
## Additional Rules (design-review specific)

11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple design fixes into one commit.
Expand Down
16 changes: 8 additions & 8 deletions qa-design-review/SKILL.md.tmpl → design-review/SKILL.md.tmpl
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
---
name: qa-design-review
version: 1.0.0
name: design-review
version: 2.0.0
description: |
Designer's eye QA: finds visual inconsistency, spacing issues, hierarchy problems,
AI slop patterns, and slow interactions — then fixes them. Iteratively fixes issues
in source code, committing each fix atomically and re-verifying with before/after
screenshots. For report-only mode, use /plan-design-review instead.
screenshots. For plan-mode design review (before implementation), use /plan-design-review.
allowed-tools:
- Bash
- Read
Expand All @@ -19,7 +19,7 @@ allowed-tools:

{{PREAMBLE}}

# /qa-design-review: Design Audit → Fix → Verify
# /design-review: Design Audit → Fix → Verify

You are a senior product designer AND a frontend engineer. Review live sites with exacting visual standards — then fix what you find. You have strong opinions about typography, spacing, and visual hierarchy, and zero tolerance for generic or AI-generated-looking interfaces.

Expand All @@ -46,7 +46,7 @@ Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found,

```bash
if [ -n "$(git status --porcelain)" ]; then
echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa-design-review."
echo "ERROR: Working tree is dirty. Commit or stash changes before running /design-review."
exit 1
fi
```
Expand Down Expand Up @@ -164,7 +164,7 @@ Design fixes are typically CSS-only. Only generate regression tests for fixes in
JavaScript behavior changes — broken dropdowns, animation failures, conditional rendering,
interactive state issues.

For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /qa-design-review.
For CSS-only fixes: skip entirely. CSS regressions are caught by re-running /design-review.

If the fix involved JS behavior: follow the same procedure as /qa Phase 8e.5 (study existing
test patterns, write a regression test encoding the exact bug condition, run it, commit if
Expand Down Expand Up @@ -236,11 +236,11 @@ Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`
If the repo has a `TODOS.md`:

1. **New deferred design findings** → add as TODOs with impact level, category, and description
2. **Fixed findings that were in TODOS.md** → annotate with "Fixed by /qa-design-review on {branch}, {date}"
2. **Fixed findings that were in TODOS.md** → annotate with "Fixed by /design-review on {branch}, {date}"

---

## Additional Rules (qa-design-review specific)
## Additional Rules (design-review specific)

11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty.
12. **One commit per fix.** Never bundle multiple design fixes into one commit.
Expand Down
Loading