docs: tutorial workflows — skills, templates, planning, shipping, QA, tests by johnxie · Pull Request #178 · garrytan/gstack

johnxie · 2026-03-18T08:59:06Z

Summary

Adds the second half of the gstack tutorial — six chapters covering the workflow layer built on top of the infrastructure documented in PR #177. Together, the 10 chapters form a complete onboarding guide from "what is gstack?" to "how does it test itself?"

Depends on: PR #177 (docs/tutorial-foundation) — chapters 1-4 covering architecture, browse engine, snapshots, and commands.

What's included

File	Lines	Topic	Key Visuals
`docs/05_skill_system.md`	309	Skill anatomy: preamble, 14 roles, design rules, data flows	Skill data flow diagram, template pipeline
`docs/06_template_engine.md`	330	Template engine: placeholders, resolvers, DRY partials	Build pipeline flow, resolver architecture
`docs/07_planning_skills.md`	241	CEO, Eng Manager, Designer plan reviews	3-review sequence, interactive flow, review dashboard
`docs/08_ship_and_review.md`	284	11-step ship pipeline, 2-pass code review	Ship flowchart, review sequence, ship-review integration
`docs/09_qa_and_design_review.md`	332	Browser-based QA, design fix loop with risk heuristic	QA phases, fix loop, design risk accumulator
`docs/10_test_infrastructure.md`	519	3-tier testing, eval store, diff-based selection	Tier diagram, E2E sequence, observability pipeline

Total: ~2,015 lines across 6 files, ~68KB

The complete tutorial map

With both PRs merged, readers get a 10-chapter guided tour:

                        gstack Tutorial
                    ═══════════════════════

    PR #177 (Foundation)              This PR (Workflows)
    ════════════════════              ═══════════════════

    ┌─────────────────┐              ┌──────────────────┐
    │  1. Architecture │──────────────│  5. Skill System │
    │     Overview     │              │     14 roles,    │
    │                  │              │     anatomy,     │
    │  "What is gstack │              │     data flows   │
    │   and why?"      │              └────────┬─────────┘
    └────────┬─────────┘                       │
             │                        ┌────────▼─────────┐
    ┌────────▼─────────┐              │  6. Template     │
    │  2. Browse Engine │              │     Engine       │
    │     Persistent    │              │     Placeholders,│
    │     Chromium      │              │     resolvers    │
    │     daemon        │              └────────┬─────────┘
    └────────┬─────────┘                       │
             │                        ┌────────▼─────────┐
    ┌────────▼─────────┐              │  7. Planning     │
    │  3. Snapshot &    │              │     Skills       │
    │     Ref System    │              │     CEO → Eng →  │
    │     @e1, @e2, ... │              │     Design       │
    └────────┬─────────┘              └────────┬─────────┘
             │                                 │
    ┌────────▼─────────┐              ┌────────▼─────────┐
    │  4. Command       │              │  8. Ship &       │
    │     System        │──────────────│     Review       │
    │     52 commands   │              │     11-step pipe │
    └──────────────────┘              └────────┬─────────┘
                                               │
                                      ┌────────▼─────────┐
                                      │  9. QA & Design  │
                                      │     Review       │
                                      │     Browser QA,  │
                                      │     fix loops    │
                                      └────────┬─────────┘
                                               │
                                      ┌────────▼─────────┐
                                      │ 10. Test Infra   │
                                      │     3-tier:      │
                                      │     Static →     │
                                      │     E2E → Judge  │
                                      └──────────────────┘

Chapter highlights

Chapter 5: Skill System

How 14 Markdown files turn Claude Code into a virtual engineering team.

flowchart LR
    TMPL["SKILL.md.tmpl\n(human-written)"]
    GEN["gen-skill-docs\n(build step)"]
    MD["SKILL.md\n(generated)"]
    CLAUDE["Claude Code\n(reads at runtime)"]

    TMPL --> GEN --> MD --> CLAUDE

Key content:

Skill anatomy: preamble → role → browse setup → workflow → decisions → completion
All 14 skills mapped to phases (Planning → Build → Ship → Observability)
Skill design rules: independent bash blocks, no hardcoded branches, English conditionals
Cognitive patterns: 15 engineering instincts, 12 design thinking patterns
Data flows: test plan artifacts, review dashboard, cross-skill consumption
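
The "no hardcoded branches" design rule lends itself to a static check. The following is a minimal sketch, assuming the rule means that git commands inside a skill should not name `main` or `master` literally. The function name `findHardcodedBranches`, the branch list, and the regular expression are illustrative assumptions, not gstack's actual validator.

```typescript
// Hypothetical lint: flag git commands that name a branch literally.
// The branch list and the pattern are assumptions made for illustration.
const HARDCODED_BRANCHES = ["main", "master"];

interface BranchFinding {
  line: number; // 1-based line number within the skill text
  text: string; // the offending line, trimmed
}

function findHardcodedBranches(skillText: string): BranchFinding[] {
  // Matches e.g. "git diff origin/main...HEAD" but not "git diff origin/$BASE".
  const pattern = new RegExp(
    `\\bgit\\b.*\\b(?:origin/)?(?:${HARDCODED_BRANCHES.join("|")})\\b`
  );
  const findings: BranchFinding[] = [];
  skillText.split("\n").forEach((text, index) => {
    if (pattern.test(text)) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}

const sample = [
  "git fetch origin",
  "git diff origin/main...HEAD",
  'git diff "origin/$BASE"...HEAD',
].join("\n");

console.log(findHardcodedBranches(sample));
```

In this sample only the second line is flagged; the third line resolves the base branch from a variable, which is the style the rule asks for.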

Chapter 6: Template Engine

The build system that keeps skill docs in sync with source code.

Placeholder	Source	Used By
`{{PREAMBLE}}`	gen-skill-docs.ts	All 14 skills
`{{BROWSE_SETUP}}`	gen-skill-docs.ts	browse, qa, qa-only, design-review
`{{COMMAND_REFERENCE}}`	commands.ts	browse
`{{SNAPSHOT_FLAGS}}`	snapshot.ts	browse
`{{QA_METHODOLOGY}}`	gen-skill-docs.ts	qa, qa-only
`{{DESIGN_METHODOLOGY}}`	gen-skill-docs.ts	plan-design-review, design-review
`{{TEST_BOOTSTRAP}}`	gen-skill-docs.ts	qa, ship, design-review
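
To make the placeholder table concrete, here is a minimal sketch of how a `{{NAME}}` placeholder could be resolved at build time. It assumes a simple name-to-function lookup; the `Resolver` type, the `renderTemplate` function, and the stand-in resolver bodies are hypothetical and do not reproduce what `gen-skill-docs.ts` actually emits.

```typescript
// Hypothetical resolver table: placeholder name -> function producing its text.
// The bodies below are stand-ins, not the real generated content.
type Resolver = () => string;

const resolvers: Record<string, Resolver> = {
  PREAMBLE: () => "<!-- generated preamble -->",
  BROWSE_SETUP: () => "<!-- generated browse setup -->",
};

// Replace every {{NAME}} with its resolver output. Unknown names throw,
// so a typo in a template cannot silently end up in a generated file.
function renderTemplate(
  template: string,
  table: Record<string, Resolver>
): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (_match, name: string) => {
    const resolve = table[name];
    if (resolve === undefined) {
      throw new Error(`No resolver registered for {{${name}}}`);
    }
    return resolve();
  });
}

const tmpl = "{{PREAMBLE}}\n# QA\n{{BROWSE_SETUP}}\n";
console.log(renderTemplate(tmpl, resolvers));
```

Failing on an unknown placeholder is a design choice of this sketch; it keeps the generated file from containing a raw `{{...}}` token.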

Chapter 7: Planning Skills

Three reviews from three perspectives — before writing any code:

┌──────────────────────────────────────────────────────────┐
│                    Your Plan (PLAN.md)                    │
└──────────┬──────────────────┬──────────────────┬─────────┘
           │                  │                  │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │ CEO Review  │    │ Eng Review  │    │Design Review│
    │             │    │             │    │             │
    │ Right       │    │ Right       │    │ Right       │
    │ problem?    │    │ architecture│    │ experience? │
    │             │    │             │    │             │
    │ 4 modes:    │    │ 4 sections: │    │ 7 passes:   │
    │ • Expand    │    │ • Arch      │    │ • Info arch │
    │ • Selective │    │ • Quality   │    │ • States    │
    │ • Hold      │    │ • Tests     │    │ • Journey   │
    │ • Reduce    │    │ • Perf      │    │ • AI slop   │
    │             │    │             │    │ • Design sys│
    │ Output:     │    │ Output:     │    │ • Responsive│
    │ Scope       │    │ Test plan   │    │ • Decisions │
    │ decisions   │    │ artifact    │    │             │
    └─────────────┘    └─────────────┘    └─────────────┘

Chapter 8: Ship & Review Pipeline

The 11-step automated shipping workflow:

Step  1: Pre-flight ──────── branch, context, review dashboard
Step  2: Merge base ──────── fetch + merge for pre-merge testing
Step  3: Run tests ───────── parallel test suites
Step  4: Eval suites ─────── conditional: if prompt files changed
Step  5: Coverage audit ──── find gaps, generate tests
Step  6: Pre-landing review ─ checklist + design-lite
Step  7: Greptile ────────── classify + escalate comments
Step  8: Version bump ────── MICRO/PATCH auto, MINOR/MAJOR ask
Step  9: CHANGELOG ────────── auto-generate, user-focused voice
Step 10: Bisect commits ──── ordered, independently revertable
Step 11: Push + PR ────────── comprehensive body with dashboard
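
Step 8 encodes a small policy: MICRO and PATCH bumps happen automatically, while MINOR and MAJOR bumps require asking the user. A minimal sketch of that decision, assuming the bump level has already been determined elsewhere; the `BumpDecision` type and `decideBump` function are hypothetical names, not taken from `ship/SKILL.md.tmpl`.

```typescript
type BumpLevel = "MICRO" | "PATCH" | "MINOR" | "MAJOR";

interface BumpDecision {
  level: BumpLevel;
  requiresConfirmation: boolean; // true means: ask the user before bumping
}

// Levels that may be applied without asking, per the step 8 summary.
const AUTOMATIC_LEVELS: BumpLevel[] = ["MICRO", "PATCH"];

function decideBump(level: BumpLevel): BumpDecision {
  return {
    level,
    requiresConfirmation: AUTOMATIC_LEVELS.indexOf(level) === -1,
  };
}

const levels: BumpLevel[] = ["MICRO", "PATCH", "MINOR", "MAJOR"];
for (const level of levels) {
  console.log(level, decideBump(level).requiresConfirmation ? "ask" : "auto");
}
```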

Chapter 9: QA & Design Review

Browser-based testing with real clicks and screenshots:

flowchart TD
    subgraph QA["QA Methodology (6 phases)"]
        P1["1. Initialize"] --> P2["2. Authenticate"]
        P2 --> P3["3. Orient (map structure)"]
        P3 --> P4["4. Explore (visit pages)"]
        P4 --> P5["5. Document issues"]
        P5 --> P6["6. Health score"]
    end

    subgraph Fix["Fix Loop (/qa only)"]
        F1["Locate source"] --> F2["Apply fix"]
        F2 --> F3["Re-test with browser"]
        F3 --> F4["Atomic commit"]
        F4 --> F5["Regression test"]
    end

    P6 --> Fix

Design review risk heuristic:

Trigger	Risk Increase	Hard Stop
Each revert	+15%	Risk > 20% → stop
Each component file change	+5%	Fix count > 30 → hard cap
After fix 10	+1% per additional	—
Touching unrelated files	+20%	—
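
The heuristic in the table can be read as a running total. The sketch below is one possible interpretation, assuming the increments are additive, that the unrelated-file penalty applies per occurrence, and that the loop stops when risk exceeds 20% or the fix count exceeds 30. The `FixLoopState` shape and both function names are hypothetical.

```typescript
// Hypothetical state tracked across the design fix loop.
interface FixLoopState {
  reverts: number;              // fixes that had to be reverted
  componentFileChanges: number; // component files modified
  fixesApplied: number;         // total fixes attempted so far
  unrelatedFileTouches: number; // edits outside the files under review
}

const RISK_STOP_THRESHOLD = 20; // percent; stop when risk is strictly above
const FIX_HARD_CAP = 30;        // stop when fix count is strictly above

// Additive reading of the table (an assumption of this sketch).
function riskPercent(s: FixLoopState): number {
  const fixesBeyondTen = Math.max(0, s.fixesApplied - 10);
  return (
    s.reverts * 15 +
    s.componentFileChanges * 5 +
    fixesBeyondTen * 1 +
    s.unrelatedFileTouches * 20
  );
}

function shouldStop(s: FixLoopState): boolean {
  return riskPercent(s) > RISK_STOP_THRESHOLD || s.fixesApplied > FIX_HARD_CAP;
}

const state: FixLoopState = {
  reverts: 1,
  componentFileChanges: 2,
  fixesApplied: 4,
  unrelatedFileTouches: 0,
};
console.log(riskPercent(state), shouldStop(state));
```

Under these assumptions, one revert plus two component file changes already reaches 25%, which is above the stop threshold.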

Chapter 10: Test Infrastructure

The 3-tier system that validates gstack itself:

┌─────────────────────────────────────────────────────────┐
│              gstack Test Infrastructure                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Tier 1: Static Validation          FREE    <2 seconds  │
│  ┌─────────────────────────────────────────────┐        │
│  │ • Parse every $B command in every SKILL.md  │        │
│  │ • Validate against commands.ts registry     │        │
│  │ • Check snapshot flags vs SNAPSHOT_FLAGS     │        │
│  │ • Detect hardcoded branch names             │        │
│  │ • Verify template freshness                 │        │
│  └─────────────────────────────────────────────┘        │
│           │ pass                                        │
│           ▼                                             │
│  Tier 2: E2E Testing              ~$3.85   ~20 minutes  │
│  ┌─────────────────────────────────────────────┐        │
│  │ • Spawn real Claude sessions (claude -p)    │        │
│  │ • Stream NDJSON for tool-by-tool progress   │        │
│  │ • Scan for browse errors in transcript      │        │
│  │ • Planted-bug detection testing             │        │
│  │ • Diff-based selection (only changed tests) │        │
│  └─────────────────────────────────────────────┘        │
│           │ pass                                        │
│           ▼                                             │
│  Tier 3: LLM-as-Judge              ~$0.15   ~30 seconds │
│  ┌─────────────────────────────────────────────┐        │
│  │ • Score clarity/completeness/actionability  │        │
│  │ • Planted-bug outcome evaluation            │        │
│  │ • Uses claude-sonnet-4-6 for stability      │        │
│  │ • Threshold: every dimension ≥ 4/5          │        │
│  └─────────────────────────────────────────────┘        │
│                                                         │
│  Diff-Based Selection:                                  │
│  ┌─────────────────────────────────────────────┐        │
│  │ touchfiles.ts declares per-test deps        │        │
│  │ git diff → match → run only affected tests  │        │
│  │ Global touchfiles → trigger ALL tests       │        │
│  │ Saves ~75% of eval API costs                │        │
│  └─────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────┘
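
Diff-based selection, as summarized in the last box, can be sketched as a lookup from changed files to tests. This is a minimal illustration, assuming changed paths come from something like `git diff --name-only` and that a dependency ending in `/` matches by directory prefix. The test names, the paths, and the `selectTests` function are hypothetical; the real `touchfiles.ts` may use a different shape and matching rule.

```typescript
// Hypothetical per-test dependencies: test name -> paths it depends on.
const TEST_TOUCHFILES: Record<string, string[]> = {
  "qa-basic": ["qa/", "browse/src/"],
  "ship-flow": ["ship/"],
  "review-checklist": ["review/"],
};

// Hypothetical global touchfiles: a change here triggers every test.
const GLOBAL_TOUCHFILES = ["test/helpers/"];

// A pattern ending in "/" matches any file under that directory;
// any other pattern must equal the file path exactly.
function matchesPattern(file: string, pattern: string): boolean {
  if (pattern.slice(-1) === "/") {
    return file.slice(0, pattern.length) === pattern;
  }
  return file === pattern;
}

function selectTests(
  changedFiles: string[],
  perTest: Record<string, string[]>,
  globals: string[]
): string[] {
  const allTests = Object.keys(perTest);
  const hitsGlobal = changedFiles.some((file) =>
    globals.some((pattern) => matchesPattern(file, pattern))
  );
  if (hitsGlobal) {
    return allTests; // global dependency changed: run everything
  }
  return allTests.filter((test) =>
    perTest[test].some((pattern) =>
      changedFiles.some((file) => matchesPattern(file, pattern))
    )
  );
}

console.log(selectTests(["ship/SKILL.md.tmpl"], TEST_TOUCHFILES, GLOBAL_TOUCHFILES));
console.log(selectTests(["test/helpers/util.ts"], TEST_TOUCHFILES, GLOBAL_TOUCHFILES));
```

With these sample tables, a change under `ship/` selects only `ship-flow`, while a change under the assumed global directory selects all three tests.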

Verification

All workflows verified against actual SKILL.md.tmpl templates and test source:

Chapter	Verified Against	Key Claims Checked
5. Skill System	All 14 `*/SKILL.md.tmpl` files	Skill count, roles, phases
6. Template Engine	`scripts/gen-skill-docs.ts`	All 10 placeholders and their resolvers
7. Planning Skills	`plan-{ceo,eng,design}-review/SKILL.md.tmpl`	4 CEO modes, 4 eng sections, 7 design passes
8. Ship & Review	`ship/SKILL.md.tmpl`, `review/SKILL.md.tmpl`	11 ship steps, 2-pass review, checklist
9. QA & Design	`qa-only/SKILL.md.tmpl`, `design-review/SKILL.md.tmpl`	6 QA phases, risk heuristic thresholds
10. Test Infra	`test/helpers/*.ts`, `test/*.test.ts`	Tier costs, touchfiles, NDJSON parsing

Design decisions

Progressive complexity: Chapters 5-6 explain how skills work, 7-8 show planning → shipping, 9-10 cover testing and validation
Cross-references: Every chapter links back to foundation chapters (1-4) where relevant
Practical patterns: QA chapter includes 4 reusable browser testing patterns
Honest about costs: Test chapter shows exact API costs per tier ($3.85, $0.15)
Cognitive patterns included: Eng review (15 engineering instincts) and design review (12 design thinking patterns) are documented

Test plan

All skill names and roles match actual SKILL.md.tmpl files
Template placeholders match gen-skill-docs.ts resolver list
Ship pipeline steps match ship/SKILL.md.tmpl workflow
QA phases match {{QA_METHODOLOGY}} partial
Test tier costs match CONTRIBUTING.md
Cross-links to chapters 1-4 (PR #177) use correct relative paths
No modifications to existing files

…s, commands

Add comprehensive onboarding documentation covering gstack's core infrastructure. Five files forming a self-contained tutorial that takes readers from "what is gstack?" to understanding every browse command.

- index.md: Tutorial entry point with Mermaid architecture flowchart
- 01_architecture.md: Three-layer design, virtual team, project structure
- 02_browse_engine.md: Client-server model, lifecycle, security, buffers
- 03_snapshot_and_refs.md: Accessibility tree, @ref system, staleness
- 04_command_system.md: All 52 commands (read/write/meta), error handling

All technical claims verified against source code (commands.ts, snapshot.ts, server.ts, browser-manager.ts, buffers.ts).

… QA, tests

Add six chapters completing the gstack tutorial. Covers the workflow layer built on top of the infrastructure documented in the foundation PR.

- 05_skill_system.md: Skill anatomy, preamble, 14 roles, data flows
- 06_template_engine.md: Placeholder resolution, DRY partials, dev workflow
- 07_planning_skills.md: CEO/Eng/Design reviews, cognitive patterns
- 08_ship_and_review.md: 11-step ship pipeline, 2-pass review
- 09_qa_and_design_review.md: Browser-based QA, design fix loop
- 10_test_infrastructure.md: 3-tier validation, eval store, diff selection

All workflows verified against SKILL.md.tmpl templates and test infrastructure source (session-runner, llm-judge, eval-store).

johnxie added 2 commits March 18, 2026 01:56

johnxie wants to merge 2 commits into garrytan:main from johnxie:docs/tutorial-workflows