docs: tutorial workflows — skills, templates, planning, shipping, QA, tests#178

Open
johnxie wants to merge 2 commits into garrytan:main from johnxie:docs/tutorial-workflows

johnxie commented Mar 18, 2026

Summary

Adds the second half of the gstack tutorial — six chapters covering the workflow layer built on top of the infrastructure documented in PR #177. Together, the 10 chapters form a complete onboarding guide from "what is gstack?" to "how does it test itself?"

Depends on: PR #177 (docs/tutorial-foundation) — chapters 1-4 covering architecture, browse engine, snapshots, and commands.

What's included

| File | Lines | Topic | Key Visuals |
|---|---|---|---|
| docs/05_skill_system.md | 309 | Skill anatomy: preamble, 14 roles, design rules, data flows | Skill data flow diagram, template pipeline |
| docs/06_template_engine.md | 330 | Template engine: placeholders, resolvers, DRY partials | Build pipeline flow, resolver architecture |
| docs/07_planning_skills.md | 241 | CEO, Eng Manager, Designer plan reviews | 3-review sequence, interactive flow, review dashboard |
| docs/08_ship_and_review.md | 284 | 11-step ship pipeline, 2-pass code review | Ship flowchart, review sequence, ship-review integration |
| docs/09_qa_and_design_review.md | 332 | Browser-based QA, design fix loop with risk heuristic | QA phases, fix loop, design risk accumulator |
| docs/10_test_infrastructure.md | 519 | 3-tier testing, eval store, diff-based selection | Tier diagram, E2E sequence, observability pipeline |

Total: 2,015 lines across 6 files, ~68KB

The complete tutorial map

With both PRs merged, readers get a 10-chapter guided tour:

                        gstack Tutorial
                    ═══════════════════════

    PR #177 (Foundation)              This PR (Workflows)
    ════════════════════              ═══════════════════

    ┌─────────────────┐              ┌──────────────────┐
    │  1. Architecture │──────────────│  5. Skill System │
    │     Overview     │              │     14 roles,    │
    │                  │              │     anatomy,     │
    │  "What is gstack │              │     data flows   │
    │   and why?"      │              └────────┬─────────┘
    └────────┬─────────┘                       │
             │                        ┌────────▼─────────┐
    ┌────────▼─────────┐              │  6. Template     │
    │  2. Browse Engine │              │     Engine       │
    │     Persistent    │              │     Placeholders,│
    │     Chromium      │              │     resolvers    │
    │     daemon        │              └────────┬─────────┘
    └────────┬─────────┘                       │
             │                        ┌────────▼─────────┐
    ┌────────▼─────────┐              │  7. Planning     │
    │  3. Snapshot &    │              │     Skills       │
    │     Ref System    │              │     CEO → Eng →  │
    │     @e1, @e2, ... │              │     Design       │
    └────────┬─────────┘              └────────┬─────────┘
             │                                 │
    ┌────────▼─────────┐              ┌────────▼─────────┐
    │  4. Command       │              │  8. Ship &       │
    │     System        │──────────────│     Review       │
    │     52 commands   │              │     11-step pipe │
    └──────────────────┘              └────────┬─────────┘
                                               │
                                      ┌────────▼─────────┐
                                      │  9. QA & Design  │
                                      │     Review       │
                                      │     Browser QA,  │
                                      │     fix loops    │
                                      └────────┬─────────┘
                                               │
                                      ┌────────▼─────────┐
                                      │ 10. Test Infra   │
                                      │     3-tier:      │
                                      │     Static →     │
                                      │     E2E → Judge  │
                                      └──────────────────┘

Chapter highlights

Chapter 5: Skill System

How 14 Markdown files turn Claude Code into a virtual engineering team.

flowchart LR
    TMPL["SKILL.md.tmpl\n(human-written)"]
    GEN["gen-skill-docs\n(build step)"]
    MD["SKILL.md\n(generated)"]
    CLAUDE["Claude Code\n(reads at runtime)"]

    TMPL --> GEN --> MD --> CLAUDE

Key content:

  • Skill anatomy: preamble → role → browse setup → workflow → decisions → completion
  • All 14 skills mapped to phases (Planning → Build → Ship → Observability)
  • Skill design rules: independent bash blocks, no hardcoded branches, English conditionals
  • Cognitive patterns: 15 engineering instincts, 12 design thinking patterns
  • Data flows: test plan artifacts, review dashboard, cross-skill consumption
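
The "no hardcoded branches" rule lends itself to a mechanical check. The following is a minimal sketch, not gstack's actual validator: the helper name `findHardcodedBranches`, the list of suspect branch names, and the regex are all assumptions made for illustration.

```typescript
// Hypothetical sketch: flag git commands in a skill doc that name a branch literally.
// SUSPECT_BRANCHES and the pattern are assumptions, not taken from the gstack source.
const SUSPECT_BRANCHES = ["main", "master"];

interface BranchHit {
  line: number; // 1-based line number within the skill text
  text: string; // the offending line, trimmed
}

function findHardcodedBranches(skillText: string): BranchHit[] {
  // Matches a line that mentions git and later a literal branch name,
  // optionally prefixed with "origin/".
  const pattern = new RegExp(
    `\\bgit\\b.*\\b(?:origin/)?(?:${SUSPECT_BRANCHES.join("|")})\\b`
  );
  const hits: BranchHit[] = [];
  skillText.split("\n").forEach((raw, index) => {
    if (pattern.test(raw)) {
      hits.push({ line: index + 1, text: raw.trim() });
    }
  });
  return hits;
}

const sampleSkill = [
  "git fetch origin",
  "git diff origin/main...HEAD",
  'git diff "origin/$BASE_BRANCH"...HEAD',
].join("\n");

// Only the second line names a branch literally.
console.log(findHardcodedBranches(sampleSkill));
```

A line-based regex like this will also flag prose that happens to mention git and "main" together, so a real check would likely restrict itself to bash blocks.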

Chapter 6: Template Engine

The build system that keeps skill docs in sync with source code.

| Placeholder | Source | Used By |
|---|---|---|
| {{PREAMBLE}} | gen-skill-docs.ts | All 14 skills |
| {{BROWSE_SETUP}} | gen-skill-docs.ts | browse, qa, qa-only, design-review |
| {{COMMAND_REFERENCE}} | commands.ts | browse |
| {{SNAPSHOT_FLAGS}} | snapshot.ts | browse |
| {{QA_METHODOLOGY}} | gen-skill-docs.ts | qa, qa-only |
| {{DESIGN_METHODOLOGY}} | gen-skill-docs.ts | plan-design-review, design-review |
| {{TEST_BOOTSTRAP}} | gen-skill-docs.ts | qa, ship, design-review |
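
To make the placeholder-to-resolver mapping concrete, here is a minimal sketch of how a `{{NAME}}` token could be swapped for generated text. It is an illustration under assumptions, not the code in gen-skill-docs.ts: `renderTemplate`, the `Resolver` type, and the resolver bodies are hypothetical stand-ins.

```typescript
// Hypothetical sketch of placeholder resolution; resolver bodies are stand-ins,
// not the real output of gen-skill-docs.ts.
type Resolver = () => string;

const resolvers: Record<string, Resolver> = {
  PREAMBLE: () => "## Preamble\n(shared setup text)",
  BROWSE_SETUP: () => "## Browse setup\n(shared browse text)",
};

function renderTemplate(template: string, table: Record<string, Resolver>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (_match: string, name: string) => {
    const resolve = table[name];
    if (resolve === undefined) {
      // Failing loudly keeps a typo in a template from shipping as literal text.
      throw new Error(`Unknown placeholder: {{${name}}}`);
    }
    return resolve();
  });
}

const qaTemplate = "{{PREAMBLE}}\n\n# QA skill\n\n{{BROWSE_SETUP}}\n";
const renderedSkill = renderTemplate(qaTemplate, resolvers);
console.log(renderedSkill);
```

Keeping each shared section behind a single resolver is what makes the partials DRY: a change to the preamble text lands in every generated SKILL.md on the next build.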

Chapter 7: Planning Skills

Three reviews from three perspectives — before writing any code:

┌──────────────────────────────────────────────────────────┐
│                    Your Plan (PLAN.md)                    │
└──────────┬──────────────────┬──────────────────┬─────────┘
           │                  │                  │
    ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼──────┐
    │ CEO Review  │    │ Eng Review  │    │Design Review│
    │             │    │             │    │             │
    │ Right       │    │ Right       │    │ Right       │
    │ problem?    │    │ architecture│    │ experience? │
    │             │    │             │    │             │
    │ 4 modes:    │    │ 4 sections: │    │ 7 passes:   │
    │ • Expand    │    │ • Arch      │    │ • Info arch │
    │ • Selective │    │ • Quality   │    │ • States    │
    │ • Hold      │    │ • Tests     │    │ • Journey   │
    │ • Reduce    │    │ • Perf      │    │ • AI slop   │
    │             │    │             │    │ • Design sys│
    │ Output:     │    │ Output:     │    │ • Responsive│
    │ Scope       │    │ Test plan   │    │ • Decisions │
    │ decisions   │    │ artifact    │    │             │
    └─────────────┘    └─────────────┘    └─────────────┘

Chapter 8: Ship & Review Pipeline

The 11-step automated shipping workflow:

Step  1: Pre-flight ──────── branch, context, review dashboard
Step  2: Merge base ──────── fetch + merge for pre-merge testing
Step  3: Run tests ───────── parallel test suites
Step  4: Eval suites ─────── conditional: if prompt files changed
Step  5: Coverage audit ──── find gaps, generate tests
Step  6: Pre-landing review ─ checklist + design-lite
Step  7: Greptile ────────── classify + escalate comments
Step  8: Version bump ────── MICRO/PATCH auto, MINOR/MAJOR ask
Step  9: CHANGELOG ────────── auto-generate, user-focused voice
Step 10: Bisect commits ──── ordered, independently revertable
Step 11: Push + PR ────────── comprehensive body with dashboard
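
Step 8's rule (MICRO and PATCH bump automatically, MINOR and MAJOR ask first) can be sketched as a small policy function. The four-part `MAJOR.MINOR.PATCH.MICRO` version string, the example version `0.3.9.2`, and both function names are assumptions for illustration; the ship skill itself expresses this as instructions in ship/SKILL.md.tmpl rather than as code.

```typescript
type BumpLevel = "MICRO" | "PATCH" | "MINOR" | "MAJOR";

// Per step 8: MICRO and PATCH proceed automatically, MINOR and MAJOR prompt the user.
function bumpPolicy(level: BumpLevel): "auto" | "ask" {
  return level === "MICRO" || level === "PATCH" ? "auto" : "ask";
}

// Assumption: versions are four dot-separated integers, MAJOR.MINOR.PATCH.MICRO.
function bumpVersion(version: string, level: BumpLevel): string {
  const parts = version.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Expected MAJOR.MINOR.PATCH.MICRO, got "${version}"`);
  }
  const order: BumpLevel[] = ["MAJOR", "MINOR", "PATCH", "MICRO"];
  const index = order.indexOf(level);
  // Increment the chosen part and reset every less significant part to zero.
  const next = parts.map((n, i) => (i < index ? n : i === index ? n + 1 : 0));
  return next.join(".");
}

console.log(bumpPolicy("PATCH"), bumpVersion("0.3.9.2", "PATCH")); // auto 0.3.10.0
console.log(bumpPolicy("MINOR"), bumpVersion("0.3.9.2", "MINOR")); // ask 0.4.0.0
```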

Chapter 9: QA & Design Review

Browser-based testing with real clicks and screenshots:

flowchart TD
    subgraph QA["QA Methodology (6 phases)"]
        P1["1. Initialize"] --> P2["2. Authenticate"]
        P2 --> P3["3. Orient (map structure)"]
        P3 --> P4["4. Explore (visit pages)"]
        P4 --> P5["5. Document issues"]
        P5 --> P6["6. Health score"]
    end

    subgraph Fix["Fix Loop (/qa only)"]
        F1["Locate source"] --> F2["Apply fix"]
        F2 --> F3["Re-test with browser"]
        F3 --> F4["Atomic commit"]
        F4 --> F5["Regression test"]
    end

    P6 --> Fix

Design review risk heuristic:

| Trigger | Risk Increase | Hard Stop |
|---|---|---|
| Each revert | +15% | Risk > 20% → stop |
| Each component file change | +5% | Fix count > 30 → hard cap |
| After fix 10 | +1% per additional | |
| Touching unrelated files | +20% | |
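
The heuristic above can be read as a running accumulator. The sketch below is one interpretation, with the percentages taken from the table and everything else assumed: the `FixEvent` shape, the `assessRisk` name, and the choice to sum the triggers per fix are illustrative, and the design-review skill may combine them differently.

```typescript
// Hypothetical model of the risk heuristic; how the triggers combine is an assumption.
interface FixEvent {
  reverted: boolean;              // the fix was rolled back
  componentFilesChanged: number;  // component files touched by this fix
  touchedUnrelatedFiles: boolean; // the fix strayed outside the target area
}

interface RiskVerdict {
  risk: number; // accumulated risk in percent
  stop: boolean;
  reason: string;
}

const RISK_LIMIT_PERCENT = 20;
const FIX_HARD_CAP = 30;

function assessRisk(fixes: FixEvent[]): RiskVerdict {
  let risk = 0;
  fixes.forEach((fix, index) => {
    if (fix.reverted) risk += 15;
    risk += 5 * fix.componentFilesChanged;
    if (fix.touchedUnrelatedFiles) risk += 20;
    if (index >= 10) risk += 1; // the 11th fix onward adds 1% each
  });
  if (fixes.length > FIX_HARD_CAP) {
    return { risk, stop: true, reason: "fix count exceeded hard cap" };
  }
  if (risk > RISK_LIMIT_PERCENT) {
    return { risk, stop: true, reason: "risk exceeded 20%" };
  }
  return { risk, stop: false, reason: "within limits" };
}

const cleanFix: FixEvent = { reverted: false, componentFilesChanged: 1, touchedUnrelatedFiles: false };
const revertedFix: FixEvent = { reverted: true, componentFilesChanged: 1, touchedUnrelatedFiles: false };

console.log(assessRisk([cleanFix, cleanFix]));              // risk 10, keep going
console.log(assessRisk([cleanFix, cleanFix, revertedFix])); // risk 30, stop
```

Under this reading a single revert on top of two small fixes is enough to cross the 20% line, which matches the table's intent of stopping early once fixes start bouncing.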

Chapter 10: Test Infrastructure

The 3-tier system that validates gstack itself:

┌─────────────────────────────────────────────────────────┐
│              gstack Test Infrastructure                  │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  Tier 1: Static Validation          FREE    <2 seconds  │
│  ┌─────────────────────────────────────────────┐        │
│  │ • Parse every $B command in every SKILL.md  │        │
│  │ • Validate against commands.ts registry     │        │
│  │ • Check snapshot flags vs SNAPSHOT_FLAGS     │        │
│  │ • Detect hardcoded branch names             │        │
│  │ • Verify template freshness                 │        │
│  └─────────────────────────────────────────────┘        │
│           │ pass                                        │
│           ▼                                             │
│  Tier 2: E2E Testing              ~$3.85   ~20 minutes  │
│  ┌─────────────────────────────────────────────┐        │
│  │ • Spawn real Claude sessions (claude -p)    │        │
│  │ • Stream NDJSON for tool-by-tool progress   │        │
│  │ • Scan for browse errors in transcript      │        │
│  │ • Planted-bug detection testing             │        │
│  │ • Diff-based selection (only changed tests) │        │
│  └─────────────────────────────────────────────┘        │
│           │ pass                                        │
│           ▼                                             │
│  Tier 3: LLM-as-Judge              ~$0.15   ~30 seconds │
│  ┌─────────────────────────────────────────────┐        │
│  │ • Score clarity/completeness/actionability  │        │
│  │ • Planted-bug outcome evaluation            │        │
│  │ • Uses claude-sonnet-4-6 for stability      │        │
│  │ • Threshold: every dimension ≥ 4/5          │        │
│  └─────────────────────────────────────────────┘        │
│                                                         │
│  Diff-Based Selection:                                  │
│  ┌─────────────────────────────────────────────┐        │
│  │ touchfiles.ts declares per-test deps        │        │
│  │ git diff → match → run only affected tests  │        │
│  │ Global touchfiles → trigger ALL tests       │        │
│  │ Saves ~75% of eval API costs                │        │
│  └─────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────┘
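
The diff-based selection box above can be sketched in a few lines. This is a simplified illustration rather than the contents of touchfiles.ts: the test names, the path prefixes, and the `selectTests` helper are hypothetical, and plain prefix matching stands in for whatever pattern matching the real file uses.

```typescript
// Hypothetical touchfile declarations; the real entries live in touchfiles.ts.
const TEST_TOUCHFILES: Record<string, string[]> = {
  "qa-basic": ["qa/", "browse/src/"],
  "ship-flow": ["ship/"],
};

// A change under any of these paths triggers every test.
const GLOBAL_TOUCHFILES: string[] = ["test/helpers/"];

// changedFiles would come from something like `git diff --name-only <base>`.
function selectTests(changedFiles: string[]): string[] {
  const allTests = Object.keys(TEST_TOUCHFILES);
  const touches = (prefixes: string[]): boolean =>
    changedFiles.some((file) => prefixes.some((prefix) => file.startsWith(prefix)));

  if (touches(GLOBAL_TOUCHFILES)) return allTests;
  return allTests.filter((name) => touches(TEST_TOUCHFILES[name] ?? []));
}

console.log(selectTests(["ship/SKILL.md.tmpl"]));             // [ 'ship-flow' ]
console.log(selectTests(["test/helpers/session-runner.ts"])); // [ 'qa-basic', 'ship-flow' ]
console.log(selectTests(["README.md"]));                      // []
```

Declaring dependencies per test keeps the selection logic trivial; the cost is that a missing touchfile entry silently skips a test, which is presumably why global touchfiles exist as a safety net.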

Verification

All workflows verified against actual SKILL.md.tmpl templates and test source:

| Chapter | Verified Against | Key Claims Checked |
|---|---|---|
| 5. Skill System | All 14 */SKILL.md.tmpl files | Skill count, roles, phases |
| 6. Template Engine | scripts/gen-skill-docs.ts | All 10 placeholders and their resolvers |
| 7. Planning Skills | plan-{ceo,eng,design}-review/SKILL.md.tmpl | 4 CEO modes, 4 eng sections, 7 design passes |
| 8. Ship & Review | ship/SKILL.md.tmpl, review/SKILL.md.tmpl | 11 ship steps, 2-pass review, checklist |
| 9. QA & Design | qa-only/SKILL.md.tmpl, design-review/SKILL.md.tmpl | 6 QA phases, risk heuristic thresholds |
| 10. Test Infra | test/helpers/*.ts, test/*.test.ts | Tier costs, touchfiles, NDJSON parsing |

Design decisions

  • Progressive complexity: Chapters 5-6 explain how skills work, 7-8 show planning → shipping, 9-10 cover testing and validation
  • Cross-references: Every chapter links back to foundation chapters (1-4) where relevant
  • Practical patterns: QA chapter includes 4 reusable browser testing patterns
  • Honest about costs: Test chapter shows exact API costs per tier ($3.85, $0.15)
  • Cognitive patterns included: Eng review (15 engineering instincts) and design review (12 design thinking patterns) are both documented

Test plan

  • All skill names and roles match actual SKILL.md.tmpl files
  • Template placeholders match gen-skill-docs.ts resolver list
  • Ship pipeline steps match ship/SKILL.md.tmpl workflow
  • QA phases match {{QA_METHODOLOGY}} partial
  • Test tier costs match CONTRIBUTING.md
  • Cross-links to chapters 1-4 (PR #177) use correct relative paths
  • No modifications to existing files

johnxie added 2 commits March 18, 2026 01:56
…s, commands

Add comprehensive onboarding documentation covering gstack's core
infrastructure. Five files forming a self-contained tutorial that takes
readers from "what is gstack?" to understanding every browse command.

- index.md: Tutorial entry point with Mermaid architecture flowchart
- 01_architecture.md: Three-layer design, virtual team, project structure
- 02_browse_engine.md: Client-server model, lifecycle, security, buffers
- 03_snapshot_and_refs.md: Accessibility tree, @ref system, staleness
- 04_command_system.md: All 52 commands (read/write/meta), error handling

All technical claims verified against source code (commands.ts,
snapshot.ts, server.ts, browser-manager.ts, buffers.ts).

… QA, tests

Add six chapters completing the gstack tutorial. Covers the workflow
layer built on top of the infrastructure documented in the foundation PR.

- 05_skill_system.md: Skill anatomy, preamble, 14 roles, data flows
- 06_template_engine.md: Placeholder resolution, DRY partials, dev workflow
- 07_planning_skills.md: CEO/Eng/Design reviews, cognitive patterns
- 08_ship_and_review.md: 11-step ship pipeline, 2-pass review
- 09_qa_and_design_review.md: Browser-based QA, design fix loop
- 10_test_infrastructure.md: 3-tier validation, eval store, diff selection

All workflows verified against SKILL.md.tmpl templates and
test infrastructure source (session-runner, llm-judge, eval-store).