
feat: /benchmark — catch performance regressions before users feel them #153

Open
HMAKT99 wants to merge 3 commits into garrytan:main from HMAKT99:arun/benchmark-skill

HMAKT99 commented Mar 18, 2026

Performance dies by a thousand paper cuts

No single PR makes the app slow. But every PR adds 50ms here, 20KB there. In 6 weeks the landing page takes 4 seconds to load and nobody can point to when it happened. The boiling frog problem.

/benchmark gives you the thermometer. Capture baselines, measure after every PR, and catch regressions before they compound. It uses the $B perf command that has been in gstack since day one but that nobody has been using systematically.

What it looks like

You:   /benchmark https://myapp.com --baseline
Claude: Baseline captured for 5 pages. Deploy your changes, then run /benchmark.

You:   [make changes, deploy]

You:   /benchmark https://myapp.com
Claude: PERFORMANCE REPORT — myapp.com
        Branch: feature-xyz vs baseline (main)
        ═══════════════════════════════════════════════════
        Metric          Baseline    Current     Delta    Status
        ─────           ────────    ───────     ─────    ──────
        TTFB            120ms       135ms       +15ms    OK
        FCP             450ms       480ms       +30ms    OK
        LCP             800ms       1600ms      +800ms   REGRESSION ←
        DOM Complete    1200ms      1350ms      +150ms   WARNING
        JS Bundle       450KB       720KB       +270KB   REGRESSION ←
        CSS Bundle      85KB        88KB        +3KB     OK
        Requests        42          58          +16      WARNING

        REGRESSIONS DETECTED: 2
        [1] LCP doubled (800ms → 1600ms)
            Likely cause: new hero image loaded synchronously
        [2] JS bundle +60% (450KB → 720KB)
            Likely cause: new dependency or missing tree-shaking

        TOP 5 SLOWEST RESOURCES:
        #  Resource              Size     Duration
        1  vendor.chunk.js      320KB    480ms    ← consider code-splitting
        2  main.js              250KB    320ms
        3  hero-image.webp      180KB    280ms    ← add width/height, lazy load
        4  analytics.js          45KB    250ms    ← load async
        5  inter-var.woff2       95KB    180ms

        PERFORMANCE BUDGET:
        FCP < 1.8s    PASS    (0.48s)
        LCP < 2.5s    PASS    (1.6s)
        JS  < 500KB   FAIL    (720KB)  ←
        CSS < 100KB   PASS    (88KB)
        Grade: B (was A)
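The bundle sizes, request count, and "slowest resources" list in a report like the one above can all be derived from Resource Timing entries. The following is a minimal sketch, not the skill's actual implementation: `ResourceSample` and `summarizeResources` are hypothetical names, the sketch assumes the entries have already been read from the page (for example via `performance.getEntriesByType("resource")`), and it classifies bundles by file extension, which is a simplification.

```typescript
// Hypothetical shape: a subset of the fields on PerformanceResourceTiming.
interface ResourceSample {
  name: string;         // resource URL
  transferSize: number; // bytes over the network
  duration: number;     // milliseconds
}

interface ResourceSummary {
  jsBytes: number;
  cssBytes: number;
  requests: number;
  slowest: ResourceSample[];
}

// Summarize a list of resource entries. Assumption: a resource counts as
// JS or CSS purely by its file extension, ignoring query string and hash.
function summarizeResources(entries: ResourceSample[], topN = 5): ResourceSummary {
  const hasExtension = (entry: ResourceSample, ext: string): boolean =>
    entry.name.replace(/[?#].*$/, "").endsWith(ext);

  const totalBytes = (ext: string): number =>
    entries
      .filter((entry) => hasExtension(entry, ext))
      .reduce((sum, entry) => sum + entry.transferSize, 0);

  // Copy before sorting so the caller's array is left untouched.
  const slowest = [...entries]
    .sort((a, b) => b.duration - a.duration)
    .slice(0, topN);

  return {
    jsBytes: totalBytes(".js"),
    cssBytes: totalBytes(".css"),
    requests: entries.length,
    slowest,
  };
}
```

Sorting a copy keeps the function free of side effects, so the same entries can also be written to a baseline file unchanged.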

What it measures

Real data from performance.getEntries() — not estimates, not Lighthouse scores, actual browser timing:

| Metric | Source | Regression threshold |
| --- | --- | --- |
| TTFB | Navigation Timing API | >50% or >500ms increase |
| FCP | Paint Timing API | >50% or >500ms increase |
| LCP | Largest Contentful Paint | >50% or >500ms increase |
| DOM Complete | Navigation Timing API | >50% or >500ms increase |
| JS bundle size | Resource Timing API | >25% increase |
| CSS bundle size | Resource Timing API | >25% increase |
| Request count | Resource Timing API | >30% increase |
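To make the thresholds concrete, here is a minimal sketch of how a baseline/current pair could be checked against them. The names `MetricKind` and `compareMetric` are hypothetical and not taken from the skill; only the threshold values come from the table. The PR does not state when a metric is reported as WARNING, so the sketch covers only the regression check.

```typescript
// Hypothetical grouping of the metrics by which threshold rule applies.
type MetricKind = "timing" | "bundle" | "requests";

interface Delta {
  absolute: number;    // current - baseline, in the metric's own unit
  relative: number;    // fraction of the baseline (0.5 means +50%)
  regression: boolean; // true if the stated threshold is exceeded
}

function compareMetric(kind: MetricKind, baseline: number, current: number): Delta {
  const absolute = current - baseline;
  // Guard against a zero baseline: any increase from 0 counts as unbounded growth.
  const relative = baseline > 0 ? absolute / baseline : absolute > 0 ? Infinity : 0;

  const regression =
    kind === "timing"
      ? relative > 0.5 || absolute > 500 // >50% or >500ms increase
      : kind === "bundle"
        ? relative > 0.25                // >25% increase
        : relative > 0.3;                // >30% increase

  return { absolute, relative, regression };
}

// Values from the sample report: LCP 800ms -> 1600ms, JS bundle 450KB -> 720KB.
const lcp = compareMetric("timing", 800, 1600);
const jsBundle = compareMetric("bundle", 450, 720);
console.log(lcp.regression, jsBundle.regression);
```

Applied literally, the request-count rule would also flag the sample report's 42 → 58 requests (about +38%), which that report lists as WARNING, so the skill presumably applies additional status rules that this PR does not spell out.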

The $B perf command finally gets a home

gstack has had $B perf since v0.1. It returns page load performance data. But no skill uses it systematically:

  • /qa checks for visual bugs, not performance
  • /review checks code quality, not runtime speed
  • /ship runs tests, not benchmarks

/benchmark is the skill that $B perf was waiting for.

Features

  • Baselines: Capture before deploying, compare after. Baselines are JSON — diffable, trackable, reviewable.
  • Trend analysis: /benchmark --trend shows performance over time from historical data. Spot the week things started getting slow.
  • Resource waterfall: Top 10 slowest resources with specific fix recommendations (code-split, lazy-load, async, compress).
  • Performance budget: Grade against a default budget (FCP < 1.8s, LCP < 2.5s, JS < 500KB). The FCP and LCP limits match Google's published "good" thresholds for those metrics.
  • Diff-aware: /benchmark --diff only benchmarks pages affected by current branch changes.
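As an illustration of the performance budget feature, the sketch below checks measured values against the limits shown in the sample report (FCP < 1.8s, LCP < 2.5s, JS < 500KB, CSS < 100KB). `DEFAULT_BUDGET` and `checkBudget` are hypothetical names. The rule that turns PASS/FAIL lines into a letter grade is not described in this PR, so it is left out.

```typescript
// Limits as they appear in the sample report; units are encoded in the key names.
const DEFAULT_BUDGET = { fcpMs: 1800, lcpMs: 2500, jsKB: 500, cssKB: 100 } as const;

type BudgetKey = keyof typeof DEFAULT_BUDGET;

interface BudgetResult {
  key: BudgetKey;
  limit: number;
  value: number;
  pass: boolean;
}

// Compare each measured value with its limit. A value passes only if it is
// strictly below the limit, mirroring the "<" in the report.
function checkBudget(
  measured: Record<BudgetKey, number>,
  budget: Record<BudgetKey, number> = DEFAULT_BUDGET,
): BudgetResult[] {
  const keys = Object.keys(budget) as BudgetKey[];
  return keys.map((key) => ({
    key,
    limit: budget[key],
    value: measured[key],
    pass: measured[key] < budget[key],
  }));
}

// Measured values from the sample report.
const results = checkBudget({ fcpMs: 480, lcpMs: 1600, jsKB: 720, cssKB: 88 });
for (const r of results) {
  console.log(`${r.key}: ${r.pass ? "PASS" : "FAIL"} (${r.value} vs limit ${r.limit})`);
}
```

Passing the budget as a parameter keeps the defaults overridable per project without changing the check itself.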

Arguments

/benchmark <url>              — full benchmark with baseline comparison
/benchmark <url> --baseline   — capture baseline (run before changes)
/benchmark <url> --quick      — single-pass timing (no baseline needed)
/benchmark <url> --pages ...  — specific pages
/benchmark --diff             — benchmark pages affected by current branch
/benchmark --trend            — show historical performance trends

This slots into the engineering workflow

/plan-eng-review    → what to build
/review             → is the code correct?
/benchmark          → is it fast?        ← NEW
/a11y               → is it accessible?  ← NEW
/ship               → push it
/canary             → did it break?      ← NEW
/qa                 → full QA pass

Test plan

  • bun test — all tests pass, 0 failures
  • bun run gen:skill-docs --dry-run — FRESH
  • Uses {{PREAMBLE}} + {{BROWSE_SETUP}} — follows template pipeline
  • Reports saved to .gstack/benchmark-reports/

…wse daemon

Catches the death-by-a-thousand-cuts performance decay:
- Before/after comparison using browse daemon's perf command
- Page load metric tracking (TTFB, FCP, LCP, DOM Complete)
- JS/CSS bundle size monitoring with regression thresholds
- Resource waterfall analysis with optimization recommendations
- Performance budget checking against industry standards
- Trend analysis from historical benchmark data
- Diff-aware mode: only benchmark pages affected by current branch
@HMAKT99 HMAKT99 changed the title feat: add /benchmark — performance regression detection via browse daemon feat: /benchmark — catch performance regressions before users feel them Mar 18, 2026
HMAKT99 added 2 commits March 18, 2026 10:44
Per maintainer feedback: generated .md files should not be committed.
Only the .tmpl template is source of truth. Build generates the .md.