feat: /benchmark — catch performance regressions before users feel them #153
Open

HMAKT99 wants to merge 3 commits into garrytan:main
Conversation
…wse daemon

Catches the death-by-a-thousand-cuts performance decay:

- Before/after comparison using browse daemon's perf command
- Core Web Vitals tracking (TTFB, FCP, LCP, DOM Complete)
- JS/CSS bundle size monitoring with regression thresholds
- Resource waterfall analysis with optimization recommendations
- Performance budget checking against industry standards
- Trend analysis from historical benchmark data
- Diff-aware mode: only benchmark pages affected by the current branch
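The before/after comparison with regression thresholds can be sketched roughly as follows. The metric names and the 10% threshold are illustrative assumptions, not the skill's actual configuration:

```javascript
// Flag any metric whose current value regressed past a relative
// threshold compared to the stored baseline.
const THRESHOLD = 0.10; // flag regressions worse than 10%

function findRegressions(baseline, current, threshold = THRESHOLD) {
  const regressions = [];
  for (const [metric, before] of Object.entries(baseline)) {
    const after = current[metric];
    if (after === undefined || before === 0) continue;
    const delta = (after - before) / before; // relative change
    if (delta > threshold) {
      regressions.push({ metric, before, after, delta });
    }
  }
  return regressions;
}

// Example: LCP regressed 25%, everything else is within tolerance.
const baseline = { ttfb: 120, fcp: 800, lcp: 1600, jsBytes: 250_000 };
const current  = { ttfb: 118, fcp: 820, lcp: 2000, jsBytes: 251_000 };
console.log(findRegressions(baseline, current));
// [{ metric: "lcp", before: 1600, after: 2000, delta: 0.25 }]
```

A relative threshold keeps the check size-agnostic: a 50ms slip on a 120ms TTFB is flagged, while the same 50ms on a 4s load is noise.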
Per maintainer feedback: generated .md files should not be committed. Only the .tmpl template is the source of truth; the build generates the .md.
Performance dies by a thousand paper cuts
No single PR makes the app slow. But every PR adds 50ms here, 20KB there. In 6 weeks the landing page takes 4 seconds to load and nobody can point to when it happened. The boiling frog problem.
/benchmark gives you the thermometer. Capture baselines, measure after every PR, catch regressions before they compound. It uses the $B perf command that's been in gstack since day one, but that nobody has been using systematically.

What it looks like
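The numbers in the reports come from the browser's own timing entries. Reducing `performance.getEntries()`-shaped data to the tracked metrics might look like this; the entry shapes mirror the standard Performance API, but how $B perf serializes them is an assumption:

```javascript
// Sketch: reduce performance.getEntries()-style data to tracked metrics.
function extractMetrics(entries) {
  const nav = entries.find((e) => e.entryType === "navigation");
  const fcp = entries.find(
    (e) => e.entryType === "paint" && e.name === "first-contentful-paint"
  );
  // The browser may emit several LCP candidates; the last one wins.
  const lcp = entries
    .filter((e) => e.entryType === "largest-contentful-paint")
    .pop();
  return {
    // TTFB measured here as responseStart minus requestStart.
    ttfb: nav ? nav.responseStart - nav.requestStart : null,
    fcp: fcp ? fcp.startTime : null,
    lcp: lcp ? lcp.startTime : null,
    domComplete: nav ? nav.domComplete : null,
  };
}

// Hand-written entries shaped like real browser output:
const sample = [
  { entryType: "navigation", requestStart: 10, responseStart: 130, domComplete: 1900 },
  { entryType: "paint", name: "first-contentful-paint", startTime: 800 },
  { entryType: "largest-contentful-paint", startTime: 1600 },
];
console.log(extractMetrics(sample));
// { ttfb: 120, fcp: 800, lcp: 1600, domComplete: 1900 }
```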
What it measures
Real data from performance.getEntries(): not estimates, not Lighthouse scores, but actual browser timing.

The $B perf command finally gets a home

gstack has had $B perf since v0.1. It returns page load performance data. But no skill uses it systematically:

- /qa checks for visual bugs, not performance
- /review checks code quality, not runtime speed
- /ship runs tests, not benchmarks

/benchmark is the skill that $B perf was waiting for.

Features
/benchmark --trend shows performance over time from historical data. Spot the week things started getting slow.

/benchmark --diff only benchmarks pages affected by the current branch's changes.

Arguments
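The --trend analysis over historical reports could be sketched as below: walk the runs in date order and report when a metric first drifted past a tolerance relative to the earliest run. The report shape and the 20% tolerance are illustrative assumptions:

```javascript
// Find the first historical run where `metric` exceeded the earliest
// run's value by more than `tolerance` (relative).
function findRegressionOnset(history, metric, tolerance = 0.2) {
  if (history.length === 0) return null;
  const base = history[0].metrics[metric];
  for (const run of history) {
    if (run.metrics[metric] > base * (1 + tolerance)) {
      return run.date; // first run past tolerance
    }
  }
  return null; // never drifted past tolerance
}

const history = [
  { date: "2024-05-01", metrics: { lcp: 1500 } },
  { date: "2024-05-08", metrics: { lcp: 1580 } },
  { date: "2024-05-15", metrics: { lcp: 1950 } }, // +30% vs first run
  { date: "2024-05-22", metrics: { lcp: 2100 } },
];
console.log(findRegressionOnset(history, "lcp"));
// "2024-05-15": the week things started getting slow
```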
This slots into the engineering workflow
Test plan
- bun test — all tests pass, 0 failures
- bun run gen:skill-docs --dry-run — FRESH
- {{PREAMBLE}} + {{BROWSE_SETUP}} — follows template pipeline
- .gstack/benchmark-reports/