Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,18 @@
# Changelog

## [0.6.5.0] - 2026-03-18

### Fixed

- **Review log no longer breaks on branch names with `/`.** Branch names like `garrytan/design-system` caused review log writes to fail because Claude Code runs multi-line bash blocks as separate shell invocations, losing the sanitized `$BRANCH` variable between commands. New `gstack-review-log` and `gstack-review-read` atomic helpers encapsulate the entire operation in a single command — slug detection, directory creation, and file I/O all happen in one shell.
- **All `eval $(gstack-slug)` blocks across 12 skill templates now use `&&` chaining** to prevent variable loss, even in templates that only need `mkdir`.

### For contributors

- Added `bin/gstack-review-log` and `bin/gstack-review-read` helpers with `GSTACK_HOME` env var override for testability.
- Full integration tests: temp-dir file creation, append behavior, slash sanitization, `NO_REVIEWS` fallback, `---CONFIG---` separator.
- Regression tests verify generated SKILL.md files use the new helpers.

## [0.6.4.0] - 2026-03-17

### Added
Expand Down
23 changes: 23 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,17 @@ on `git diff` against the base branch. Each test declares its file dependencies
llm-judge, gen-skill-docs) trigger all tests. Use `EVALS_ALL=1` or the `:all` script
variants to force all tests. Run `eval:select` to preview which tests would run.

## Testing

```bash
bun test # run before every commit — free, <2s
bun run test:evals # run before shipping — paid, diff-based (~$4/run max)
```

`bun test` runs skill validation, gen-skill-docs quality checks, and browse
integration tests. `bun run test:evals` runs LLM-judge quality evals and E2E
tests via `claude -p`. Both must pass before creating a PR.

## Project structure

```
Expand Down Expand Up @@ -94,6 +105,18 @@ Rules:
- **Express conditionals as English.** Instead of nested `if/elif/else` in bash,
write numbered decision steps: "1. If X, do Y. 2. Otherwise, do Z."

## Platform-agnostic design

Skills must NEVER hardcode framework-specific commands, file patterns, or directory
structures. Instead:

1. **Read CLAUDE.md** for project-specific config (test commands, eval commands, etc.)
2. **If missing, AskUserQuestion** — let the user tell you or let gstack search the repo
3. **Persist the answer to CLAUDE.md** so we never have to ask again

This applies to test commands, eval commands, deploy commands, and any other
project-specific behavior. The project owns its config; gstack reads it.

## Browser interaction

When you need to interact with a browser (QA, dogfooding, cookie setup), use the
Expand Down
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
0.6.4.0
0.6.5.0
9 changes: 9 additions & 0 deletions bin/gstack-review-log
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
# gstack-review-log — atomically log a review result
# Usage: gstack-review-log '{"skill":"...","timestamp":"...","status":"..."}'
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
eval $("$SCRIPT_DIR/gstack-slug" 2>/dev/null)
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
mkdir -p "$GSTACK_HOME/projects/$SLUG"
echo "$1" >> "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl"
10 changes: 10 additions & 0 deletions bin/gstack-review-read
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
# gstack-review-read — read review log and config for dashboard
# Usage: gstack-review-read
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
eval $("$SCRIPT_DIR/gstack-slug" 2>/dev/null)
GSTACK_HOME="${GSTACK_HOME:-$HOME/.gstack}"
cat "$GSTACK_HOME/projects/$SLUG/$BRANCH-reviews.jsonl" 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
"$SCRIPT_DIR/gstack-config" get skip_eng_review 2>/dev/null || echo "false"
3 changes: 1 addition & 2 deletions design-consultation/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,8 +153,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30
Look for brainstorm output:

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) && ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5
```

Expand Down
3 changes: 1 addition & 2 deletions design-consultation/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -49,8 +49,7 @@ ls src/ app/ pages/ components/ 2>/dev/null | head -30
Look for brainstorm output:

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) && ls ~/.gstack/projects/$SLUG/*brainstorm* 2>/dev/null | head -5
ls .context/*brainstorm* .context/attachments/*brainstorm* 2>/dev/null | head -5
```

Expand Down
3 changes: 1 addition & 2 deletions design-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -810,8 +810,7 @@ Write the report to both local and project-scoped locations:

**Project-scoped:**
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) && mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`

Expand Down
3 changes: 1 addition & 2 deletions design-review/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -208,8 +208,7 @@ Write the report to both local and project-scoped locations:

**Project-scoped:**
```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) && mkdir -p ~/.gstack/projects/$SLUG
```
Write to `~/.gstack/projects/{slug}/{user}-{branch}-design-audit-{datetime}.md`

Expand Down
36 changes: 15 additions & 21 deletions plan-ceo-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -212,8 +212,8 @@ Run the following commands:
git log --oneline -30 # Recent history
git diff <base> --stat # What's already changed
git stash list # Any stashed work
grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
grep -r "TODO\|FIXME\|HACK\|XXX" -l --exclude-dir=node_modules --exclude-dir=vendor --exclude-dir=.git | head -30
git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -20 # Recently touched files
```
Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
Expand Down Expand Up @@ -284,8 +284,7 @@ Describe the ideal end state of this system 12 months from now. Does this plan m
After the opt-in/cherry-pick ceremony, write the plan to disk so the vision and decisions survive beyond this conversation. Only run this step for EXPANSION and SELECTIVE EXPANSION modes.

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG/ceo-plans
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) && mkdir -p ~/.gstack/projects/$SLUG/ceo-plans
```

Before writing, check for existing CEO plans in the ceo-plans/ directory. If any are >30 days old or their branch has been merged/deleted, offer to archive them:
Expand Down Expand Up @@ -398,24 +397,24 @@ For every new method, service, or codepath that can fail, fill in this table:
```
METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS
-------------------------|-----------------------------|-----------------
ExampleService#call | API timeout | Faraday::TimeoutError
ExampleService.call | API timeout | TimeoutError
| API returns 429 | RateLimitError
| API returns malformed JSON | JSON::ParserError
| DB connection pool exhausted| ActiveRecord::ConnectionTimeoutError
| Record not found | ActiveRecord::RecordNotFound
| API returns malformed JSON | JSONParseError
| DB connection pool exhausted| ConnectionPoolExhausted
| Record not found | RecordNotFound
-------------------------|-----------------------------|-----------------

EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES
-----------------------------|-----------|------------------------|------------------
Faraday::TimeoutError | Y | Retry 2x, then raise | "Service temporarily unavailable"
TimeoutError | Y | Retry 2x, then raise | "Service temporarily unavailable"
RateLimitError | Y | Backoff + retry | Nothing (transparent)
JSON::ParserError | N ← GAP | — | 500 error ← BAD
ConnectionTimeoutError | N ← GAP | — | 500 error ← BAD
ActiveRecord::RecordNotFound | Y | Return nil, log warning | "Not found" message
JSONParseError | N ← GAP | — | 500 error ← BAD
ConnectionPoolExhausted | N ← GAP | — | 500 error ← BAD
RecordNotFound | Y | Return nil, log warning | "Not found" message
```
Rules for this section:
* `rescue StandardError` is ALWAYS a smell. Name the specific exceptions.
* `rescue => e` with only `Rails.logger.error(e.message)` is insufficient. Log the full context: what was being attempted, with what arguments, for what user/request.
* Catching all errors generically (`rescue StandardError`, `catch (Exception e)`, `except Exception`) is ALWAYS a smell. Name the specific exceptions.
* Logging only the error message is insufficient. Log the full context: what was being attempted, with what arguments, for what user/request.
* Every rescued error must either: retry with backoff, degrade gracefully with a user-visible message, or re-raise with added context. "Swallow and continue" is almost never acceptable.
* For each GAP (unrescued error that should be rescued): specify the rescue action and what the user should see.
* For LLM/AI service calls specifically: what happens when the response is malformed? When it's empty? When it hallucinates invalid JSON? When the model returns a refusal? Each of these is a distinct failure mode.
Expand Down Expand Up @@ -712,9 +711,7 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default.
After producing the Completion Summary above, persist the review result:

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}'
```

Before running this command, substitute the placeholder values from the Completion Summary you just produced:
Expand All @@ -729,10 +726,7 @@ Before running this command, substitute the placeholder values from the Completi
After completing the review, read the review log and config to display the dashboard.

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
~/.claude/skills/gstack/bin/gstack-review-read
```

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Expand Down
31 changes: 14 additions & 17 deletions plan-ceo-review/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -91,8 +91,8 @@ Run the following commands:
git log --oneline -30 # Recent history
git diff <base> --stat # What's already changed
git stash list # Any stashed work
grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
grep -r "TODO\|FIXME\|HACK\|XXX" -l --exclude-dir=node_modules --exclude-dir=vendor --exclude-dir=.git | head -30
git log --since=30.days --name-only --format="" | sort | uniq -c | sort -rn | head -20 # Recently touched files
```
Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
* Note any TODOs this plan touches, blocks, or unlocks
Expand Down Expand Up @@ -163,8 +163,7 @@ Describe the ideal end state of this system 12 months from now. Does this plan m
After the opt-in/cherry-pick ceremony, write the plan to disk so the vision and decisions survive beyond this conversation. Only run this step for EXPANSION and SELECTIVE EXPANSION modes.

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG/ceo-plans
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null) && mkdir -p ~/.gstack/projects/$SLUG/ceo-plans
```

Before writing, check for existing CEO plans in the ceo-plans/ directory. If any are >30 days old or their branch has been merged/deleted, offer to archive them:
Expand Down Expand Up @@ -277,24 +276,24 @@ For every new method, service, or codepath that can fail, fill in this table:
```
METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS
-------------------------|-----------------------------|-----------------
ExampleService#call | API timeout | Faraday::TimeoutError
ExampleService.call | API timeout | TimeoutError
| API returns 429 | RateLimitError
| API returns malformed JSON | JSON::ParserError
| DB connection pool exhausted| ActiveRecord::ConnectionTimeoutError
| Record not found | ActiveRecord::RecordNotFound
| API returns malformed JSON | JSONParseError
| DB connection pool exhausted| ConnectionPoolExhausted
| Record not found | RecordNotFound
-------------------------|-----------------------------|-----------------

EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES
-----------------------------|-----------|------------------------|------------------
Faraday::TimeoutError | Y | Retry 2x, then raise | "Service temporarily unavailable"
TimeoutError | Y | Retry 2x, then raise | "Service temporarily unavailable"
RateLimitError | Y | Backoff + retry | Nothing (transparent)
JSON::ParserError | N ← GAP | — | 500 error ← BAD
ConnectionTimeoutError | N ← GAP | — | 500 error ← BAD
ActiveRecord::RecordNotFound | Y | Return nil, log warning | "Not found" message
JSONParseError | N ← GAP | — | 500 error ← BAD
ConnectionPoolExhausted | N ← GAP | — | 500 error ← BAD
RecordNotFound | Y | Return nil, log warning | "Not found" message
```
Rules for this section:
* `rescue StandardError` is ALWAYS a smell. Name the specific exceptions.
* `rescue => e` with only `Rails.logger.error(e.message)` is insufficient. Log the full context: what was being attempted, with what arguments, for what user/request.
* Catching all errors generically (`rescue StandardError`, `catch (Exception e)`, `except Exception`) is ALWAYS a smell. Name the specific exceptions.
* Logging only the error message is insufficient. Log the full context: what was being attempted, with what arguments, for what user/request.
* Every rescued error must either: retry with backoff, degrade gracefully with a user-visible message, or re-raise with added context. "Swallow and continue" is almost never acceptable.
* For each GAP (unrescued error that should be rescued): specify the rescue action and what the user should see.
* For LLM/AI service calls specifically: what happens when the response is malformed? When it's empty? When it hallucinates invalid JSON? When the model returns a refusal? Each of these is a distinct failure mode.
Expand Down Expand Up @@ -591,9 +590,7 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default.
After producing the Completion Summary above, persist the review result:

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-ceo-review","timestamp":"TIMESTAMP","status":"STATUS","unresolved":N,"critical_gaps":N,"mode":"MODE"}'
```

Before running this command, substitute the placeholder values from the Completion Summary you just produced:
Expand Down
9 changes: 2 additions & 7 deletions plan-design-review/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -387,9 +387,7 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default to
After producing the Completion Summary above, persist the review result:

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N}'
```

Substitute values from the Completion Summary:
Expand All @@ -404,10 +402,7 @@ Substitute values from the Completion Summary:
After completing the review, read the review log and config to display the dashboard.

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
cat ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl 2>/dev/null || echo "NO_REVIEWS"
echo "---CONFIG---"
~/.claude/skills/gstack/bin/gstack-config get skip_eng_review 2>/dev/null || echo "false"
~/.claude/skills/gstack/bin/gstack-review-read
```

Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, plan-design-review, design-review-lite). Ignore entries with timestamps older than 7 days. For Design Review, show whichever is more recent between `plan-design-review` (full visual audit) and `design-review-lite` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. Display:
Expand Down
4 changes: 1 addition & 3 deletions plan-design-review/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -266,9 +266,7 @@ If any AskUserQuestion goes unanswered, note it here. Never silently default to
After producing the Completion Summary above, persist the review result:

```bash
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
echo '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"plan-design-review","timestamp":"TIMESTAMP","status":"STATUS","overall_score":N,"unresolved":N,"decisions_made":N}'
```

Substitute values from the Completion Summary:
Expand Down
Loading