feat: phased decision report for orchestrator explainability #48

Open
WellDunDun wants to merge 33 commits into master from custom/prefix/router-1773508846626

@WellDunDun
Collaborator

Summary

  • Adds formatOrchestrateReport() — a 5-phase human-readable decision report showing sync sources, status breakdown, per-skill decisions with reasons, evolution results with validation pass-rate changes, and watch outcomes including alerts/rollbacks
  • Enriches JSON stdout with a decisions array containing per-skill action, reason, deployment status, validation scores, watch alerts, and rollback status
  • Adds mode banners (DRY RUN / REVIEW / AUTONOMOUS) with rerun hints
  • 13 new tests covering all report phases, mode banners, empty states, rollback tags, and rerun hints (29 total, all passing)

Supersedes #45 — absorbs its formatOrchestrateReport concept but fixes two issues: PR #45 introduced a summary.autoApprove field that doesn't exist (the real field is approvalMode), and referenced the deprecated --auto-approve flag in mode banners.

What users can now see

  • Why each skill was considered (status, pass rate, missed queries)
  • Why it was skipped (healthy, filtered, capped, no agent CLI, no SKILL.md)
  • Whether evolution passed validation (with before/after pass rates)
  • Whether it was deployed
  • Whether watch detected regression, and whether rollback occurred
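
As a rough illustration of how skip reasons like the ones listed above could be attached to each skill, here is a minimal TypeScript sketch. The `Candidate`, `Options`, and `Decision` shapes, the `decideSkills` function, the "selected for evolution" label, and the order of the checks are all assumptions made for illustration; only the skip-reason labels come from this description.

```typescript
// Hypothetical types: the real orchestrator's shapes are not shown in this PR description.
type SkipReason = "healthy" | "filtered" | "capped" | "no agent CLI" | "no SKILL.md";

interface Candidate {
  name: string;
  healthy: boolean;
  hasSkillMd: boolean;
}

interface Options {
  only?: string[]; // optional allow-list of skill names
  maxSkills: number; // cap on skills evolved per run
  agentAvailable: boolean;
}

interface Decision {
  skill: string;
  action: "evolve" | "skip";
  reason: string;
}

// Walks the candidates once and records why each one was selected or skipped.
function decideSkills(candidates: Candidate[], opts: Options): Decision[] {
  const decisions: Decision[] = [];
  let selected = 0;
  for (const c of candidates) {
    let skip: SkipReason | undefined;
    if (opts.only && !opts.only.includes(c.name)) skip = "filtered";
    else if (c.healthy) skip = "healthy";
    else if (!c.hasSkillMd) skip = "no SKILL.md";
    else if (!opts.agentAvailable) skip = "no agent CLI";
    else if (selected >= opts.maxSkills) skip = "capped";

    if (skip) {
      decisions.push({ skill: c.name, action: "skip", reason: skip });
    } else {
      selected += 1;
      decisions.push({ skill: c.name, action: "evolve", reason: "selected for evolution" });
    }
  }
  return decisions;
}
```

Serializing the returned array with `JSON.stringify` would give a per-skill `decisions` list similar in spirit to the one described for JSON stdout.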

Test plan

  • All 29 orchestrate tests pass (1331 total across repo)
  • formatOrchestrateReport tested for all 3 modes, all 5 phases, empty states, rollback tags, rerun hints
  • Manual: selftune orchestrate --dry-run produces clear phased report on stderr
  • Manual: JSON stdout includes decisions array with per-skill reasoning

🤖 Generated with Claude Code

WellDunDun and others added 30 commits March 8, 2026 15:17
Deletes Conductor worktree branches (custom/prefix/router-*),
selftune evolve test branches, and orphaned worktree-agent-* branches.
Also prunes stale remote tracking refs. Run with `make clean-branches`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…w candidates

Extends the composability analysis with positive interaction detection
(synergy scores), ordered skill sequence extraction from usage timestamps,
and automatic workflow candidate flagging. Backwards compatible — v1
function and tests unchanged, CLI falls back to v1 when no usage log exists.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements multi-skill workflow support: discovers workflow patterns from
existing telemetry, displays them via `selftune workflows`, and codifies
them to SKILL.md via `selftune workflows save`. Includes 48 tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…9613

Add workflow discovery and CLI command (v0.3)
BUG-1: Remove false-positive git hook checks, fix hook key names to PascalCase
BUG-2: Auto-derive expectations from SKILL.md when none provided
BUG-3: Add --help output to grade command documenting --session-id
BUG-4: Prefer skills_invoked over skills_triggered in session matching
BUG-5: Add pre-flight validation and human-readable errors to evolve
BUG-6: Distinguish real Skill tool calls from SKILL.md browsing reads
IMP-1: Confirmed templates/ in package.json files array
IMP-2: Auto-install agent files during init
IMP-3: Show UNGRADED instead of CRITICAL when no graded sessions exist
IMP-4: Use portable npx selftune hook <name> instead of absolute paths
IMP-5: Add selftune auto-grade command
IMP-6: Mandate AskUserQuestion in evolve workflows
IMP-7: Add selftune quickstart command

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Hook files guard execution behind import.meta.main, so dynamically
importing them was a no-op. Spawn as subprocess instead so stdin
payloads are processed and hooks write telemetry logs correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
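
The subprocess approach described in the commit message above can be sketched roughly as follows. A module that only runs its body when it is the entry point does nothing when imported, so the caller starts it as a child process and writes the payload to its stdin. The function name `runHookSubprocess` and the return shape are hypothetical; `spawnSync` with the `input` option is the standard Node.js API.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: runs `command` as a child process and feeds `payload` as JSON on stdin.
function runHookSubprocess(
  command: string,
  args: string[],
  payload: unknown,
): { status: number | null; stdout: string } {
  const result = spawnSync(command, args, {
    input: JSON.stringify(payload), // delivered to the child's stdin
    encoding: "utf8", // return stdout as a string instead of a Buffer
  });
  return { status: result.status, stdout: result.stdout };
}
```

Because the child is its own entry point, any entry-point guard in the hook file passes and the stdin payload is actually consumed.
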
## Walkthrough

This PR adds new CLI commands (auto-grade, quickstart, hook), implements expectation auto-derivation from SKILL.md, distinguishes actual skill invocations from passive SKILL.md reads, adds an UNGRADED skill status, migrates hook checks to Claude Code settings.json, and enhances evolve CLI pre-flight validation, verbose logging, and post-run/error messaging.

## Changes

|Cohort / File(s)|Summary|
|---|---|
|**CLI Routing & Init** <br> `cli/selftune/index.ts`, `cli/selftune/init.ts`|Adds `auto-grade`, `quickstart`, and `hook` command routing; `installAgentFiles()` to deploy bundled agent files; expands Claude Code hook detection (4 keys) and supports nested hook entries.|
|**Evolution CLI** <br> `cli/selftune/evolution/evolve.ts`|Pre-flight checks for `--skill-path`/SKILL.md and eval-set paths, verbose config dump, enhanced post-run deployment/diagnostic messaging, and improved top-level error/troubleshooting output.|
|**Automated Grading** <br> `cli/selftune/grading/auto-grade.ts` (new), `cli/selftune/grading/grade-session.ts`|New auto-grade CLI; adds `deriveExpectationsFromSkill()` (exported) to extract up to 5 expectations from SKILL.md and uses it when explicit evals are missing; session selection now prefers `skills_invoked`.|
|**Hook / Skill Eval** <br> `cli/selftune/hooks/skill-eval.ts`, `cli/selftune/observability.ts`|Adds `hasSkillToolInvocation()` and updates `processToolUse` to set `triggered` based on detection; observability now checks Claude Code `settings.json` (flat and nested formats) instead of repo .git/hooks.|
|**Telemetry & Transcript** <br> `cli/selftune/types.ts`, `cli/selftune/utils/transcript.ts`, `cli/selftune/ingestors/claude-replay.ts`|Adds `skills_invoked` to telemetry/transcript types and populates it from tool_use entries; ingestion prefers `skills_invoked` and sets `triggered` accordingly (fallback to previous behavior).|
|**Status / Badge** <br> `cli/selftune/status.ts`, `cli/selftune/badge/badge-data.ts`|Introduces new SkillStatus value `UNGRADED` and extends BadgeData.status to include `UNGRADED`; status logic and display updated to treat skills with records but no triggered (graded) sessions as UNGRADED.|
|**Quickstart Onboarding** <br> `cli/selftune/quickstart.ts` (new)|Adds three-step quickstart flow (init, replay, status) with `suggestSkillsToEvolve()` ranking ungraded/critical/warning skills.|
|**Eval set behavior** <br> `cli/selftune/eval/hooks-to-evals.ts`|`buildEvalSet` now filters positives/negatives to only include records where `triggered === true`.|
|**Templates & Docs** <br> `templates/*-settings.json`, `skill/Workflows/*.md`|Replaces hardcoded path-based hook invocations with `npx selftune hook <name>`; updates Evolve workflow docs to require AskUserQuestion for user inputs.|
|**Tests & Fixtures** <br> `tests/*` (multiple)|Adds/updates tests for derived expectations, skills_invoked precedence, triggered=true/false cases, observability hook formats, UNGRADED state, and eval positives filtering; updates SKILL.md fixtures.|

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as auto-grade<br/>CLI
    participant Telemetry as Telemetry<br/>Log
    participant SkillMD as SKILL.md
    participant PreGates as Pre-Gates
    participant Agent as LLM<br/>Agent
    participant Output as JSON<br/>Output

    User->>CLI: auto-grade --skill X [--session-id Y]
    CLI->>Telemetry: Load telemetry log
    Telemetry-->>CLI: Sessions list
    CLI->>CLI: Resolve session (by ID or latest using skills_invoked)
    CLI->>SkillMD: Derive expectations (if needed)
    SkillMD-->>CLI: Expectations (derived or none)
    CLI->>PreGates: Run pre-gates on expectations
    alt All resolved
        PreGates-->>CLI: All expectations resolved
    else Some unresolved
        PreGates-->>CLI: Unresolved list
        CLI->>Agent: gradeViaAgent(unresolved)
        Agent-->>CLI: Graded expectations
    end
    CLI->>CLI: Compile results, metrics, summary
    CLI->>Output: Write GradingResult JSON
    Output-->>User: Summary + file path
```
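
The pre-gate step in the diagram above could look something like this sketch: deterministic checks resolve what they can, and only the remainder is handed to the agent. `gradeViaAgent` is named in the diagram, but its signature here, the `Graded` shape, and the `PreGate` type are assumptions; the real grader is presumably asynchronous.

```typescript
// Hypothetical result shape for one graded expectation.
interface Graded {
  expectation: string;
  passed: boolean;
  source: "pre-gate" | "agent";
}

// A pre-gate returns true/false when it can decide on its own, or undefined when it cannot.
type PreGate = (expectation: string) => boolean | undefined;

function gradeWithPreGates(
  expectations: string[],
  preGate: PreGate,
  gradeViaAgent: (unresolved: string[]) => Graded[],
): Graded[] {
  const resolved: Graded[] = [];
  const unresolved: string[] = [];
  for (const expectation of expectations) {
    const verdict = preGate(expectation);
    if (verdict === undefined) unresolved.push(expectation);
    else resolved.push({ expectation, passed: verdict, source: "pre-gate" });
  }
  // Skip the agent call entirely when the pre-gates resolved everything.
  const agentGraded = unresolved.length > 0 ? gradeViaAgent(unresolved) : [];
  return [...resolved, ...agentGraded];
}
```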

```mermaid
sequenceDiagram
    participant User
    participant CLI as quickstart<br/>CLI
    participant Config as Config<br/>Check
    participant Replay as Replay<br/>Ingestor
    participant Telemetry as Telemetry<br/>Log
    participant Status as Status<br/>Engine
    participant Skills as Skills<br/>Scorer

    User->>CLI: quickstart
    CLI->>Config: Check config/marker
    alt Config missing
        CLI->>CLI: runInit()
    end
    CLI->>Replay: Check replay marker
    alt Marker missing
        Replay->>CLI: Ingest transcripts (parseTranscript -> skills_invoked)
        CLI->>Telemetry: Update logs/marker
    end
    CLI->>Status: Compute skill statuses (uses triggered & pass rates)
    Status-->>CLI: Formatted status
    CLI->>Skills: suggestSkillsToEvolve()
    Skills-->>CLI: Top ranked skills
    CLI->>User: Display suggestions
```

```mermaid
sequenceDiagram
    participant Transcript as File
    participant Parser as parseTranscript
    participant Detector as Invocation<br/>Detector
    participant Telemetry as Session<br/>Record
    participant Ingestor as claude-replay

    Parser->>Transcript: Read JSONL lines
    loop each message
        Parser->>Detector: Check tool_use entries
        alt tool_use.toolName == "Skill"
            Detector-->>Parser: Extract skill name
            Parser->>Parser: Add to skills_invoked[]
        else
            Detector-->>Parser: ignore
        end
    end
    Parser->>Telemetry: Emit TranscriptMetrics (skills_invoked)
    Ingestor->>Telemetry: Create SkillUsageRecord (use skills_invoked if present)
```
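
A minimal sketch of the detection loop in the last diagram, assuming each JSONL line holds a `message.content` array of entries and that a Skill call carries the skill name at `input.skill`. Those field names are assumptions for illustration, not the confirmed transcript schema, and `extractSkillsInvoked` is a hypothetical name.

```typescript
// Returns the skill name if `entry` looks like a Skill tool call, otherwise undefined.
// The entry shape (type/name/input.skill) is assumed, not taken from the real schema.
function skillNameFromEntry(entry: unknown): string | undefined {
  if (typeof entry !== "object" || entry === null) return undefined;
  const e = entry as { type?: unknown; name?: unknown; input?: { skill?: unknown } | null };
  if (e.type !== "tool_use" || e.name !== "Skill") return undefined;
  const skill = e.input?.skill;
  return typeof skill === "string" ? skill : undefined;
}

// Collects distinct skill names from real Skill tool calls, ignoring everything else
// (for example a Read of a SKILL.md file, which is browsing rather than invocation).
function extractSkillsInvoked(jsonl: string): string[] {
  const invoked: string[] = [];
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip malformed lines instead of failing the whole transcript
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const content = (parsed as { message?: { content?: unknown } }).message?.content;
    if (!Array.isArray(content)) continue;
    for (const entry of content) {
      const name = skillNameFromEntry(entry);
      if (name !== undefined && !invoked.includes(name)) invoked.push(name);
    }
  }
  return invoked;
}
```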

## Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

## Possibly related PRs

- **#23**: Overlaps evolve CLI changes in `cli/selftune/evolution/evolve.ts` (pre-flight, logging, messaging).
- **#13**: Related telemetry/ingestion changes around `skills_invoked` and session selection behavior.
- **#17**: Touches the same ingestor surface (`cli/selftune/ingestors/claude-replay.ts`) and skill-invocation/trigger semantics.
* Update selftune workflow docs and skill versioning

* Improve selftune skill portability and setup docs

* Clarify workflow doc edge cases

* Fix OpenClaw doctor validation and workflow docs

* Polish composability and setup docs
* Fix BUG-7, BUG-8, BUG-9 from demo findings

BUG-7: Add try/catch + array validation around eval-set file loading in
evolve() so parse errors surface as user-facing messages instead of
silent exit.

BUG-8: Add cold-start bootstrap — when extractFailurePatterns returns
empty but the eval set has positive entries, treat those positives as
missed queries so evolve can work on skills with zero usage history.

BUG-9: Add --out flag to evals CLI parseArgs as alias for --output.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix evolve CI regressions

* Isolate blog proof fixture mutations

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
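
The BUG-7 and BUG-8 fixes described above could be sketched along these lines. The names `parseEvalSet` and `queriesToTarget`, the `EvalEntry` shape with its `should_trigger` field, and the error wording are all hypothetical; the sketch only shows the pattern of validating parsed JSON and falling back to eval-set positives on a cold start.

```typescript
// Hypothetical eval-set entry; the real field names may differ.
interface EvalEntry {
  query: string;
  should_trigger: boolean;
}

type LoadResult =
  | { ok: true; entries: unknown[] }
  | { ok: false; message: string };

// BUG-7 pattern: surface parse and shape errors as readable messages instead of exiting silently.
function parseEvalSet(raw: string, path: string): LoadResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    const detail = err instanceof Error ? err.message : String(err);
    return { ok: false, message: `Could not parse eval set at ${path}: ${detail}` };
  }
  if (!Array.isArray(parsed)) {
    return { ok: false, message: `Eval set at ${path} must be a JSON array` };
  }
  return { ok: true, entries: parsed };
}

// BUG-8 pattern: with no observed failures, treat positive eval entries as missed queries.
function queriesToTarget(failurePatterns: string[], evalSet: EvalEntry[]): string[] {
  if (failurePatterns.length > 0) return failurePatterns;
  return evalSet.filter((entry) => entry.should_trigger).map((entry) => entry.query);
}
```
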
* Fix dashboard export and layout

* Improve telemetry normalization groundwork

* Add test runner state

* Separate and extract telemetry contract

* Fix telemetry CI lint issues

* Fix remaining CI regressions

* Detect zero-trigger monitoring regressions

* Stabilize dashboard report route tests

* Address telemetry review feedback
* Fix telemetry follow-up edge cases

* Fix rollback payload and Codex prompt attribution

* Tighten Codex rollout prompt tracking
* Prepare 0.2.1 release

* Update README install path

* Use trusted publishing for npm
* feat: consume @selftune/telemetry-contract as workspace package

Replace relative path imports of telemetry-contract with the published
@selftune/telemetry-contract workspace package. Adds workspace config to
package.json and expands tsconfig includes to cover packages/*.

Closes SEL-10

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(telemetry-contract): add versioning, metadata, and golden fixtures

Add version 1.0.0 and package metadata (description, author, license,
repository) to the telemetry-contract package. Create golden fixture file
with one valid example per record kind and a test suite that validates
all fixtures against the contract validator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Make selftune source-truth driven

* Harden live dashboard loading

* Audit cleanup: test split, docs, lint fixes

- Add make test-fast / test-slow targets (5s vs 80s, 16x faster dev loop)
- Add bun run test:fast / test:slow scripts in package.json
- Reposition README as "Claude Code first", update competitive comparison
- Bump PRD.md version to 0.2.1
- Add CHANGELOG unreleased section (source-truth, telemetry-contract, test split)
- Fix pre-existing lint: types.ts formatting, golden.test.ts import order

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add sync flags and hook dispatch to integration guide

- Document all selftune sync flags (--since, --dry-run, --force, etc.)
- Add selftune hook dispatch command with all 6 hook names
- Verified init, activation rules, and source-truth sections already current

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Harden LLM calls and fix pre-existing test failures

Add exponential backoff retry to callViaAgent for transient subprocess
failures. Cap JSONL health-check validation at 500 lines to prevent
timeouts on large log files. Use exported DEFAULT_WINDOW_SESSIONS
constant in dashboard data collection instead of telemetry.length.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
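
The exponential backoff retry mentioned for `callViaAgent` above follows a common pattern, sketched here in generic form. The helper names, the default base delay, the cap, and the attempt count are assumptions, not values taken from the commit.

```typescript
// Delay before the next retry: base * 2^attempt, capped. Defaults are illustrative only.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 4000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries `operation` up to `maxAttempts` times, sleeping between failed attempts.
// `sleep` is injectable so callers (and tests) can avoid real waiting.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // No point sleeping after the final attempt.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A real implementation would usually retry only errors it considers transient and rethrow the rest immediately.
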
* feat: add SQLite materialization layer for dashboard queries

Add a local SQLite database (via bun:sqlite) as an indexed materialized
view store so the dashboard/report UX no longer depends on recomputing
everything from raw JSONL logs on every request.

New module at cli/selftune/localdb/ with:
- schema.ts: 10 tables + 19 indexes mirroring canonical telemetry and
  local log shapes
- db.ts: openDb() lifecycle with WAL mode, meta key-value helpers
- materialize.ts: full rebuild and incremental materialization from
  JSONL source-of-truth logs
- queries.ts: getOverviewPayload(), getSkillReportPayload(),
  getSkillsList() query helpers

Raw JSONL logs remain authoritative — the DB is a disposable cache that
can always be rebuilt. No new npm dependencies (bun:sqlite only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Biome lint and format errors

Auto-fix import ordering, formatting, and replace non-null assertions
with optional chaining in tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add selftune orchestrate command for autonomous core loop

Introduces `selftune orchestrate` — a single entry point that chains
sync → status → evolve → watch into one coordinated run. Defaults to
dry-run mode with explicit --auto-approve for deployments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review — lint errors, logic bug, and type completeness

- Replace string concatenation with template literals (Biome lint)
- Add guard in evolve loop for agent-missing skip mutations
- Replace non-null assertion with `as string` cast
- Remove unused EvolutionAuditEntry import
- Complete DoctorResult mock with required fields

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* style: apply Biome formatting and import sorting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Bumps [oven-sh/setup-bun](https://github.com/oven-sh/setup-bun) from 2.1.2 to 2.1.3.
- [Release notes](https://github.com/oven-sh/setup-bun/releases)
- [Commits](oven-sh/setup-bun@3d26778...ecf28dd)

---
updated-dependencies:
- dependency-name: oven-sh/setup-bun
  dependency-version: 2.1.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 6.2.0 to 6.3.0.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@6044e13...53b8394)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: 6.3.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.32.4 to 4.32.6.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@89a39a4...0d579ff)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 4.32.6
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Improve sync progress and tighten query filtering

* Fix biome formatting errors in sync.ts and query-filter.test.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add generic scheduling command and reposition OpenClaw cron as optional

The primary automation story is now agent-agnostic. `selftune schedule`
generates ready-to-use snippets for system cron, macOS launchd, and Linux
systemd timers. `selftune cron` is repositioned as an optional OpenClaw
integration rather than the main automation path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — centralize schedule data, fix generators and formatting

Derive SCHEDULE_ENTRIES from DEFAULT_CRON_JOBS (single source of truth),
generate launchd/systemd configs for all 4 entries instead of sync-only,
fix biome formatting, and add markdown language tag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use StartCalendarInterval for fixed-time launchd and shell wrappers for chained commands

- launchd: use StartCalendarInterval (Hour/Minute/Weekday) for fixed-time
  schedules instead of approximating with StartInterval
- launchd/systemd: use /bin/sh -c wrapper for commands with && chains
  so prerequisite steps (like sync) are not silently dropped

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
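
The two generator fixes above can be illustrated with a small sketch. `toProgramArguments` and `calendarIntervalPlist` are hypothetical names, and the example command chain is illustrative; `StartCalendarInterval` with `Hour`, `Minute`, and `Weekday` keys is the launchd plist mechanism the commit refers to.

```typescript
// launchd calendar keys for a fixed-time schedule.
interface CalendarInterval {
  Hour: number;
  Minute: number;
  Weekday?: number;
}

// Chained commands need a shell to interpret `&&`; a plain argv split would drop that meaning.
function toProgramArguments(command: string): string[] {
  return command.includes("&&")
    ? ["/bin/sh", "-c", command]
    : command.split(/\s+/).filter((part) => part.length > 0);
}

// Renders the StartCalendarInterval fragment of a launchd plist, omitting unset keys.
function calendarIntervalPlist(interval: CalendarInterval): string {
  const keys: (keyof CalendarInterval)[] = ["Hour", "Minute", "Weekday"];
  const entries = keys
    .filter((key) => interval[key] !== undefined)
    .map((key) => `  <key>${key}</key><integer>${interval[key]}</integer>`);
  return ["<key>StartCalendarInterval</key>", "<dict>", ...entries, "</dict>"].join("\n");
}
```
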
WellDunDun and others added 3 commits March 14, 2026 07:36
* feat: add local dashboard SPA with React + Vite

Introduces a minimal React SPA at apps/local-dashboard/ with two routes:
overview (KPIs, skill health grid, evolution feed) and per-skill drilldown
(pass rate, invocation breakdown, evaluation records). Consumes existing
dashboard-server API endpoints with SSE live updates, explicit loading/
error/empty states, and design tokens matching the current dashboard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback for local dashboard

Extract shared utils (deriveStatus, formatRate, timeAgo), add SSE exponential
backoff with max retries, filter ungraded skills from avg pass rate, fix stuck
loading state for undefined skillName, use word-boundary regex for evolution
filtering, add focus-visible styles, add typecheck script, and add Vite env
types.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address second round CodeRabbit review feedback

Cancel pending SSE reconnect timers on cleanup, add stale-request guard
to useSkillReport, remove redundant decodeURIComponent (React Router
already decodes), quote font names in CSS for stylelint, and format
deriveStatus signature for Biome.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: align local dashboard SPA with SQLite v2 data architecture

Migrate SPA from old JSONL-reading /api/data endpoints to new
SQLite-backed /api/v2/* endpoints. Add v2 server routes for overview
and per-skill reports. Replace SSE with 15s polling. Rewrite types
to match materialized query shapes from queries.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review feedback on local dashboard SPA

- Add language identifier to HANDOFF.md fenced code block (MD040)
- Prevent overlapping polls in useOverview with in-flight guard and sequential setTimeout
- Broaden empty-state check in useSkillReport to include evolution/proposals
- Fix Sessions KPI to use counts.sessions instead of counts.telemetry
- Wrap materializeIncremental in try/catch to preserve last good snapshot on failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: sort imports to satisfy Biome organizeImports lint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit nitpicks — cross-platform dev script, stricter types, CSS compat

- Use concurrently for cross-platform dev script instead of shell backgrounding
- Tighten Sidebar counts prop to Partial<Record<SkillHealthStatus, number>>
- Replace color-mix() with rgba fallback for broader browser support

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add UNKNOWN status filter and extract header height CSS variable

- Add UNKNOWN to STATUS_OPTIONS so all SkillHealthStatus values are filterable
- Extract hardcoded 56px header height to --header-h CSS variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: hoist sidebar collapse state to layout and add UNKNOWN filter style

- Lift collapsed state from Sidebar to Overview so grid columns resize properly
- Add .sidebar-collapsed grid rules at all breakpoints
- Fix mobile: collapsed sidebar no longer creates dead-end (shows inline)
- Add .filter-pill.active.filter-unknown CSS rule for UNKNOWN status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: serve SPA as default dashboard, legacy at /legacy/

- Dashboard server now serves built SPA from apps/local-dashboard/dist/ at /
- Legacy dashboard moved to /legacy/ route
- SPA fallback for client-side routes (e.g. /skills/:name)
- Static asset serving with content-hashed caching for /assets/*
- Path traversal protection on static file serving
- Add build:dashboard script to root package.json
- Include apps/local-dashboard/dist/ in published files
- Falls back to legacy dashboard if SPA build not found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shadcn theming with dark/light toggle and selftune branding

Migrate dashboard to shadcn theme system with proper light/dark support.
Dark mode uses selftune site colors (navy/cream/copper), light mode uses
standard shadcn defaults. Add ThemeProvider with localStorage persistence,
sun/moon toggle in site header, and SVG logo with currentColor for both themes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: path traversal check and 404 for missing skills

Use path.relative() + isAbsolute() instead of startsWith() for the SPA
static asset path check to prevent directory traversal bypass. Return 404
from /api/v2/skills/:name when the skill has no usage data.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome formatting — semicolons, import order, line length

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — dedupe polling, fix stale closures, harden theme/config

- Lift useOverview to DashboardShell, pass as prop to Overview (no double polling)
- Fix stale closure in drag handler by deriving indices from prev state
- Validate localStorage theme values, use undefined context default
- Add relative positioning to theme toggle button for MoonIcon overlay
- Fix falsy check hiding zero values in chart tooltip
- Fix invalid Tailwind selectors in dropdown-menu and toggle-group
- Use ESM-safe fileURLToPath instead of __dirname in vite.config
- Switch manualChunks to function form for Base UI subpath matching
- Align pass-rate threshold with deriveStatus in SkillReport
- Use local theme provider in sonner instead of next-themes
- Add missing React import in skeleton, remove unused Separator import
- Include vite.config.ts in tsconfig for typecheck coverage
- Fix inconsistent JSX formatting in select scroll buttons

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 2 — shared sorting, DnD fixes, Tailwind v4 migration

- Extract sortByPassRateAndChecks to utils.ts, dedupe sorting in App + Overview
- Derive DnD dataIds from row model (not raw data), guard against -1 indexOf
- Hide pagination when table is empty instead of showing "Page 1 of 0"
- Fix ActivityTimeline default tab to prefer non-empty dataset
- Import ReactNode directly instead of undeclared React namespace
- Quote CSS attribute selector in chart style injection
- Use stable composite keys for tooltip and legend items
- Remove unnecessary "use client" directive from dropdown-menu (Vite SPA)
- Migrate outline-none to outline-hidden for Tailwind v4 accessibility
- Fix toggle-group orientation selectors to match data-orientation attribute
- Add missing CSSProperties import in sonner.tsx
- Add dark mode variant for SkillReport row highlight
- Format vite.config.ts with Biome

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add evidence viewer, evolution timeline, and enhanced skill report

Add EvidenceViewer, EvolutionTimeline, and InfoTip components. Enhance
SkillReport with richer data display, expand dashboard server API
endpoints, and update documentation and architecture docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 3 — DnD/sort conflict, theme listener, formatting

- Disable DnD reorder when table sorting is active (skill-health-grid)
- Listen for OS theme preference changes when system theme is active
- Apply Biome formatting to sortByPassRateAndChecks
- Remove unused useEffect import from Overview
- Deduplicate confidence filter in SkillReport
- Materialize session IDs once in dashboard-server to avoid repeated subqueries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: show selftune version in sidebar footer

Pass version from API response through to AppSidebar and display
it dynamically instead of hardcoded "dashboard v0.1".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: biome formatting in dashboard-server — line length wrapping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review round 4 — dedupe formatRate, STATUS_CONFIG, cleanup

- Remove duplicate formatRate from app-sidebar, import from @/utils
- Extract STATUS_CONFIG to shared @/constants module, import in both
  skill-health-grid and SkillReport
- Remove misleading '' fallback from sessionPlaceholders since the
  ternary guards already skip queries when empty

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove redundant items prop from Select to avoid duplication

The SelectItem children already define the options; the items prop
was duplicating them unnecessarily.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add sortableKeyboardCoordinates to KeyboardSensor for proper keyboard DnD

Without this, keyboard navigation moves by pixels instead of jumping
between sortable items.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Linear-style dashboard UX — collapsible sidebar, direct skill links, scope grouping

- Simplify sidebar: remove status filters, keep logo + search + skills list
- Add collapsible scope groups (Project/Global) using base-ui Collapsible
- Surface skill_scope from DB query through API to dashboard types
- Replace skill drawer with direct Link navigation to skill report
- Add Scope column to skills table with filter dropdown
- Slim down site header: remove breadcrumbs, reduce to sidebar trigger + theme toggle
- Add side-by-side grid layout: skills table left, activity panel right
- Gitignore pnpm-lock.yaml alongside bun.lock

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — accessibility, semantics, state reset

- Remove bun.lock from .gitignore to maintain build reproducibility
- Preserve unexpected scope values in sidebar (don't drop unrecognized scopes)
- Add aria-label to skill search input for screen reader accessibility
- Switch status filter from checkbox to radio-group semantics (mutually exclusive)
- Reset selectedProposal when navigating between skills via useEffect on name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add TanStack Query and optimize SQL queries for dashboard performance

Migrate data fetching from manual polling/dedup hooks to TanStack Query
for instant cached navigation, background refetch, and request dedup.
Optimize SQL: replace NOT IN subqueries with LEFT JOIN, move JS dedup
to GROUP BY, add LIMIT 200 to unbounded evidence queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: track root bun.lock for reproducible installs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review — collapsible sync, drag handle dedup, a11y, not-found heuristic

- Make sidebar Collapsible controlled so it auto-opens when active skill
  changes (Comment #1)
- Consolidate useSortable to single call per row via React context,
  use setActivatorNodeRef on drag handle button (Comment #2)
- Remove capitalize CSS transform on free-form scope values (Comment #3)
- Broaden isNotFound heuristic to check invocations, prompts, sessions
  in addition to evals/evolution/proposals (Comment #4)
- Move Tooltip outside TabsTrigger to avoid nested interactive elements,
  use Base UI render prop for composition (Comment #5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit nitpicks — version pinning, changelog clarity, shared query helper

- Use caret range for recharts version (^2.15.4) for consistency
- Clarify changelog: SSE was removed, polling via refetchInterval is primary
- Extract getPendingProposals() shared helper in queries.ts, used by both
  getOverviewPayload() and dashboard-server skill report endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 3 — deps, async fs, type safety, deterministic query

- Move @tailwindcss/vite, tailwindcss, shadcn to devDependencies
- Fix trailing space in version display when version is empty
- Type caught error as unknown in refreshV2Data
- Replace sync fs (readFileSync/statSync) with Bun.file() for hot-path asset serving
- Return 404 for missing /assets/* files instead of falling through to SPA
- Add details and eval_set fields to SkillReportPayload.evidence type
- Fix nondeterministic GROUP BY with ROW_NUMBER() CTE in getPendingProposals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: resolve Biome lint and format errors in CI

- Replace non-null assertion with type cast in useSkillReport (noNonNullAssertion)
- Break long import line in dashboard-server.ts to satisfy Biome formatter

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 4 — CTE subqueries, type alignment, scope index

- Replace dynamic bind-parameter expansion with CTE subquery for session lookups
- Add skill_name to OverviewPayload.pending_proposals type to match runtime shape
- Add composite index on skill_usage(skill_name, skill_scope, timestamp) for scope lookups

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit round 5 — startup guard, 404 heuristic, deterministic tiebreaker

- Guard initial v2 materialization with try/catch to avoid full server crash
- Include evidence in not-found check so evidence-only skills aren't 404'd
- Add ea.id DESC tiebreaker to ROW_NUMBER() for deterministic pending proposals

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
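
To show why the `id DESC` tiebreaker matters, here is an in-memory analogue of "latest row per proposal". It is a sketch, not the SQL in queries.ts: the `AuditRow` fields and the `latestPerProposal` helper are hypothetical, and timestamps are assumed to be same-format ISO 8601 strings so that string comparison orders them correctly.

```typescript
// Hypothetical row shape; field names are assumptions for illustration.
interface AuditRow {
  id: number;
  proposalId: string;
  timestamp: string; // assumed ISO 8601 in one consistent format
  action: string;
}

// Keeps one row per proposal: the newest timestamp, and on equal timestamps
// the highest id. Without the id comparison, the winner would depend on
// input order whenever two rows share a timestamp.
function latestPerProposal(rows: AuditRow[]): AuditRow[] {
  const latest = new Map<string, AuditRow>();
  for (const row of rows) {
    const current = latest.get(row.proposalId);
    if (
      current === undefined ||
      row.timestamp > current.timestamp ||
      (row.timestamp === current.timestamp && row.id > current.id)
    ) {
      latest.set(row.proposalId, row);
    }
  }
  return Array.from(latest.values());
}
```

Feeding the same rows in a different order yields the same winner, which is the property the tiebreaker is meant to guarantee.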

* fix: address CodeRabbit round 6 — db guard, refresh throttle, deferred 404

- Guard openDb() in try/catch so DB bootstrap failure doesn't crash server
- Make db nullable, return 503 from /api/v2/* when store is unavailable
- Throttle failed refresh attempts with separate lastV2RefreshAttemptAt timestamp
- Move skill 404 check after enrichment queries (evolution, proposals, invocations)
- Use optional chaining for db.close() on shutdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
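
The refresh-throttle bullet above can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's implementation: `RefreshThrottle` and its members are hypothetical names, and the only idea borrowed from the commit is tracking the last attempt separately from the last success so that failures are throttled too.

```typescript
// Hypothetical helper; class and member names are assumptions.
class RefreshThrottle {
  private lastAttemptAt = Number.NEGATIVE_INFINITY;
  lastSuccessAt = Number.NEGATIVE_INFINITY;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // `now` is passed in (milliseconds) so the behavior is easy to test.
  tryRefresh(now: number, refresh: () => void): "skipped" | "ok" | "failed" {
    if (now - this.lastAttemptAt < this.minIntervalMs) {
      return "skipped";
    }
    // Recorded before running, so a throwing refresh still counts as an
    // attempt and is not retried on every request.
    this.lastAttemptAt = now;
    try {
      refresh();
      this.lastSuccessAt = now;
      return "ok";
    } catch {
      return "failed";
    }
  }
}
```

If only a success timestamp were tracked, a persistently failing refresh would be retried on every call, which is presumably the situation the separate attempt timestamp avoids.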

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Promote product planning docs

* Add execution plans for product gaps and evals

* Prepare SPA dashboard release path

* Remove legacy dashboard runtime

* Refresh execution plans after dashboard cutover

* Build dashboard SPA in CI and publish

* Refresh README for SPA release path

* Address dashboard release review comments

* Fix biome lint errors in dashboard tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Make autonomous loop the default scheduler path

* Document orchestrate as the autonomous loop

* Document autonomy-first setup path

* Harden autonomous scheduler install paths

* Clarify sync force usage in README

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Orchestrate output now explains each decision clearly so users can trust
the autonomous loop. Adds formatOrchestrateReport() with 5-phase human
report (sync, status, decisions, evolution, watch) and enriched JSON
with per-skill decisions array. Supersedes PR #45.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
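
As a rough picture of what a mode banner plus a per-skill decision listing could look like, here is a small sketch. It is not `formatOrchestrateReport()` itself: the `SkillDecision` shape, the `formatDecisions` helper, and the exact output layout are assumptions made for illustration. Only the three mode names and the idea of an action with a reason come from the PR description.

```typescript
// Hypothetical types and layout; not the actual orchestrate.ts API.
type Mode = "dry-run" | "review" | "autonomous";

interface SkillDecision {
  skill: string;
  action: "evolve" | "skip" | "watch";
  reason: string;
}

const BANNERS: Record<Mode, string> = {
  "dry-run": "DRY RUN",
  review: "REVIEW",
  autonomous: "AUTONOMOUS",
};

function formatDecisions(mode: Mode, decisions: SkillDecision[]): string {
  const lines = [`[${BANNERS[mode]}]`, "Decisions:"];
  if (decisions.length === 0) {
    // Explicit empty state, so a quiet run is distinguishable from a broken one.
    lines.push("  (no skills considered)");
  }
  for (const d of decisions) {
    lines.push(`  ${d.skill}: ${d.action.toUpperCase()} (${d.reason})`);
  }
  return lines.join("\n");
}

console.log(
  formatDecisions("dry-run", [
    { skill: "seo-audit", action: "skip", reason: "healthy" },
  ]),
);
```

Keeping the reason next to the action on one line is one way to make each decision auditable at a glance; the real report may group or phrase this differently.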

coderabbitai bot commented Mar 14, 2026

Important

Review skipped

Too many files!

This PR contains 223 files, which is 73 over the limit of 150.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7fd61fa-649a-455b-bd0c-56e790b81884

📥 Commits

Reviewing files that changed from the base of the PR and between 88efd12 and 918eeb2.

⛔ Files ignored due to path filters (6)
  • apps/local-dashboard/bun.lock is excluded by !**/*.lock
  • apps/local-dashboard/package-lock.json is excluded by !**/package-lock.json
  • apps/local-dashboard/public/favicon.png is excluded by !**/*.png
  • apps/local-dashboard/public/logo.png is excluded by !**/*.png
  • apps/local-dashboard/public/logo.svg is excluded by !**/*.svg
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (223)
  • .github/workflows/auto-bump-cli-version.yml
  • .github/workflows/ci.yml
  • .github/workflows/codeql.yml
  • .github/workflows/publish.yml
  • .github/workflows/scorecard.yml
  • .gitignore
  • ARCHITECTURE.md
  • CHANGELOG.md
  • Makefile
  • PRD.md
  • README.md
  • ROADMAP.md
  • apps/local-dashboard/.gitignore
  • apps/local-dashboard/HANDOFF.md
  • apps/local-dashboard/components.json
  • apps/local-dashboard/index.html
  • apps/local-dashboard/package.json
  • apps/local-dashboard/src/App.tsx
  • apps/local-dashboard/src/api.ts
  • apps/local-dashboard/src/components/ActivityTimeline.tsx
  • apps/local-dashboard/src/components/EvidenceViewer.tsx
  • apps/local-dashboard/src/components/EvolutionTimeline.tsx
  • apps/local-dashboard/src/components/InfoTip.tsx
  • apps/local-dashboard/src/components/app-sidebar.tsx
  • apps/local-dashboard/src/components/section-cards.tsx
  • apps/local-dashboard/src/components/site-header.tsx
  • apps/local-dashboard/src/components/skill-health-grid.tsx
  • apps/local-dashboard/src/components/theme-provider.tsx
  • apps/local-dashboard/src/components/theme-toggle.tsx
  • apps/local-dashboard/src/components/ui/avatar.tsx
  • apps/local-dashboard/src/components/ui/badge.tsx
  • apps/local-dashboard/src/components/ui/breadcrumb.tsx
  • apps/local-dashboard/src/components/ui/button.tsx
  • apps/local-dashboard/src/components/ui/card.tsx
  • apps/local-dashboard/src/components/ui/chart.tsx
  • apps/local-dashboard/src/components/ui/checkbox.tsx
  • apps/local-dashboard/src/components/ui/collapsible.tsx
  • apps/local-dashboard/src/components/ui/drawer.tsx
  • apps/local-dashboard/src/components/ui/dropdown-menu.tsx
  • apps/local-dashboard/src/components/ui/input.tsx
  • apps/local-dashboard/src/components/ui/label.tsx
  • apps/local-dashboard/src/components/ui/select.tsx
  • apps/local-dashboard/src/components/ui/separator.tsx
  • apps/local-dashboard/src/components/ui/sheet.tsx
  • apps/local-dashboard/src/components/ui/sidebar.tsx
  • apps/local-dashboard/src/components/ui/skeleton.tsx
  • apps/local-dashboard/src/components/ui/sonner.tsx
  • apps/local-dashboard/src/components/ui/table.tsx
  • apps/local-dashboard/src/components/ui/tabs.tsx
  • apps/local-dashboard/src/components/ui/toggle-group.tsx
  • apps/local-dashboard/src/components/ui/toggle.tsx
  • apps/local-dashboard/src/components/ui/tooltip.tsx
  • apps/local-dashboard/src/constants.tsx
  • apps/local-dashboard/src/hooks/use-mobile.ts
  • apps/local-dashboard/src/hooks/useOverview.ts
  • apps/local-dashboard/src/hooks/useSkillReport.ts
  • apps/local-dashboard/src/lib/utils.ts
  • apps/local-dashboard/src/main.tsx
  • apps/local-dashboard/src/pages/Overview.tsx
  • apps/local-dashboard/src/pages/SkillReport.tsx
  • apps/local-dashboard/src/styles.css
  • apps/local-dashboard/src/types.ts
  • apps/local-dashboard/src/utils.ts
  • apps/local-dashboard/src/vite-env.d.ts
  • apps/local-dashboard/tsconfig.json
  • apps/local-dashboard/vite.config.ts
  • biome.json
  • cli/selftune/badge/badge-data.ts
  • cli/selftune/badge/badge.ts
  • cli/selftune/canonical-export.ts
  • cli/selftune/constants.ts
  • cli/selftune/cron/setup.ts
  • cli/selftune/dashboard-contract.ts
  • cli/selftune/dashboard-server.ts
  • cli/selftune/dashboard.ts
  • cli/selftune/eval/baseline.ts
  • cli/selftune/eval/composability-v2.ts
  • cli/selftune/eval/hooks-to-evals.ts
  • cli/selftune/evolution/evidence.ts
  • cli/selftune/evolution/evolve-body.ts
  • cli/selftune/evolution/evolve.ts
  • cli/selftune/evolution/extract-patterns.ts
  • cli/selftune/grading/auto-grade.ts
  • cli/selftune/grading/grade-session.ts
  • cli/selftune/grading/results.ts
  • cli/selftune/hooks/prompt-log.ts
  • cli/selftune/hooks/session-stop.ts
  • cli/selftune/hooks/skill-eval.ts
  • cli/selftune/index.ts
  • cli/selftune/ingestors/claude-replay.ts
  • cli/selftune/ingestors/codex-rollout.ts
  • cli/selftune/ingestors/codex-wrapper.ts
  • cli/selftune/ingestors/openclaw-ingest.ts
  • cli/selftune/ingestors/opencode-ingest.ts
  • cli/selftune/init.ts
  • cli/selftune/last.ts
  • cli/selftune/localdb/db.ts
  • cli/selftune/localdb/materialize.ts
  • cli/selftune/localdb/queries.ts
  • cli/selftune/localdb/schema.ts
  • cli/selftune/monitoring/watch.ts
  • cli/selftune/normalization.ts
  • cli/selftune/observability.ts
  • cli/selftune/orchestrate.ts
  • cli/selftune/quickstart.ts
  • cli/selftune/repair/skill-usage.ts
  • cli/selftune/schedule.ts
  • cli/selftune/status.ts
  • cli/selftune/sync.ts
  • cli/selftune/types.ts
  • cli/selftune/utils/canonical-log.ts
  • cli/selftune/utils/hooks.ts
  • cli/selftune/utils/html.ts
  • cli/selftune/utils/llm-call.ts
  • cli/selftune/utils/math.ts
  • cli/selftune/utils/query-filter.ts
  • cli/selftune/utils/skill-discovery.ts
  • cli/selftune/utils/skill-log.ts
  • cli/selftune/utils/skill-usage-confidence.ts
  • cli/selftune/utils/transcript.ts
  • cli/selftune/workflows/discover.ts
  • cli/selftune/workflows/skill-md-writer.ts
  • cli/selftune/workflows/workflows.ts
  • dashboard/index.html
  • docs/design-docs/composability-v2.md
  • docs/design-docs/sandbox-claude-code.md
  • docs/design-docs/sandbox-test-harness.md
  • docs/design-docs/workflow-support.md
  • docs/escalation-policy.md
  • docs/exec-plans/active/grader-prompt-evals.md
  • docs/exec-plans/active/local-sqlite-materialization.md
  • docs/exec-plans/active/mcp-tool-descriptions.md
  • docs/exec-plans/active/multi-agent-sandbox.md
  • docs/exec-plans/active/product-reset-and-shipping.md
  • docs/exec-plans/active/telemetry-normalization.md
  • docs/exec-plans/completed/dashboard-spa-cutover.md
  • docs/exec-plans/reference/telemetry-field-map.md
  • docs/exec-plans/tech-debt-tracker.md
  • docs/integration-guide.md
  • package.json
  • packages/telemetry-contract/README.md
  • packages/telemetry-contract/fixtures/golden.json
  • packages/telemetry-contract/fixtures/golden.test.ts
  • packages/telemetry-contract/index.ts
  • packages/telemetry-contract/package.json
  • packages/telemetry-contract/src/index.ts
  • packages/telemetry-contract/src/types.ts
  • packages/telemetry-contract/src/validators.ts
  • skill/SKILL.md
  • skill/Workflows/Composability.md
  • skill/Workflows/Cron.md
  • skill/Workflows/Dashboard.md
  • skill/Workflows/Doctor.md
  • skill/Workflows/Evolve.md
  • skill/Workflows/EvolveBody.md
  • skill/Workflows/Initialize.md
  • skill/Workflows/Orchestrate.md
  • skill/Workflows/Schedule.md
  • skill/Workflows/Sync.md
  • skill/Workflows/Watch.md
  • skill/Workflows/Workflows.md
  • skill/assets/activation-rules-default.json
  • skill/assets/multi-skill-settings.json
  • skill/assets/single-skill-settings.json
  • skill/references/logs.md
  • skill/references/setup-patterns.md
  • skill/references/version-history.md
  • skill/settings_snippet.json
  • templates/multi-skill-settings.json
  • templates/single-skill-settings.json
  • test-results/.last-run.json
  • tests/blog-proof/fixtures/seo-audit/SKILL.md
  • tests/blog-proof/fixtures/seo-audit/SKILL.md.bak
  • tests/blog-proof/seo-audit-evolve.test.ts
  • tests/canonical-export.test.ts
  • tests/contribute/bundle.test.ts
  • tests/cron/setup.test.ts
  • tests/dashboard/badge-routes.test.ts
  • tests/dashboard/dashboard-server.test.ts
  • tests/dashboard/dashboard.test.ts
  • tests/eval/composability-v2.test.ts
  • tests/eval/hooks-to-evals.test.ts
  • tests/evolution/evidence.test.ts
  • tests/evolution/evolve-body.test.ts
  • tests/evolution/evolve.test.ts
  • tests/evolution/extract-patterns.test.ts
  • tests/grading/grade-session-flow.test.ts
  • tests/grading/grade-session.test.ts
  • tests/grading/results.test.ts
  • tests/hooks/prompt-log.test.ts
  • tests/hooks/session-stop.test.ts
  • tests/hooks/skill-eval.test.ts
  • tests/ingestors/claude-replay.test.ts
  • tests/ingestors/codex-rollout.test.ts
  • tests/ingestors/codex-wrapper.test.ts
  • tests/ingestors/openclaw-ingest.test.ts
  • tests/ingestors/opencode-ingest.test.ts
  • tests/init/init.test.ts
  • tests/localdb/localdb.test.ts
  • tests/monitoring/integration.test.ts
  • tests/monitoring/watch.test.ts
  • tests/normalization/normalization.test.ts
  • tests/observability.test.ts
  • tests/orchestrate.test.ts
  • tests/repair/skill-usage.test.ts
  • tests/sandbox/docker/run-openclaw-tests.ts
  • tests/sandbox/fixtures/openclaw/cron/jobs.json
  • tests/sandbox/run-sandbox.ts
  • tests/schedule/schedule.test.ts
  • tests/status/status.test.ts
  • tests/sync.test.ts
  • tests/telemetry-contract/validators.test.ts
  • tests/utils/canonical-log.test.ts
  • tests/utils/html.test.ts
  • tests/utils/llm-call.test.ts
  • tests/utils/query-filter.test.ts
  • tests/utils/skill-discovery.test.ts
  • tests/utils/skill-log.test.ts
  • tests/utils/transcript.test.ts
  • tests/workflows/discover.test.ts
  • tests/workflows/skill-md-writer.test.ts
  • tests/workflows/workflows.test.ts
  • tsconfig.json

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.
