selftune-dev · WellDunDun · Mar 12, 2026 · Mar 14, 2026
@@ -20,6 +20,16 @@ jobs:
       - run: bunx @biomejs/biome check .
       - run: bun run lint-architecture.ts
 
+  build-dashboard:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6
+      - uses: oven-sh/setup-bun@ecf28ddc73e819eb6fa29df6b34ef8921c743461 # v2
+      - run: bun install
+      - run: bun run build:dashboard
+
   test:
     runs-on: ubuntu-latest
     permissions:

@@ -60,6 +60,9 @@ jobs:
       - name: Install dependencies
         run: bun install
 
+      - name: Build dashboard SPA
+        run: bun run build:dashboard
+
       - name: Verify npm version for trusted publishing
         run: npm --version
 

@@ -44,8 +44,8 @@ cli/selftune/
 ├── observability.ts      Health checks (doctor command)
 ├── status.ts             Skill health summary (status command)
 ├── last.ts               Last session insight (last command)
-├── dashboard.ts          HTML dashboard builder (dashboard command)
-├── dashboard-server.ts   Live Bun.serve server with SSE (dashboard --serve)
+├── dashboard.ts          Dashboard command entry point (SPA server launcher)
+├── dashboard-server.ts   Bun.serve SPA + v2 API server
 ├── types.ts              Shared interfaces (incl. SelftuneConfig)
 ├── constants.ts          Log paths, config paths, known tools
 ├── utils/                Shared utilities (jsonl, transcript, logging, llm-call, schema-validator, trigger-check)
@@ -100,9 +100,6 @@ apps/local-dashboard/     React SPA dashboard (Vite + TypeScript + shadcn/ui)
 ├── vite.config.ts        Dev proxy → dashboard-server, build to dist/
 └── package.json          React 19, Tailwind v4, shadcn/ui, recharts
 
-dashboard/                Legacy HTML dashboard (served at /legacy/)
-└── index.html            Original embedded-JSON dashboard (v1 endpoints)
-
 templates/                Settings and config templates
 ├── single-skill-settings.json
 ├── multi-skill-settings.json
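The tree above now describes `dashboard.ts` as the command entry point that launches the SPA server. To illustrate that role, here is a minimal TypeScript sketch that parses the two flags this PR uses elsewhere (`--port` and `--no-open`). The function name `parseDashboardArgs`, the `DashboardOptions` shape, and the 7888 default are assumptions for illustration, not selftune's actual code.

```typescript
// Hypothetical sketch: parse the flags passed to `selftune dashboard`.
// Flag names come from the docs; the defaults and return shape are assumed.
interface DashboardOptions {
  port: number;
  open: boolean; // whether to open a browser after the server starts
}

const DEFAULT_PORT = 7888; // assumed default, matching the port used in the dev docs

function parseDashboardArgs(argv: string[]): DashboardOptions {
  const options: DashboardOptions = { port: DEFAULT_PORT, open: true };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--no-open") {
      options.open = false;
    } else if (arg === "--port") {
      const raw = argv[i + 1];
      const value = Number(raw);
      // Reject missing, non-integer, or out-of-range port values.
      if (!Number.isInteger(value) || value <= 0 || value > 65535) {
        throw new Error(`Invalid --port value: ${raw}`);
      }
      options.port = value;
      i++; // skip the value we just consumed
    }
    // Unknown flags are ignored in this sketch.
  }
  return options;
}
```

For example, `parseDashboardArgs(["--port", "7888", "--no-open"])` returns `{ port: 7888, open: false }`. A real CLI would probably reject unknown flags instead of ignoring them.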

@@ -25,7 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
   - Onboarding flow: full empty-state guide for first-time users (3-step setup), dismissible welcome banner for returning users (localStorage-persisted)
 - **SQLite v2 API endpoints** — `GET /api/v2/overview` and `GET /api/v2/skills/:name` backed by materialized SQLite queries (`getOverviewPayload()`, `getSkillReportPayload()`, `getSkillsList()`)
 - **SQL query optimizations** — Replaced `NOT IN` subqueries with `LEFT JOIN + IS NULL`, moved JS-side dedup to SQL `GROUP BY`, added `LIMIT 200` to unbounded evidence queries
-- **SPA serving from dashboard server** — Built SPA served at `/`, legacy HTML dashboard moved to `/legacy/`
+- **SPA serving from dashboard server** — Built SPA served at `/` as the supported local dashboard experience
 - **Source-truth-driven pipeline** — Transcripts and rollouts are now the authoritative source; `sync` rebuilds repaired overlays from source data rather than relying solely on hook-time capture
 - **Telemetry contract package** — `@selftune/telemetry-contract` workspace package with canonical schema types, validators, versioning, metadata, and golden fixture tests
 - **Test split** — `make test-fast` / `make test-slow` and `bun run test:fast` / `bun run test:slow` for faster development feedback loop
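The changelog entries above describe a server that serves the built SPA at `/` and exposes `GET /api/v2/overview` and `GET /api/v2/skills/:name`. The following is a minimal, hypothetical sketch of how such path dispatch could work. The `Route` type and `matchRoute` function are illustrative names, not the code in `dashboard-server.ts`, and the sketch covers only the v2 routes.

```typescript
// Hypothetical sketch: map a request path to a handler kind for a server that
// serves the SPA at "/" plus the two SQLite-backed v2 endpoints.
type Route =
  | { kind: "overview" }
  | { kind: "skill"; name: string }
  | { kind: "notFound" }
  | { kind: "spa" };

const SKILL_PREFIX = "/api/v2/skills/";

function matchRoute(pathname: string): Route {
  if (pathname === "/api/v2/overview") return { kind: "overview" };

  if (pathname.startsWith(SKILL_PREFIX)) {
    const rest = pathname.slice(SKILL_PREFIX.length);
    // Accept exactly one non-empty path segment as the skill name.
    if (rest.length === 0 || rest.includes("/")) return { kind: "notFound" };
    try {
      return { kind: "skill", name: decodeURIComponent(rest) };
    } catch {
      return { kind: "notFound" }; // malformed percent-encoding
    }
  }

  // Unknown API paths get a 404 rather than the SPA's HTML shell.
  if (pathname.startsWith("/api/")) return { kind: "notFound" };

  // Everything else (/, client-side routes) is handed to the SPA.
  return { kind: "spa" };
}
```

Returning the SPA shell for non-API paths lets client-side routes such as a per-skill drilldown survive a page reload, while keeping `/api/` misses as 404s avoids handing HTML to code that expects JSON. A real server would check its other endpoints (for example `/api/actions/*` or `/badge/:name`) before these fallbacks.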

@@ -15,15 +15,15 @@
 [![Zero Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen)](https://www.npmjs.com/package/selftune?activeTab=dependencies)
 [![Bun](https://img.shields.io/badge/runtime-bun%20%7C%20node-black)](https://bun.sh)
 
-Your agent skills learn how you work. Detect what's broken. Fix it automatically.
+Your agent skills learn how you work. Detect what's broken. Improve low-risk skill behavior automatically.
 
 **[Install](#install)** · **[Use Cases](#built-for-how-you-actually-work)** · **[How It Works](#how-it-works)** · **[Commands](#commands)** · **[Platforms](#platforms)** · **[Docs](docs/integration-guide.md)**
 
 </div>
 
 ---
 
-Your skills don't understand how you talk. You say "make me a slide deck" and nothing happens — no error, no log, no signal. selftune watches your real sessions, learns how you actually speak, and rewrites skill descriptions to match. Automatically.
+Your skills do not understand how you talk. You say "make me a slide deck" and nothing happens: no error, no signal, no clue why the right skill never fired. selftune reads the transcripts and telemetry your agent already saves, learns how you actually speak, and improves skill descriptions to match. It validates changes before deployment, watches for regressions after, and rolls back when needed.
 
 Built for **Claude Code**. Also works with Codex, OpenCode, and OpenClaw. Zero runtime dependencies.
 
@@ -35,9 +35,28 @@ npx skills add selftune-dev/selftune
 
 Then tell your agent: **"initialize selftune"**
 
-Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. Within minutes you'll see which skills are undertriggering.
+Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription.
 
-**CLI only** (no skill, just the CLI):
+Quick proof path:
+
+```bash
+npx selftune@latest doctor
+npx selftune@latest sync
+npx selftune@latest status
+npx selftune@latest dashboard
+```
+
+Use `--force` only when you explicitly need to rebuild local state from scratch.
+
+Autonomy quick start:
+
+```bash
+npx selftune@latest init --enable-autonomy
+npx selftune@latest orchestrate --dry-run
+npx selftune@latest schedule --install --dry-run
+```
+
+**CLI only** (no installed skill):
 
 ```bash
 npx selftune@latest doctor
@@ -68,51 +87,51 @@ combinations repeat, which ones help, and where the friction is.
   <img src="./assets/FeedbackLoop.gif" alt="Observe → Detect → Evolve → Watch" width="800">
 </p>
 
-A continuous feedback loop that makes your skills learn and adapt. Automatically.
+A continuous feedback loop that makes your skills learn and adapt from real work.
 
-**Observe** — Hooks capture every user query and which skills fired. On Claude Code, hooks install automatically. Use `selftune replay` to backfill existing transcripts. This is how your skills start learning.
+**Observe** — selftune reads the transcripts and telemetry your agents already save. On Claude Code, hooks can add low-latency hints, but transcripts and logs are the source of truth. Use `selftune sync` to ingest current activity and `selftune replay` to backfill older Claude Code sessions.
 
-**Detect** — selftune finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch.
+**Detect** — selftune finds the gap between how you talk and how your skills are described. It spots missed triggers, underperforming descriptions, noisy environments, and regressions in real usage.
 
-**Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Batched validation with per-stage model control (`--cheap-loop` uses haiku for the loop, sonnet for the gate). Teacher-student body evolution with 3-gate validation. Baseline comparison gates on measurable lift. Automatic backup.
+**Evolve** — For low-risk changes, selftune can autonomously rewrite skill descriptions to match how you actually work. Every proposal is validated before deploy. Full skill-body and routing-table evolution (`selftune evolve-body`) remains available for higher-touch workflows that need closer review.
 
-**Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically. Your skills keep improving without you touching them.
+**Watch** — After deploying changes, selftune monitors trigger quality and post-deploy evidence. If something regresses, it can roll back automatically. The goal is autonomous improvement with safeguards, not blind self-editing.
 
-## What's New in v0.2.0
+## What's New in v0.2.x
 
-- **Full skill body evolution** — Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
-- **Synthetic eval generation** — `selftune evals --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
-- **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction.
-- **Batch trigger validation** — Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops.
-- **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage.
-- **Auto-activation system** — Hooks detect when selftune should run and suggest actions
-- **Enforcement guardrails** — Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run
-- **React SPA dashboard** — `selftune dashboard` serves a React SPA with skill health grid, per-skill drilldown, evidence viewer, evolution timeline, dark/light theming, and SQLite-backed v2 API (legacy dashboard at `/legacy/`)
-- **Evolution memory** — Persists context, plans, and decisions across context resets
-- **4 specialized agents** — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide
-- **Sandbox test harness** — Comprehensive automated test coverage, including devcontainer-based LLM testing
-- **Workflow discovery + codification** — `selftune workflows` finds repeated
-  multi-skill sequences from telemetry, and `selftune workflows save
-  <workflow-id|index>` appends them to `## Workflows` in SKILL.md
+- **Source-truth sync** — `selftune sync` now leads the product loop, using transcripts/logs as truth and hooks as hints
+- **SQLite-backed local app** — `selftune dashboard` now serves the React SPA by default with faster overview/report routes on top of materialized local data
+- **Autonomous low-risk evolution** — description evolution is autonomous by default, with explicit review-required mode for stricter policies
+- **Autonomous scheduling** — `selftune init --enable-autonomy` and `selftune schedule --install` make the orchestrated loop the default recurring runtime
+- **Full skill body evolution** — evolve routing tables and entire skill bodies using a teacher-student model with structural, trigger, and quality gates
+- **Synthetic eval generation** — `selftune evals --synthetic` generates eval sets from `SKILL.md` for cold-start skills
+- **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate
+- **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` give fine-grained control over each evolution stage
+- **Sandbox test harness** — automated coverage, including devcontainer-based LLM testing
+- **Workflow discovery + codification** — `selftune workflows` finds repeated multi-skill sequences from telemetry and can append them to `## Workflows` in `SKILL.md`
 
 ## Commands
 
 | Command | What it does |
 |---|---|
+| `selftune doctor` | Health check: logs, config, permissions, dashboard build/runtime expectations |
+| `selftune sync` | Ingest source-truth activity from supported agents and rebuild local state |
 | `selftune status` | See which skills are undertriggering and why |
+| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) |
+| `selftune orchestrate` | Run the core loop: sync, inspect candidates, evolve, and watch |
+| `selftune schedule --install` | Install platform-native scheduling for the autonomous loop |
 | `selftune evals --skill <name>` | Generate eval sets from real session data (`--synthetic` for cold-start) |
 | `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions (`--cheap-loop`, `--with-baseline`) |
 | `selftune evolve-body --skill <name>` | Evolve full skill body or routing table (teacher-student, 3-gate validation) |
+| `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
+| `selftune replay` | Backfill data from existing Claude Code transcripts |
 | `selftune baseline --skill <name>` | Measure skill value vs no-skill baseline |
 | `selftune unit-test --skill <name>` | Run or generate skill-level unit tests |
 | `selftune composability --skill <name>` | Measure synergy and conflicts between co-occurring skills, with workflow-candidate hints |
 | `selftune workflows` | Discover repeated multi-skill workflows and save a discovered workflow into `SKILL.md` |
 | `selftune import-skillsbench` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) |
 | `selftune badge --skill <name>` | Generate skill health badge SVG |
-| `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
-| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) |
-| `selftune replay` | Backfill data from existing Claude Code transcripts |
-| `selftune doctor` | Health check: logs, hooks, config, permissions |
+| `selftune cron setup` | Optional scheduler helper for OpenClaw-oriented automation |
 
 Full command reference: `selftune --help`
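The Watch step in the README describes monitoring trigger quality after a deploy and rolling back when something regresses. A minimal sketch of that kind of decision follows, comparing a pre-deploy baseline window with a post-deploy window. The `TriggerWindow` shape, the `shouldRollback` name, and the default thresholds (20 checks, 0.1 drop) are assumptions; selftune's actual monitoring may use other signals, such as false-negative rate and baseline comparison.

```typescript
// Hypothetical sketch of a post-deploy regression check. The window shape and
// thresholds are illustrative assumptions, not selftune's monitoring logic.
interface TriggerWindow {
  checks: number; // evaluated queries in the window
  passes: number; // queries where the skill triggered as expected
}

interface RollbackPolicy {
  minChecks?: number; // require this much evidence before acting
  maxDrop?: number; // tolerated absolute drop in pass rate (0..1)
}

function passRate(w: TriggerWindow): number {
  return w.checks === 0 ? 0 : w.passes / w.checks;
}

function shouldRollback(
  baseline: TriggerWindow,
  current: TriggerWindow,
  { minChecks = 20, maxDrop = 0.1 }: RollbackPolicy = {},
): boolean {
  // Too little post-deploy data: keep watching instead of reacting to noise.
  if (current.checks < minChecks) return false;
  return passRate(baseline) - passRate(current) > maxDrop;
}
```

With a baseline of 18/20 and a post-deploy window of 14/20, the pass rate drops from 0.9 to 0.7. That exceeds the assumed 0.1 tolerance, so the sketch recommends a rollback. The `minChecks` guard is what keeps a handful of early misses from triggering one.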
 
@@ -141,13 +160,13 @@ Observability tools trace LLM calls. Skill authoring tools help you write skills
 
 ## Platforms
 
-**Claude Code** (primary) — Hooks install automatically. `selftune replay` backfills existing transcripts. Full feature support.
+**Claude Code** (primary) — Reads saved transcripts and telemetry directly. Hooks install automatically and add low-latency hints. `selftune replay` backfills older Claude Code sessions. Full feature support.
 
 **Codex** — `selftune wrap-codex -- <args>` or `selftune ingest-codex`
 
 **OpenCode** — `selftune ingest-opencode`
 
-**OpenClaw** — `selftune ingest-openclaw` + `selftune cron setup` for autonomous evolution
+**OpenClaw** — `selftune ingest-openclaw`. `selftune cron setup` remains available as an optional OpenClaw-oriented scheduler helper, but the main product loop is still `selftune orchestrate`, run on a schedule installed with `selftune schedule --install`.
 
 Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys.
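The README changes above list `--cheap-loop` (haiku for proposal generation and validation, sonnet for the final deployment gate) alongside the per-stage overrides `--proposal-model`, `--validation-model`, and `--gate-model`. One plausible way to resolve those flags into a model per stage is sketched below. The precedence (an explicit stage flag beats the cheap-loop preset), the `EvolveFlags` shape, and the `sonnet` default used without `--cheap-loop` are assumptions, not documented selftune behavior.

```typescript
// Hypothetical sketch: resolve which model runs each evolution stage.
type Stage = "proposal" | "validation" | "gate";
type StageModels = Record<Stage, string>;

interface EvolveFlags {
  cheapLoop?: boolean; // --cheap-loop
  proposalModel?: string; // --proposal-model
  validationModel?: string; // --validation-model
  gateModel?: string; // --gate-model
}

const DEFAULT_MODEL = "sonnet"; // assumed default when --cheap-loop is off

function resolveStageModels(flags: EvolveFlags): StageModels {
  // --cheap-loop preset: cheap model for the loop, stronger model for the gate.
  const preset: StageModels = flags.cheapLoop
    ? { proposal: "haiku", validation: "haiku", gate: "sonnet" }
    : { proposal: DEFAULT_MODEL, validation: DEFAULT_MODEL, gate: DEFAULT_MODEL };

  // Assumed precedence: explicit per-stage flags override the preset.
  return {
    proposal: flags.proposalModel ?? preset.proposal,
    validation: flags.validationModel ?? preset.validation,
    gate: flags.gateModel ?? preset.gate,
  };
}
```

Keeping the gate on the stronger model is what makes the cheap loop tolerable: cheaper models can propose and pre-screen many candidates, while the final deploy decision still goes through the more capable model.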
 

@@ -16,7 +16,7 @@
   - Per-skill drilldown with evidence viewer, evolution timeline
   - SQLite v2 API endpoints (`/api/v2/overview`, `/api/v2/skills/:name`)
   - Dark/light theme toggle with selftune branding
-  - SPA served at `/`, legacy HTML dashboard at `/legacy/`
+  - SPA served at `/` as the supported local dashboard
 
 ## In Progress
 - Multi-agent sandbox expansion

@@ -28,8 +28,14 @@ JSONL logs → materializeIncremental() → SQLite → getOverviewPayload() / ge
 ## How to run
 
 ```bash
+# From repo root
+bun run dev
+# → if 7888 is free, starts dashboard server on 7888 and SPA dev server on http://localhost:5199
+# → if 7888 is already in use, reuses that dashboard server and starts only the SPA dev server
+
+# Or run manually:
 # Terminal 1: Start the dashboard server
-selftune dashboard --port 7888
+selftune dashboard --port 7888 --no-open
 
 # Terminal 2: Start the SPA dev server (proxies /api to port 7888)
 cd apps/local-dashboard
@@ -41,7 +47,7 @@ bunx vite
 ## What was rebased / changed
 
 - **SPA types**: Rewritten to match `queries.ts` payload shapes (`OverviewResponse`, `SkillReportResponse`, `SkillSummary`, `EvidenceEntry`)
-- **API layer**: Now calls `/api/v2/overview` and `/api/v2/skills/:name` instead of `/api/data` + `/api/evaluations/:name`
+- **API layer**: Calls `/api/v2/overview` and `/api/v2/skills/:name`
 - **SSE removed**: Replaced with 15s polling (SQLite reads are cheap, SSE was complex)
 - **Overview page**: Uses `SkillSummary[]` from `getSkillsList()` for skill cards (pre-aggregated pass rate, check count, sessions)
 - **Skill report page**: Single fetch to v2 endpoint instead of parallel overview + evaluations fetch. Shows evidence entries, evolution audit history per skill
@@ -61,13 +67,12 @@ bunx vite
 
 ## What still depends on old dashboard code
 
-- The old v1 endpoints (`/api/data`, `/api/events`, `/api/evaluations/:name`) still work and are used by the legacy `dashboard/index.html`
-- Badge endpoints (`/badge/:name`) and report HTML endpoints (`/report/:name`) use the old `computeStatus` + JSONL reader path
+- Badge endpoints (`/badge/:name`) and report HTML endpoints (`/report/:name`) still use the status/evidence JSONL path rather than SQLite-backed view models
 - Action endpoints (`/api/actions/*`) are unchanged
 
 ## What remains before this can become default
 
-1. ~~**Serve built SPA from dashboard-server**~~: Done — `/` serves SPA, old dashboard at `/legacy/`
+1. ~~**Serve built SPA from dashboard-server**~~: Done — `/` serves the SPA
 2. ~~**Production build**~~: Done — `bun run build:dashboard` in root package.json
 3. **Regression detection**: The SQLite layer doesn't compute regression detection yet — `deriveStatus()` currently only uses pass rate + check count. Add a `regression_detected` column to skill summaries when the monitoring snapshot computation moves to SQLite.
 4. **Monitoring snapshot migration**: Move `computeMonitoringSnapshot()` logic into the SQLite materializer or a query helper (window sessions, false negative rate, baseline comparison)
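Item 3 above notes that `deriveStatus()` currently uses only pass rate and check count. Below is a hypothetical variant that also honors the proposed `regression_detected` column. The function name `deriveStatusWithRegression`, the row shape, the status labels, and the thresholds are all illustrative assumptions, not the existing implementation.

```typescript
// Hypothetical sketch: extend a pass-rate/check-count status rule with the
// proposed regression_detected column. Status names and thresholds are assumed.
type SkillStatus = "HEALTHY" | "WARNING" | "CRITICAL" | "REGRESSED" | "UNKNOWN";

interface SkillSummaryRow {
  pass_rate: number | null; // null when no checks have been recorded
  check_count: number;
  regression_detected?: boolean; // proposed column, absent until migrated
}

const MIN_CHECKS = 5; // assumed minimum evidence before judging health

function deriveStatusWithRegression(row: SkillSummaryRow): SkillStatus {
  // A detected regression outranks a good aggregate pass rate.
  if (row.regression_detected) return "REGRESSED";
  if (row.pass_rate === null || row.check_count < MIN_CHECKS) return "UNKNOWN";
  if (row.pass_rate >= 0.8) return "HEALTHY";
  if (row.pass_rate >= 0.5) return "WARNING";
  return "CRITICAL";
}
```

Checking the regression flag first means a skill whose long-run pass rate still looks healthy is surfaced as soon as monitoring detects a drop. Treating the column as optional lets the same rule run before and after the migration.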

@@ -4,7 +4,7 @@
   "version": "0.1.0",
   "type": "module",
   "scripts": {
-    "dev": "concurrently \"cd ../.. && bun run cli/selftune/index.ts dashboard --serve --port 7888\" \"vite\"",
+    "dev": "concurrently \"cd ../.. && bun run cli/selftune/index.ts dashboard --port 7888 --no-open\" \"vite\"",
     "build": "vite build",
     "preview": "vite preview",
     "typecheck": "tsc --noEmit"
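The how-to-run notes say the repo-root `bun run dev` starts a dashboard server on 7888 only if that port is free, and otherwise reuses the existing server and starts just the SPA dev server. That script is not shown in this diff, so the following is only a sketch of how such a check could be written with Node's `net` module, which Bun also supports. The names `isPortInUse`, `planDevProcesses`, and `DevPlan` are hypothetical.

```typescript
import { createServer } from "node:net";

// Resolves true when something is already listening on host:port.
function isPortInUse(port: number, host = "127.0.0.1"): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const probe = createServer();
    probe.once("error", (err: NodeJS.ErrnoException) => {
      if (err.code === "EADDRINUSE") resolve(true);
      else reject(err);
    });
    // Bound successfully: the port was free, so release it immediately.
    probe.once("listening", () => probe.close(() => resolve(false)));
    probe.listen(port, host);
  });
}

interface DevPlan {
  startDashboardServer: boolean;
  startSpaDevServer: boolean;
}

// Reuse an existing dashboard server if one already owns the port;
// the SPA dev server is always started.
function planDevProcesses(dashboardPortInUse: boolean): DevPlan {
  return { startDashboardServer: !dashboardPortInUse, startSpaDevServer: true };
}
```

A launcher would call `planDevProcesses(await isPortInUse(7888))` and spawn only the processes the plan asks for. Probing and then binding is racy, because another process can take the port in between. A probe on `127.0.0.1` may also miss a server bound only to `::1`, so treat this as a convenience check rather than a guarantee.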