From 7572e715433a2ecad1f40a352632628259111d5a Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Thu, 12 Mar 2026 22:59:42 +0300 Subject: [PATCH 01/14] Promote product planning docs --- .../active/local-sqlite-materialization.md | 173 ++++++++++++++ .../active/product-reset-and-shipping.md | 223 ++++++++++++++++++ 2 files changed, 396 insertions(+) create mode 100644 docs/exec-plans/active/local-sqlite-materialization.md create mode 100644 docs/exec-plans/active/product-reset-and-shipping.md diff --git a/docs/exec-plans/active/local-sqlite-materialization.md b/docs/exec-plans/active/local-sqlite-materialization.md new file mode 100644 index 0000000..7708063 --- /dev/null +++ b/docs/exec-plans/active/local-sqlite-materialization.md @@ -0,0 +1,173 @@ +# Execution Plan: Local SQLite Materialization and App Data Layer + + + +**Status:** Active +**Created:** 2026-03-12 +**Goal:** Use SQLite as a local indexed/materialized view layer on top of selftune’s raw JSONL source-of-truth logs so the local app can be fast, credible, and simple to reason about. + +--- + +## Executive Summary + +selftune’s raw JSONL logs remain the right source of truth for: + +- telemetry capture +- transcript/source replay +- repair overlays +- append-only local durability + +They are not the right structure for serving a good local product experience directly. 
+ +SQLite via `bun:sqlite` is the right local materialization layer because it gives us: + +- fast indexed reads +- a simple single-file local store +- WAL-backed write safety +- zero extra network services +- a much cleaner foundation for overview/report queries + +The architecture is now: + +- **JSONL = truth** +- **SQLite = local indexed/materialized view** +- **SPA = local user experience** + +--- + +## Why SQLite Is Now Justified + +The old dashboard path showed the limits of raw-log-first serving: + +- repeated large file scans and joins +- poor cold-start performance +- heavy live payloads +- fragile drilldown UX + +SQLite solves the UX/product problem without replacing the telemetry model. + +This is not a move to “database-first telemetry.” It is a local query/materialization layer on top of append-only source logs. + +--- + +## What Has Already Landed + +`#42` introduced the first SQLite local materialization layer. + +That means the work now is not “decide whether to use SQLite.” +The work now is: + +1. stabilize the local DB schema and materialization flow +2. make overview/report queries first-class +3. move the local app to those queries +4. retire the old heavy dashboard path as the primary UX + +--- + +## Data Model Role + +SQLite should hold the structured local data needed for: + +- overview page +- per-skill report page +- evolution evidence and version history +- summary/report payloads consumed by the local app + +Likely source domains: + +- sessions +- prompts +- skill invocations +- execution facts +- evidence +- optional materialized aggregates for overview/report + +The exact schema can evolve, but its role should stay narrow: + +- indexed cache/materialized view +- local query surface +- not the authority for telemetry capture + +--- + +## Architectural Rules + +### 1. JSONL remains authoritative + +If a conflict exists between raw logs and SQLite materialization, the raw logs win. + +### 2. 
Materialization must be rebuildable + +It should always be possible to rebuild the local DB from source-truth logs. + +### 3. Local app queries should be explicit + +Do not let the app depend on giant generic payloads. Prefer query helpers and routes that match the UX: + +- `OverviewPayload` +- `SkillReportPayload` + +### 4. SQLite should stay local-only for now + +Do not make the local DB the cloud contract. Cloud stays based on canonical telemetry + DB projections. + +--- + +## Immediate Work + +### 1. Stabilize overview/report query helpers + +The local data layer should explicitly support: + +- overview KPI/status/skill-card payload +- single-skill report payload + +### 2. Move the SPA onto SQLite-backed data + +The React local app should stop depending primarily on the old dashboard server’s heavy data path. + +### 3. Keep the old dashboard path only as compatibility + +Do not optimize it indefinitely. Keep it as fallback until the new path is trustworthy. + +### 4. Keep source-truth sync first + +Any materialization flow must still start from fresh source-truth sync/repair data. + +--- + +## Open Questions + +### How incremental should local materialization be? + +Short term: + +- correctness and simplicity matter more than perfect incrementalism + +Later: + +- add incremental rebuilds/checkpoints where safe and justified + +### How much of the old dashboard server should remain? + +Short term: + +- enough to support the new app and compatibility mode + +Long term: + +- the new local app should be the default experience + +--- + +## What This Enables + +If this path is completed, selftune gains: + +- fast local overview loads +- fast skill drilldowns +- simpler local UX architecture +- cleaner alignment between local and cloud payload semantics +- a better demo path on real machine data + +That is why this work is now core to shipping, not optional polish. 
diff --git a/docs/exec-plans/active/product-reset-and-shipping.md b/docs/exec-plans/active/product-reset-and-shipping.md new file mode 100644 index 0000000..4bef897 --- /dev/null +++ b/docs/exec-plans/active/product-reset-and-shipping.md @@ -0,0 +1,223 @@ +# Execution Plan: Product Reset and Shipping Priorities + + + +**Status:** Active +**Created:** 2026-03-12 +**Goal:** Align selftune around the actual post-merge architecture and the shortest credible path to a fast, trustworthy, shippable product. + +--- + +## Executive Summary + +selftune is no longer blocked by telemetry architecture. It is now blocked by **product shape and UX**. + +Recent merged work changed the baseline: + +- `#38` hardened source-truth telemetry and repair paths +- `#40` added the first orchestrator core loop +- `#41` made generic scheduling the primary posture and OpenClaw cron optional +- `#42` added a local SQLite materialization layer +- `#43` improved sync progress and tightened noisy query filtering + +That means the next phase should optimize for: + +1. **Trustworthy source-truth sync** +2. **A fast, demoable local app on top of materialized local data** +3. **A clear orchestrated loop that evolves, validates, and watches skills** + +The architecture does not need a rewrite. It needs a narrower product story and a better local user experience. + +--- + +## What Changed Since The Earlier Audit + +The earlier architecture audit was directionally right about pruning, orchestration, and avoiding over-scoping. It is now outdated in two areas: + +### 1. SQLite is now justified + +Earlier guidance argued against SQLite. That was reasonable when the local UX still looked like a lightweight HTML dashboard. 
+ +It is no longer reasonable after real-machine proof showed: + +- slow cold dashboard loads +- heavy client-side data flow +- poor drilldown UX on realistic datasets + +The right model is now: + +- JSONL stays source of truth +- SQLite becomes the indexed local view store +- the local app should consume SQLite/materialized queries + +### 2. Cloud/export work is now part of the product path + +Canonical export and cloud ingest are no longer speculative. We already proved: + +- local canonical export works on real source-truth data +- a real `PushPayloadV2` can be generated +- cloud ingest accepts that payload end to end + +So cloud/local alignment now belongs in the main product path. + +--- + +## Current First Principles + +selftune still does one thing: + +**make agent skills improve from real usage data** + +The core loop remains: + +1. **Observe** — ingest source-truth logs/transcripts +2. **Detect** — identify missed triggers, failures, regressions +3. **Fix** — propose and validate improvements +4. **Ship** — deploy and monitor safely + +The most important architectural clarification is: + +- **hooks are hints** +- **transcripts/logs are truth** + +That should govern future product work. + +--- + +## Updated Priority Stack + +## Priority 1: Trustworthy Local Data Model + +Keep making source-truth sync the authority. + +Includes: + +- transcript/rollout replay correctness +- repaired usage overlays +- provenance and scope classification +- polluted query cleanup +- sync transparency and safe incrementalism + +## Priority 2: Demoable Local Product + +Make the local app fast and believable. + +Includes: + +- SQLite materialization +- SPA overview and skill report UX +- clear loading/empty/error states +- making the new local app the default path + +## Priority 3: Orchestrated Skill Improvement + +Make the closed loop obvious and usable. 
+ +Includes: + +- orchestrator refinement +- generic scheduling +- evolve/watch safety and explainability + +## Priority 4: Release And Ship + +Includes: + +- published package proof +- install and upgrade path +- quickstart/demo path +- stable docs/help + +## Priority 5: Paperclip And Multi-Repo Iteration + +Paperclip should accelerate iteration, not become the product priority. + +--- + +## Current Recommendations + +### 1. Make the SPA the real default dashboard path + +Once the SQLite-backed local app is credible, stop treating it as sidecar UI. + +### 2. Stabilize payload contracts for local/cloud dashboards + +Define and align: + +- `OverviewPayload` +- `SkillReportPayload` + +Local should produce them from JSONL + SQLite/materialized queries. +Cloud should produce them from canonical ingest + DB projections. + +### 3. Keep reducing remaining unknown provenance + +Unknown provenance is much lower than before, but not zero. Continue tightening: + +- Claude repair path recovery +- scope/project/global/admin detection + +### 4. Make orchestrator output explainable + +If the system evolves or refuses to evolve a skill, the user should see why immediately. + +### 5. Reduce the shipping surface in docs/help + +Not by deleting code, but by making the main story smaller and easier to follow: + +- `sync` +- `status` +- local app +- `evolve` +- `watch` +- orchestrator +- `doctor` + +--- + +## Things We Should Not Do Right Now + +1. **Do not return to hooks as the primary truth source** +2. **Do not spend another cycle optimizing the old static dashboard path** +3. **Do not make OpenClaw-specific automation the main story again** +4. **Do not do broad CLI regrouping before the local app and orchestrator feel good** +5. 
**Do not overinvest in Paperclip/platform setup at the expense of product proof** + +--- + +## Updated 1.0 Path + +### Phase 1 + +- source-truth sync remains correct and explainable +- query/provenance cleanup lands +- local SQLite/materialization path is stable + +### Phase 2 + +- SPA overview and skill report become the default local UX +- the local app is fast on real-machine datasets + +### Phase 3 + +- orchestrator becomes the main autonomous loop entry point +- generic scheduling path is documented and stable + +### Phase 4 + +- package release / install proof +- cloud/local payload alignment +- GTM/demo narrative based on the actual product loop + +--- + +## Final Assessment + +The key shift is simple: + +- telemetry correctness is good enough to build on +- the local app is now the highest-leverage product bottleneck +- orchestration is the next core integration layer +- shipping selftune means making the product feel fast, obvious, and trustworthy on a real machine + +That is the current architecture priority. 
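The payload-contract recommendation earlier in this plan (shared `OverviewPayload` / `SkillReportPayload` semantics for local and cloud) can be sketched as one typed contract that both producers must satisfy. The field set below is a deliberately slimmed illustration, not the real contract:

```typescript
// Slim stand-in for the shared contract; the real field set would live in one
// module imported by both the local (JSONL + SQLite) and cloud (ingest + DB) producers.
interface SkillReportPayload {
  skill_name: string;
  usage: { total_checks: number; triggered_count: number; pass_rate: number };
}

// Hypothetical local producer: the aggregates would come from SQLite-backed queries.
function localSkillReport(skill: string, checks: number, hits: number): SkillReportPayload {
  return {
    skill_name: skill,
    usage: {
      total_checks: checks,
      triggered_count: hits,
      pass_rate: checks > 0 ? hits / checks : 0,
    },
  };
}

// A cloud producer would return the same interface from DB projections, so any
// divergence in shape surfaces as a compile error rather than a runtime surprise.
const report = localSkillReport("deploy", 4, 3);
```

The design point is that alignment lives in the type, not in two dashboards that happen to agree today.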
From ba89071c32633f43e3a99aea2edc06aca6b58ef3 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 13:17:52 +0300 Subject: [PATCH 02/14] Add execution plans for product gaps and evals --- docs/exec-plans/active/grader-prompt-evals.md | 110 +++++++++++++++++ .../active/mcp-tool-descriptions.md | 112 ++++++++++++++++++ .../active/product-reset-and-shipping.md | 36 ++++++ 3 files changed, 258 insertions(+) create mode 100644 docs/exec-plans/active/grader-prompt-evals.md create mode 100644 docs/exec-plans/active/mcp-tool-descriptions.md diff --git a/docs/exec-plans/active/grader-prompt-evals.md b/docs/exec-plans/active/grader-prompt-evals.md new file mode 100644 index 0000000..9abfdb6 --- /dev/null +++ b/docs/exec-plans/active/grader-prompt-evals.md @@ -0,0 +1,110 @@ +# Execution Plan: Grader Prompt and Agent Evals + + + +**Status:** Active +**Created:** 2026-03-14 +**Goal:** Evaluate and improve the grader prompts and grading agents so selftune’s session/skill judgments are trustworthy, stable, and measurable. + +--- + +## Problem Statement + +selftune relies on grading to decide: + +- whether a session succeeded +- whether a skill was valuable +- whether evolution helped +- whether monitoring signals are believable + +That makes grader quality a core product dependency. + +Current risks: + +- grader prompts may be too brittle or too noisy +- agent/runtime choice may affect grading consistency +- we do not yet have a tight eval loop for the graders themselves +- users can lose trust quickly if the grader feels arbitrary + +--- + +## Goals + +1. Build a real eval loop for selftune’s grading prompts/agents. +2. Measure grader consistency and failure modes explicitly. +3. Improve prompt quality where graders are too noisy, too weak, or too inconsistent. +4. 
Separate “grading infrastructure exists” from “grading is trustworthy.” + +--- + +## Scope + +In scope: + +- session grading prompts +- skill-level grading prompts/agents +- eval sets and fixtures for grader behavior +- comparison of grader outputs across representative examples + +Out of scope: + +- broad telemetry architecture changes +- cloud analytics work +- unrelated UI work + +--- + +## Recommended Work + +### 1. Define grader eval corpora + +Build or curate examples for: + +- clear passes +- clear failures +- ambiguous sessions +- noisy wrapper/system-polluted sessions +- skills that should obviously count vs should not count + +### 2. Measure prompt behavior + +Evaluate: + +- consistency +- false positives +- false negatives +- susceptibility to polluted context + +### 3. Compare prompt/agent variants + +Where useful, compare: + +- revised prompt variants +- different calling styles +- stricter vs broader grading criteria + +### 4. Feed results back into product trust + +Use the findings to improve: + +- grading prompts +- grading docs +- orchestrator confidence +- monitoring credibility + +--- + +## Deliverables + +1. A grader-focused eval suite +2. Prompt revisions where justified +3. A short report on grader failure modes +4. 
Recommendations for how much trust product features should place in current grading + +--- + +## Success Criteria + +- Grader behavior becomes more measurable and explainable +- Prompt changes are backed by eval evidence, not intuition +- selftune’s “it works” claim becomes more credible because the grading layer is being tested directly diff --git a/docs/exec-plans/active/mcp-tool-descriptions.md b/docs/exec-plans/active/mcp-tool-descriptions.md new file mode 100644 index 0000000..242dab5 --- /dev/null +++ b/docs/exec-plans/active/mcp-tool-descriptions.md @@ -0,0 +1,112 @@ +# Execution Plan: MCP Tool Descriptions and Surface Quality + + + +**Status:** Active +**Created:** 2026-03-14 +**Goal:** Improve selftune’s MCP/tool descriptions so agent runtimes can understand and select the right tools more reliably, with less ambiguity and less prompt burden. + +--- + +## Problem Statement + +selftune increasingly depends on agents selecting the right commands and flows without human hand-holding. That makes tool surface quality part of the product. + +Current risk areas: + +- command descriptions are uneven across workflows +- some commands are over-broad or under-specified +- agent runtimes need clearer “when to use this” guidance +- local app/orchestrator/scheduler capabilities have changed faster than the descriptive layer around them + +This is especially important for: + +- MCP-style tool exposure +- Paperclip / Claude Code / other autonomous agent runtimes +- future cloud/local parity in product semantics + +--- + +## Goals + +1. Define clean, unambiguous descriptions for the most important selftune tools and commands. +2. Reduce ambiguity in when an agent should use: + - `sync` + - `status` + - `doctor` + - `evolve` + - `watch` + - orchestrator + - local app/dashboard flows +3. Make the tool surface reflect the current source-truth-first architecture. +4. Improve the ability of external runtimes to use selftune without long custom prompts. 
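One way to make the goals above concrete is a tiny description schema that forces every exposed tool to answer the same questions. The field names below are illustrative assumptions for the sketch, not an existing selftune or MCP metadata format:

```typescript
// Illustrative description schema; every exposed tool fills in the same fields.
interface ToolDescription {
  name: string;
  what: string;            // what it does
  whenToUse: string;       // when an agent should select it
  preconditions: string[]; // what it assumes before running
  output: string;          // what it returns or prints
  mutatesState: boolean;   // whether it changes local state
}

// Hypothetical entry for the sync command, phrased for agent tool selection.
const syncTool: ToolDescription = {
  name: "sync",
  what: "Ingest source-truth logs and transcripts into the local store.",
  whenToUse: "Before status, evolve, or watch, or whenever local data may be stale.",
  preconditions: ["selftune is installed", "source logs are readable"],
  output: "A sync summary with record counts and repair notes.",
  mutatesState: true,
};
```

Each entry stays short enough to fit a tool-selection prompt while still answering what, when, preconditions, output, and state change.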
+ +--- + +## Scope + +In scope: + +- CLI command descriptions and help text +- MCP/tool descriptions for externally exposed workflows +- workflow routing docs in `skill/Workflows/` +- any thin metadata or schema layer needed to describe the tool surface clearly + +Out of scope: + +- large command regrouping refactors +- product semantics changes +- cloud implementation details + +--- + +## Recommended Work + +### 1. Inventory the current tool surface + +Create a current map of: + +- core user-facing commands +- advanced commands +- commands that should be de-emphasized + +### 2. Standardize description format + +Each command/tool description should answer: + +- what it does +- when to use it +- what preconditions it assumes +- what it outputs +- whether it changes state + +### 3. Align with the current architecture + +Descriptions should clearly reflect: + +- source-truth sync first +- local app as the intended UX path +- OpenClaw cron as optional, not primary +- orchestrator as the autonomous loop entry + +### 4. Define agent-friendly descriptions + +Produce descriptions that are short enough for tool selection, but specific enough to reduce misuse. + +--- + +## Deliverables + +1. A canonical inventory of the selftune tool surface +2. Updated command/workflow descriptions +3. MCP/tool-facing description text for core commands +4. 
Guidance on which tools should be exposed by default vs advanced + +--- + +## Success Criteria + +- Agents choose the right selftune tools with less prompt scaffolding +- Fewer ambiguous tool-selection failures +- The tool surface matches the current product story +- Help/docs/workflow descriptions stop lagging behind the implementation diff --git a/docs/exec-plans/active/product-reset-and-shipping.md b/docs/exec-plans/active/product-reset-and-shipping.md index 4bef897..00dd125 100644 --- a/docs/exec-plans/active/product-reset-and-shipping.md +++ b/docs/exec-plans/active/product-reset-and-shipping.md @@ -136,6 +136,42 @@ Paperclip should accelerate iteration, not become the product priority. ## Current Recommendations +## Remaining Product Gaps + +These are the highest-confidence gaps still blocking adoption and confident shipping: + +### 1. The local UX is still not good enough + +The old dashboard path remains too slow and awkward, and the SQLite + SPA path is not yet the obvious default experience. + +### 2. The autonomous loop is not yet obvious and trustworthy + +The orchestrator exists, but the product does not yet feel like a safe, comprehensible “turn this on and it improves my skills” system. + +### 3. Evolution is still under-triggering in practice + +We can prove skill usage and at least one real successful evolution, but the system still does not yet feel like it consistently turns real usage into useful proposed improvements across many skills. + +### 4. Query and environment pollution still distort the signal + +Polluted host environments still make status and unmatched-query outputs harder to trust than they should be. + +### 5. Local/cloud product contracts are not fully stabilized + +We proved OSS export -> cloud ingest, but the actual user-facing payload contracts for overview/report views still need to be made explicit and aligned. + +### 6. 
The default story is still too broad + +The product still presents too much surface area for a first-time user instead of one tight loop. + +### 7. The release path still needs one clean published-package proof + +Branch code has been proven on a real machine; the final “published install behaves the same way” proof still needs to happen. + +--- + +## Current Recommendations + ### 1. Make the SPA the real default dashboard path Once the SQLite-backed local app is credible, stop treating it as sidecar UI. From 94c8f67b0cef53bf12f338bd9403b5da1a2e3b33 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 16:40:57 +0300 Subject: [PATCH 03/14] Prepare SPA dashboard release path --- apps/local-dashboard/HANDOFF.md | 6 + apps/local-dashboard/src/types.ts | 180 ++------------- cli/selftune/dashboard-contract.ts | 161 ++++++++++++++ cli/selftune/localdb/queries.ts | 116 +--------- cli/selftune/orchestrate.ts | 52 +++-- package.json | 2 + skill/Workflows/Dashboard.md | 6 + tests/dashboard/dashboard-server.test.ts | 270 +++++++++++++---------- tests/orchestrate.test.ts | 37 +++- 9 files changed, 425 insertions(+), 405 deletions(-) create mode 100644 cli/selftune/dashboard-contract.ts diff --git a/apps/local-dashboard/HANDOFF.md b/apps/local-dashboard/HANDOFF.md index 251a32f..6312396 100644 --- a/apps/local-dashboard/HANDOFF.md +++ b/apps/local-dashboard/HANDOFF.md @@ -28,6 +28,12 @@ JSONL logs → materializeIncremental() → SQLite → getOverviewPayload() / ge ## How to run ```bash +# From repo root +bun run dev +# → if 7888 is free, starts dashboard server on 7888 and SPA dev server on http://localhost:5199 +# → if 7888 is already in use, reuses that dashboard server and starts only the SPA dev server + +# Or run manually: # Terminal 1: Start the dashboard server selftune dashboard --port 7888 diff --git a/apps/local-dashboard/src/types.ts b/apps/local-dashboard/src/types.ts index ef9aae6..3f6fb9a 100644 --- 
a/apps/local-dashboard/src/types.ts +++ b/apps/local-dashboard/src/types.ts @@ -1,168 +1,22 @@ /** Data contracts for the v2 SQLite-backed dashboard API */ -// -- Shared primitives -------------------------------------------------------- - -export interface TelemetryRecord { - timestamp: string; - session_id: string; - skills_triggered: string[]; - errors_encountered: number; - total_tool_calls: number; -} - -export interface SkillUsageRecord { - timestamp: string; - session_id: string; - skill_name: string; - skill_path: string; - query: string; - triggered: boolean; - source: string | null; -} - -export interface EvalSnapshot { - before_pass_rate?: number; - after_pass_rate?: number; - net_change?: number; - improved?: boolean; - regressions?: Array>; - new_passes?: Array>; -} - -export interface EvolutionEntry { - timestamp: string; - proposal_id: string; - action: string; - details: string; - eval_snapshot?: EvalSnapshot | null; -} - -export interface UnmatchedQuery { - timestamp: string; - session_id: string; - query: string; -} - -export interface PendingProposal { - proposal_id: string; - action: string; - timestamp: string; - details: string; - skill_name?: string; -} - -// -- /api/v2/overview response ------------------------------------------------ - -export interface SkillSummary { - skill_name: string; - skill_scope: string | null; - total_checks: number; - triggered_count: number; - pass_rate: number; - unique_sessions: number; - last_seen: string | null; - has_evidence: boolean; -} - -export interface OverviewResponse { - overview: { - telemetry: TelemetryRecord[]; - skills: SkillUsageRecord[]; - evolution: EvolutionEntry[]; - counts: { - telemetry: number; - skills: number; - evolution: number; - evidence: number; - sessions: number; - prompts: number; - }; - unmatched_queries: UnmatchedQuery[]; - pending_proposals: PendingProposal[]; - }; - skills: SkillSummary[]; - version?: string; -} - -// -- /api/v2/skills/:name response 
-------------------------------------------- - -export interface EvidenceEntry { - proposal_id: string; - target: string; - stage: string; - timestamp: string; - rationale: string | null; - confidence: number | null; - original_text: string | null; - proposed_text: string | null; - validation: Record | null; - details: string | null; - eval_set: Array>; -} - -export interface CanonicalInvocation { - timestamp: string; - session_id: string; - skill_name: string; - invocation_mode: string | null; - triggered: boolean; - confidence: number | null; - tool_name: string | null; -} - -export interface PromptSample { - prompt_text: string; - prompt_kind: string | null; - is_actionable: boolean; - occurred_at: string; - session_id: string; -} - -export interface SessionMeta { - session_id: string; - platform: string | null; - model: string | null; - agent_cli: string | null; - branch: string | null; - workspace_path: string | null; - started_at: string | null; - ended_at: string | null; - completion_status: string | null; -} - -export interface SkillReportResponse { - skill_name: string; - usage: { - total_checks: number; - triggered_count: number; - pass_rate: number; - }; - recent_invocations: Array<{ - timestamp: string; - session_id: string; - query: string; - triggered: boolean; - source: string | null; - }>; - evidence: EvidenceEntry[]; - sessions_with_skill: number; - evolution: EvolutionEntry[]; - pending_proposals: PendingProposal[]; - // Extended data - token_usage: { - total_input_tokens: number; - total_output_tokens: number; - }; - canonical_invocations: CanonicalInvocation[]; - duration_stats: { - avg_duration_ms: number; - total_duration_ms: number; - execution_count: number; - total_errors: number; - }; - prompt_samples: PromptSample[]; - session_metadata: SessionMeta[]; -} +export type { + CanonicalInvocation, + EvalSnapshot, + EvidenceEntry, + EvolutionEntry, + OverviewPayload, + OverviewResponse, + PendingProposal, + PromptSample, + SessionMeta, + 
SkillReportPayload, + SkillReportResponse, + SkillSummary, + SkillUsageRecord, + TelemetryRecord, + UnmatchedQuery, +} from "../../../cli/selftune/dashboard-contract"; // -- UI types ----------------------------------------------------------------- diff --git a/cli/selftune/dashboard-contract.ts b/cli/selftune/dashboard-contract.ts new file mode 100644 index 0000000..6c235b3 --- /dev/null +++ b/cli/selftune/dashboard-contract.ts @@ -0,0 +1,161 @@ +export interface TelemetryRecord { + timestamp: string; + session_id: string; + skills_triggered: string[]; + errors_encountered: number; + total_tool_calls: number; +} + +export interface SkillUsageRecord { + timestamp: string; + session_id: string; + skill_name: string; + skill_path: string; + query: string; + triggered: boolean; + source: string | null; +} + +export interface EvalSnapshot { + before_pass_rate?: number; + after_pass_rate?: number; + net_change?: number; + improved?: boolean; + regressions?: Array>; + new_passes?: Array>; +} + +export interface EvolutionEntry { + timestamp: string; + proposal_id: string; + action: string; + details: string; + eval_snapshot?: EvalSnapshot | null; +} + +export interface UnmatchedQuery { + timestamp: string; + session_id: string; + query: string; +} + +export interface PendingProposal { + proposal_id: string; + action: string; + timestamp: string; + details: string; + skill_name?: string; +} + +export interface SkillSummary { + skill_name: string; + skill_scope: string | null; + total_checks: number; + triggered_count: number; + pass_rate: number; + unique_sessions: number; + last_seen: string | null; + has_evidence: boolean; +} + +export interface OverviewPayload { + telemetry: TelemetryRecord[]; + skills: SkillUsageRecord[]; + evolution: EvolutionEntry[]; + counts: { + telemetry: number; + skills: number; + evolution: number; + evidence: number; + sessions: number; + prompts: number; + }; + unmatched_queries: UnmatchedQuery[]; + pending_proposals: PendingProposal[]; +} + 
+export interface OverviewResponse { + overview: OverviewPayload; + skills: SkillSummary[]; + version?: string; +} + +export interface EvidenceEntry { + proposal_id: string; + target: string; + stage: string; + timestamp: string; + rationale: string | null; + confidence: number | null; + original_text: string | null; + proposed_text: string | null; + validation: Record | null; + details: string | null; + eval_set: Array>; +} + +export interface CanonicalInvocation { + timestamp: string; + session_id: string; + skill_name: string; + invocation_mode: string | null; + triggered: boolean; + confidence: number | null; + tool_name: string | null; +} + +export interface PromptSample { + prompt_text: string; + prompt_kind: string | null; + is_actionable: boolean; + occurred_at: string; + session_id: string; +} + +export interface SessionMeta { + session_id: string; + platform: string | null; + model: string | null; + agent_cli: string | null; + branch: string | null; + workspace_path: string | null; + started_at: string | null; + ended_at: string | null; + completion_status: string | null; +} + +export interface SkillReportPayload { + skill_name: string; + usage: { + total_checks: number; + triggered_count: number; + pass_rate: number; + }; + recent_invocations: Array<{ + timestamp: string; + session_id: string; + query: string; + triggered: boolean; + source: string | null; + }>; + evidence: EvidenceEntry[]; + sessions_with_skill: number; +} + +export interface SkillReportResponse extends SkillReportPayload { + evolution: EvolutionEntry[]; + pending_proposals: PendingProposal[]; + token_usage: { + total_input_tokens: number; + total_output_tokens: number; + }; + canonical_invocations: CanonicalInvocation[]; + duration_stats: { + avg_duration_ms: number; + total_duration_ms: number; + execution_count: number; + total_errors: number; + }; + prompt_samples: PromptSample[]; + session_metadata: SessionMeta[]; +} diff --git a/cli/selftune/localdb/queries.ts 
b/cli/selftune/localdb/queries.ts index 51f93ca..82a7b99 100644 --- a/cli/selftune/localdb/queries.ts +++ b/cli/selftune/localdb/queries.ts @@ -6,53 +6,12 @@ */ import type { Database } from "bun:sqlite"; - -// -- Overview payload --------------------------------------------------------- - -export interface OverviewPayload { - telemetry: Array<{ - timestamp: string; - session_id: string; - skills_triggered: string[]; - errors_encountered: number; - total_tool_calls: number; - }>; - skills: Array<{ - timestamp: string; - session_id: string; - skill_name: string; - skill_path: string; - query: string; - triggered: boolean; - source: string | null; - }>; - evolution: Array<{ - timestamp: string; - proposal_id: string; - action: string; - details: string; - }>; - counts: { - telemetry: number; - skills: number; - evolution: number; - evidence: number; - sessions: number; - prompts: number; - }; - unmatched_queries: Array<{ - timestamp: string; - session_id: string; - query: string; - }>; - pending_proposals: Array<{ - proposal_id: string; - action: string; - timestamp: string; - details: string; - skill_name: string; - }>; -} +import type { + OverviewPayload, + PendingProposal, + SkillReportPayload, + SkillSummary, +} from "../dashboard-contract.js"; /** * Build the overview payload from SQLite, suitable for the dashboard main page. 
@@ -77,7 +36,7 @@ export function getOverviewPayload(db: Database): OverviewPayload { const telemetry = telemetryRows.map((row) => ({ timestamp: row.timestamp, session_id: row.session_id, - skills_triggered: safeParseJsonArray(row.skills_triggered_json), + skills_triggered: safeParseJsonArray(row.skills_triggered_json), errors_encountered: row.errors_encountered, total_tool_calls: row.total_tool_calls, })); @@ -174,38 +133,6 @@ export function getOverviewPayload(db: Database): OverviewPayload { }; } -// -- Skill report payload ----------------------------------------------------- - -export interface SkillReportPayload { - skill_name: string; - usage: { - total_checks: number; - triggered_count: number; - pass_rate: number; - }; - recent_invocations: Array<{ - timestamp: string; - session_id: string; - query: string; - triggered: boolean; - source: string | null; - }>; - evidence: Array<{ - proposal_id: string; - target: string; - stage: string; - timestamp: string; - rationale: string | null; - confidence: number | null; - original_text: string | null; - proposed_text: string | null; - validation: Record | null; - details: string | null; - eval_set: string[]; - }>; - sessions_with_skill: number; -} - /** * Build the skill report payload for a specific skill. 
*/ @@ -285,7 +212,7 @@ export function getSkillReportPayload(db: Database, skillName: string): SkillRep proposed_text: row.proposed_text, validation: safeParseJson(row.validation_json), details: row.details, - eval_set: safeParseJsonArray(row.eval_set_json), + eval_set: safeParseJsonArray<string>(row.eval_set_json), })); // Unique sessions count @@ -306,19 +233,6 @@ export function getSkillReportPayload(db: Database, skillName: string): SkillRep }; } -// -- Skills list payload ------------------------------------------------------ - -export interface SkillSummary { - skill_name: string; - skill_scope: string | null; - total_checks: number; - triggered_count: number; - pass_rate: number; - unique_sessions: number; - last_seen: string | null; - has_evidence: boolean; -} - /** * Get a summary list of all skills with aggregated stats. */ @@ -368,16 +282,6 @@ export function getSkillsList(db: Database): SkillSummary[] { })); } -// -- Shared query helpers ----------------------------------------------------- - -export interface PendingProposal { - proposal_id: string; - action: string; - timestamp: string; - details: string; - skill_name: string; -} - /** * Get pending proposals (created/validated with no terminal action). * Optionally filtered by skill_name. */ @@ -407,11 +311,11 @@ export function getPendingProposals(db: Database, skillName?: string): PendingPr // -- Helpers ------------------------------------------------------------------ -function safeParseJsonArray(json: string | null): string[] { +function safeParseJsonArray<T>(json: string | null): T[] { if (!json) return []; try { const parsed = JSON.parse(json); - return Array.isArray(parsed) ? parsed : []; + return Array.isArray(parsed) ?
(parsed as T[]) : []; } catch { return []; } diff --git a/cli/selftune/orchestrate.ts b/cli/selftune/orchestrate.ts index ae2f61d..092156d 100644 --- a/cli/selftune/orchestrate.ts +++ b/cli/selftune/orchestrate.ts @@ -5,7 +5,8 @@ * It chains existing modules (sync, status, evolve, watch) into one * coordinated run with explicit candidate selection and safety controls. * - * Default behavior is safe: dry-run mode, no deployments without --auto-approve. + * Default behavior is autonomous for low-risk description evolution, with + * explicit dry-run and review-required modes for human-in-the-loop operation. */ import { homedir } from "node:os"; @@ -38,8 +39,8 @@ import { readEffectiveSkillUsageRecords } from "./utils/skill-log.js"; export interface OrchestrateOptions { /** Run sync → status → evolve → watch without writing changes. */ dryRun: boolean; - /** Allow evolve to deploy changes (without this, evolve always uses dry-run). */ - autoApprove: boolean; + /** Approval policy for low-risk description evolution. */ + approvalMode: "auto" | "review"; /** Scope to a single skill by name. */ skillFilter?: string; /** Cap the number of skills processed per run. */ @@ -70,7 +71,7 @@ export interface OrchestrateResult { watched: number; skipped: number; dryRun: boolean; - autoApprove: boolean; + approvalMode: "auto" | "review"; elapsedMs: number; }; } @@ -302,7 +303,7 @@ export async function orchestrate( continue; } - const effectiveDryRun = options.dryRun || !options.autoApprove; + const effectiveDryRun = options.dryRun || options.approvalMode === "review"; console.error( `[orchestrate] Evolving "${candidate.skill}"${effectiveDryRun ? 
" (dry-run)" : ""}...`, ); @@ -405,7 +406,7 @@ export async function orchestrate( watched: watchedCount, skipped: candidates.filter((c) => c.action === "skip").length, dryRun: options.dryRun, - autoApprove: options.autoApprove, + approvalMode: options.approvalMode, elapsedMs: Date.now() - startTime, }, }; @@ -420,7 +421,8 @@ export async function orchestrate( export async function cliMain(): Promise { const { values } = parseArgs({ options: { - "dry-run": { type: "boolean", default: true }, + "dry-run": { type: "boolean", default: false }, + "review-required": { type: "boolean", default: false }, "auto-approve": { type: "boolean", default: false }, skill: { type: "string" }, "max-skills": { type: "string", default: "5" }, @@ -440,8 +442,9 @@ Usage: selftune orchestrate [options] Options: - --dry-run Preview actions without mutations (default: true) - --auto-approve Allow evolve to deploy changes + --dry-run Preview actions without mutations + --review-required Validate candidates but require human review before deploy + --auto-approve Deprecated alias; autonomous mode is now the default --skill Scope to a single skill --max-skills Cap skills processed per run (default: 5) --recent-window Hours to look back for watch targets (default: 48) @@ -449,13 +452,15 @@ Options: -h, --help Show this help message Safety: - By default, orchestrate runs in dry-run mode. Evolve proposals are - validated but not deployed. Pass --auto-approve to enable deployment. - Even with --auto-approve, each skill must pass validation gates. + By default, low-risk description evolution runs autonomously after + validation. Use --review-required to keep a human in the loop, or + --dry-run to preview the whole loop without mutations. Every deploy + still passes validation gates first. 
Examples: - selftune orchestrate # dry-run preview - selftune orchestrate --auto-approve # deploy validated changes + selftune orchestrate # autonomous description evolution + selftune orchestrate --review-required # validate but do not deploy + selftune orchestrate --dry-run # preview only selftune orchestrate --skill Research # single skill selftune orchestrate --max-skills 3 # limit scope`); process.exit(0); @@ -473,13 +478,20 @@ Examples: process.exit(1); } - // --auto-approve implies --no-dry-run const autoApprove = values["auto-approve"] ?? false; - const dryRun = autoApprove ? false : (values["dry-run"] ?? true); + if (autoApprove) { + console.error( + "[orchestrate] --auto-approve is deprecated; autonomous mode is now the default.", + ); + } + + const reviewRequired = values["review-required"] ?? false; + const dryRun = values["dry-run"] ?? false; + const approvalMode: "auto" | "review" = reviewRequired ? "review" : "auto"; const result = await orchestrate({ dryRun, - autoApprove, + approvalMode, skillFilter: values.skill, maxSkills, recentWindowHours: recentWindow, @@ -499,11 +511,13 @@ Examples: console.error(` Watched: ${result.summary.watched}`); console.error(` Skipped: ${result.summary.skipped}`); console.error(` Dry run: ${result.summary.dryRun}`); - console.error(` Auto-approve: ${result.summary.autoApprove}`); + console.error(` Approval mode: ${result.summary.approvalMode}`); console.error(` Elapsed: ${(result.summary.elapsedMs / 1000).toFixed(1)}s`); if (result.summary.dryRun && result.summary.evaluated > 0) { - console.error("\n Pass --auto-approve to deploy validated changes."); + console.error("\n Rerun without --dry-run to allow validated deployments."); + } else if (result.summary.approvalMode === "review" && result.summary.evaluated > 0) { + console.error("\n Rerun without --review-required to allow validated deployments."); } process.exit(0); diff --git a/package.json b/package.json index c085f8a..949820c 100644 --- a/package.json +++ 
b/package.json @@ -50,6 +50,8 @@ "CHANGELOG.md" ], "scripts": { + "dev": "sh -c 'if lsof -iTCP:7888 -sTCP:LISTEN >/dev/null 2>&1; then echo \"Using existing dashboard server on 7888\"; cd apps/local-dashboard && bun install && bunx vite --strictPort; else cd apps/local-dashboard && bun install && bun run dev; fi'", + "dev:dashboard": "bun run dev", "lint": "bunx @biomejs/biome check .", "lint:fix": "bunx @biomejs/biome check --write .", "lint:arch": "bun run lint-architecture.ts", diff --git a/skill/Workflows/Dashboard.md b/skill/Workflows/Dashboard.md index bb9aa8b..2b86070 100644 --- a/skill/Workflows/Dashboard.md +++ b/skill/Workflows/Dashboard.md @@ -208,6 +208,12 @@ selftune dashboard --port 8080 To develop the React SPA locally: ```bash +# From repo root +bun run dev +# → if 7888 is free, starts both the dashboard server and the SPA dev server +# → if 7888 is already in use, reuses that dashboard server and starts only the SPA dev server on http://localhost:5199 + +# Or run manually: # Terminal 1: Start the dashboard server selftune dashboard --port 7888 diff --git a/tests/dashboard/dashboard-server.test.ts b/tests/dashboard/dashboard-server.test.ts index 2e05998..5f09344 100644 --- a/tests/dashboard/dashboard-server.test.ts +++ b/tests/dashboard/dashboard-server.test.ts @@ -47,63 +47,98 @@ beforeAll(async () => { }); describe("dashboard-server", () => { - let server: { server: unknown; stop: () => void; port: number }; - - beforeAll(async () => { - server = await startDashboardServer({ - port: 0, // random port - host: "localhost", - openBrowser: false, - dataLoader: () => fakeData, - statusLoader: () => ({ - skills: [ - { - name: "test-skill", - passRate: 1, - trend: "stable", - missedQueries: 0, - status: "HEALTHY", - snapshot: null, + let serverPromise: + | Promise<{ server: unknown; stop: () => void; port: number }> + | null = null; + + async function getServer(): Promise<{ server: unknown; stop: () => void; port: number }> { + if (!serverPromise) { + 
serverPromise = startDashboardServer({ + port: 0, // random port + host: "127.0.0.1", + openBrowser: false, + dataLoader: () => fakeData, + statusLoader: () => ({ + skills: [ + { + name: "test-skill", + passRate: 1, + trend: "stable", + missedQueries: 0, + status: "HEALTHY", + snapshot: null, + }, + ], + unmatchedQueries: 0, + pendingProposals: 0, + lastSession: "2026-03-12T10:00:00Z", + system: { + healthy: true, + pass: 1, + fail: 0, + warn: 0, }, - ], - unmatchedQueries: 0, - pendingProposals: 0, - lastSession: "2026-03-12T10:00:00Z", - system: { - healthy: true, - pass: 1, - fail: 0, - warn: 0, - }, - }), - actionRunner: async (command) => ({ - success: command !== "rollback", - output: `${command} ok`, - error: command === "rollback" ? "rollback blocked in test" : null, - }), - }); - }); - - afterAll(() => { - server?.stop(); + }), + actionRunner: async (command) => ({ + success: command !== "rollback", + output: `${command} ok`, + error: command === "rollback" ? "rollback blocked in test" : null, + }), + }); + } + + return serverPromise; + } + + async function readRootHtml(): Promise<string> { + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/`); + return res.text(); + } + + async function servesSpaShell(): Promise<boolean> { + const html = await readRootHtml(); + return html.includes("<div id=\"root\">
") && html.includes("/assets/"); + } + + afterAll(async () => { + if (serverPromise) { + const server = await serverPromise; + server.stop(); + } }); // ---- GET / ---- describe("GET /", () => { it("returns 200 with HTML content", async () => { - const res = await fetch(`http://localhost:${server.port}/`); + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/`); expect(res.status).toBe(200); expect(res.headers.get("content-type")).toContain("text/html"); - }); + }, 15000); it("contains the selftune title", async () => { - const res = await fetch(`http://localhost:${server.port}/`); - const html = await res.text(); + const html = await readRootHtml(); expect(html).toContain("selftune"); }); - it("sets the live mode flag", async () => { - const res = await fetch(`http://localhost:${server.port}/`); + it("serves either the SPA shell or the legacy live shell", async () => { + const html = await readRootHtml(); + const isSpa = await servesSpaShell(); + if (isSpa) { + expect(html).toContain("
"); + expect(html).toContain("/assets/"); + } else { + expect(html).toContain("__SELFTUNE_LIVE__"); + } + }); + + it("keeps the legacy dashboard available at /legacy/ when SPA is active", async () => { + if (!(await servesSpaShell())) return; + + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/legacy/`); + expect(res.status).toBe(200); const html = await res.text(); expect(html).toContain("__SELFTUNE_LIVE__"); }); @@ -112,13 +147,15 @@ describe("dashboard-server", () => { // ---- GET /api/data ---- describe("GET /api/data", () => { it("returns 200 with JSON", async () => { - const res = await fetch(`http://localhost:${server.port}/api/data`); + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); expect(res.status).toBe(200); expect(res.headers.get("content-type")).toContain("application/json"); }); it("returns expected data shape", async () => { - const res = await fetch(`http://localhost:${server.port}/api/data`); + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); const data = await res.json(); expect(data).toHaveProperty("telemetry"); expect(data).toHaveProperty("skills"); @@ -134,7 +171,8 @@ describe("dashboard-server", () => { }); it("includes decisions in the data", async () => { - const res = await fetch(`http://localhost:${server.port}/api/data`); + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); const data = await res.json(); expect(data).toHaveProperty("decisions"); expect(Array.isArray(data.decisions)).toBe(true); @@ -144,8 +182,9 @@ describe("dashboard-server", () => { // ---- GET /api/events (SSE) ---- describe("GET /api/events", () => { it("returns SSE content type", async () => { + const server = await getServer(); const controller = new AbortController(); - const res = await fetch(`http://localhost:${server.port}/api/events`, { + const res = 
await fetch(`http://127.0.0.1:${server.port}/api/events`, { signal: controller.signal, }); expect(res.status).toBe(200); @@ -154,10 +193,11 @@ describe("dashboard-server", () => { }); it("sends initial data event", async () => { + const server = await getServer(); const controller = new AbortController(); const timeout = setTimeout(() => controller.abort(), 3000); - const res = await fetch(`http://localhost:${server.port}/api/events`, { + const res = await fetch(`http://127.0.0.1:${server.port}/api/events`, { signal: controller.signal, }); @@ -196,7 +236,8 @@ describe("dashboard-server", () => { // ---- POST /api/actions/watch ---- describe("POST /api/actions/watch", () => { it("returns JSON response", async () => { - const res = await fetch(`http://localhost:${server.port}/api/actions/watch`, { + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/actions/watch`, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ skill: "test-skill", skillPath: "/tmp/test-skill" }), @@ -214,7 +255,8 @@ describe("dashboard-server", () => { // ---- POST /api/actions/evolve ---- describe("POST /api/actions/evolve", () => { it("returns JSON response", async () => { - const res = await fetch(`http://localhost:${server.port}/api/actions/evolve`, { + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/actions/evolve`, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ skill: "test-skill", skillPath: "/tmp/test-skill" }), @@ -229,7 +271,8 @@ describe("dashboard-server", () => { // ---- POST /api/actions/rollback ---- describe("POST /api/actions/rollback", () => { it("returns JSON response with proposalId validation", async () => { - const res = await fetch(`http://localhost:${server.port}/api/actions/rollback`, { + const server = await getServer(); + const res = await 
fetch(`http://127.0.0.1:${server.port}/api/actions/rollback`, { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ @@ -248,8 +291,9 @@ describe("dashboard-server", () => { // ---- GET /api/evaluations/:skillName ---- describe("GET /api/evaluations/:skillName", () => { it("returns 200 with JSON array", async () => { + const server = await getServer(); const res = await fetch( - `http://localhost:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, + `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, ); expect(res.status).toBe(200); expect(res.headers.get("content-type")).toContain("application/json"); @@ -258,8 +302,9 @@ describe("dashboard-server", () => { }); it("returns entries with expected shape when data exists", async () => { + const server = await getServer(); const res = await fetch( - `http://localhost:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, + `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, ); const data = await res.json(); // May be empty if no skill_usage_log.jsonl entries match, but shape is still an array @@ -274,8 +319,9 @@ describe("dashboard-server", () => { }); it("returns empty array for unknown skill", async () => { + const server = await getServer(); const res = await fetch( - `http://localhost:${server.port}/api/evaluations/${encodeURIComponent("nonexistent-skill-xyz")}`, + `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("nonexistent-skill-xyz")}`, ); expect(res.status).toBe(200); const data = await res.json(); @@ -283,8 +329,9 @@ describe("dashboard-server", () => { }); it("includes CORS headers", async () => { + const server = await getServer(); const res = await fetch( - `http://localhost:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, + `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, ); 
expect(res.headers.get("access-control-allow-origin")).toBe("*"); }); @@ -292,16 +339,24 @@ describe("dashboard-server", () => { // ---- 404 for unknown routes ---- describe("unknown routes", () => { - it("returns 404 for unknown paths", async () => { - const res = await fetch(`http://localhost:${server.port}/nonexistent`); - expect(res.status).toBe(404); + it("returns SPA fallback or 404 depending on served mode", async () => { + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/nonexistent`); + if (await servesSpaShell()) { + expect(res.status).toBe(200); + const html = await res.text(); + expect(html).toContain("<div id=\"root\">
"); + } else { + expect(res.status).toBe(404); + } }); }); // ---- CORS headers ---- describe("CORS", () => { it("includes CORS headers on API responses", async () => { - const res = await fetch(`http://localhost:${server.port}/api/data`); + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); expect(res.headers.get("access-control-allow-origin")).toBe("*"); }); }); @@ -312,7 +367,7 @@ describe("server lifecycle", () => { it("can start and stop cleanly", async () => { const s = await startDashboardServer({ port: 0, - host: "localhost", + host: "127.0.0.1", openBrowser: false, dataLoader: () => fakeData, statusLoader: () => ({ @@ -328,12 +383,12 @@ describe("server lifecycle", () => { expect(typeof s.port).toBe("number"); expect(s.port).toBeGreaterThan(0); s.stop(); - }); + }, 30000); it("exposes port after binding", async () => { const s = await startDashboardServer({ port: 0, - host: "localhost", + host: "127.0.0.1", openBrowser: false, dataLoader: () => fakeData, statusLoader: () => ({ @@ -345,21 +400,18 @@ describe("server lifecycle", () => { }), }); // Verify the server is actually responding - const res = await fetch(`http://localhost:${s.port}/api/data`); + const res = await fetch(`http://127.0.0.1:${s.port}/api/data`); expect(res.status).toBe(200); s.stop(); - }); + }, 15000); }); describe("live shell loading", () => { - let server: { server: unknown; stop: () => void; port: number }; - let dataLoaderCalls = 0; - - beforeAll(async () => { - dataLoaderCalls = 0; - server = await startDashboardServer({ + it("serves / without eagerly loading dashboard data", async () => { + let dataLoaderCalls = 0; + const server = await startDashboardServer({ port: 0, - host: "localhost", + host: "127.0.0.1", openBrowser: false, dataLoader: () => { dataLoaderCalls++; @@ -390,40 +442,38 @@ describe("live shell loading", () => { }, }), }); - }); - - afterAll(() => { - server?.stop(); - }); - it("serves / without eagerly loading 
dashboard data", async () => { const callsBefore = dataLoaderCalls; - const res = await fetch(`http://localhost:${server.port}/`); - const html = await res.text(); - expect(res.status).toBe(200); - expect(html).toContain("__SELFTUNE_LIVE__"); - expect(html).not.toContain('id="embedded-data"'); - expect(dataLoaderCalls).toBe(callsBefore); - }); - - it("loads dashboard data only through /api/data", async () => { - const res = await fetch(`http://localhost:${server.port}/api/data`); - expect(res.status).toBe(200); - expect(dataLoaderCalls).toBe(1); - }); + try { + const res = await fetch(`http://127.0.0.1:${server.port}/`); + const html = await res.text(); + expect(res.status).toBe(200); + const isSpa = html.includes("
") && html.includes("/assets/"); + if (isSpa) { + expect(html).toContain("
"); + } else { + expect(html).toContain("__SELFTUNE_LIVE__"); + expect(html).not.toContain('id="embedded-data"'); + } + expect(dataLoaderCalls).toBe(callsBefore); + + const dataRes = await fetch(`http://127.0.0.1:${server.port}/api/data`); + expect(dataRes.status).toBe(200); + expect(dataLoaderCalls).toBe(1); + } finally { + server.stop(); + } + }, 15000); }); describe("report loading", () => { - let server: { server: unknown; stop: () => void; port: number }; - let dataLoaderCalls = 0; - let evidenceLoaderCalls = 0; - - beforeAll(async () => { - dataLoaderCalls = 0; - evidenceLoaderCalls = 0; - server = await startDashboardServer({ + it("loads report data without touching the full dashboard loader", async () => { + let dataLoaderCalls = 0; + let evidenceLoaderCalls = 0; + + const server = await startDashboardServer({ port: 0, - host: "localhost", + host: "127.0.0.1", openBrowser: false, dataLoader: () => { dataLoaderCalls++; @@ -467,16 +517,14 @@ describe("report loading", () => { return []; }, }); - }); - - afterAll(() => { - server?.stop(); - }); - it("loads report data without touching the full dashboard loader", async () => { - const res = await fetch(`http://localhost:${server.port}/report/test-skill`); - expect(res.status).toBe(200); - expect(dataLoaderCalls).toBe(0); - expect(evidenceLoaderCalls).toBe(1); - }); + try { + const res = await fetch(`http://127.0.0.1:${server.port}/report/test-skill`); + expect(res.status).toBe(200); + expect(dataLoaderCalls).toBe(0); + expect(evidenceLoaderCalls).toBe(1); + } finally { + server.stop(); + } + }, 15000); }); diff --git a/tests/orchestrate.test.ts b/tests/orchestrate.test.ts index fd7a165..ca242da 100644 --- a/tests/orchestrate.test.ts +++ b/tests/orchestrate.test.ts @@ -56,8 +56,8 @@ function makeStatusResult(skills: SkillStatus[]): StatusResult { } const baseOptions: OrchestrateOptions = { - dryRun: true, - autoApprove: false, + dryRun: false, + approvalMode: "auto", maxSkills: 5, recentWindowHours: 48, 
syncForce: false, @@ -204,8 +204,8 @@ describe("orchestrate", () => { expect(result.summary.totalSkills).toBe(0); expect(result.summary.evaluated).toBe(0); expect(result.summary.skipped).toBe(0); - expect(result.summary.dryRun).toBe(true); - expect(result.summary.autoApprove).toBe(false); + expect(result.summary.dryRun).toBe(false); + expect(result.summary.approvalMode).toBe("auto"); }); test("dry-run prevents deployment even when evolve would succeed", async () => { @@ -233,7 +233,7 @@ describe("orchestrate", () => { expect(evolveDryRun).toBe(true); }); - test("auto-approve passes dryRun=false to evolve", async () => { + test("autonomous mode passes dryRun=false to evolve", async () => { let evolveDryRun: boolean | undefined; const deps = makeDeps({ computeStatus: () => @@ -254,10 +254,35 @@ describe("orchestrate", () => { }, }); - await orchestrate({ ...baseOptions, dryRun: false, autoApprove: true }, deps); + await orchestrate({ ...baseOptions, dryRun: false, approvalMode: "auto" }, deps); expect(evolveDryRun).toBe(false); }); + test("review-required mode keeps evolve in dry-run", async () => { + let evolveDryRun: boolean | undefined; + const deps = makeDeps({ + computeStatus: () => + makeStatusResult([ + makeSkill({ name: "Skill1", status: "CRITICAL", passRate: 0.2, missedQueries: 5 }), + ]), + evolve: async (opts) => { + evolveDryRun = opts.dryRun; + return { + proposal: null, + validation: null, + deployed: false, + auditEntries: [], + reason: "review required", + llmCallCount: 0, + elapsedMs: 50, + }; + }, + }); + + await orchestrate({ ...baseOptions, approvalMode: "review" }, deps); + expect(evolveDryRun).toBe(true); + }); + test("skips evolve when skill path cannot be resolved", async () => { const deps = makeDeps({ computeStatus: () => From 273bd390a91487648696ee485dabf980948059bc Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 17:03:56 +0300 Subject: [PATCH 04/14] Remove legacy dashboard 
runtime --- ARCHITECTURE.md | 7 +- CHANGELOG.md | 2 +- README.md | 2 +- ROADMAP.md | 2 +- apps/local-dashboard/HANDOFF.md | 9 +- apps/local-dashboard/package.json | 2 +- cli/selftune/dashboard-server.ts | 374 +--- cli/selftune/dashboard.ts | 240 +-- dashboard/index.html | 2113 ---------------------- docs/design-docs/sandbox-claude-code.md | 2 +- docs/design-docs/sandbox-test-harness.md | 2 +- docs/escalation-policy.md | 4 +- docs/exec-plans/tech-debt-tracker.md | 2 +- package.json | 3 +- skill/SKILL.md | 6 +- skill/Workflows/Dashboard.md | 202 +-- tests/dashboard/dashboard-server.test.ts | 458 ++--- tests/dashboard/dashboard.test.ts | 112 +- tests/sandbox/run-sandbox.ts | 87 +- 19 files changed, 391 insertions(+), 3238 deletions(-) delete mode 100644 dashboard/index.html diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index c080752..af694bc 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -44,8 +44,8 @@ cli/selftune/ ├── observability.ts Health checks (doctor command) ├── status.ts Skill health summary (status command) ├── last.ts Last session insight (last command) -├── dashboard.ts HTML dashboard builder (dashboard command) -├── dashboard-server.ts Live Bun.serve server with SSE (dashboard --serve) +├── dashboard.ts Dashboard command entry point (SPA server launcher) +├── dashboard-server.ts Bun.serve SPA + v2 API server ├── types.ts Shared interfaces (incl. 
SelftuneConfig) ├── constants.ts Log paths, config paths, known tools ├── utils/ Shared utilities (jsonl, transcript, logging, llm-call, schema-validator, trigger-check) @@ -100,9 +100,6 @@ apps/local-dashboard/ React SPA dashboard (Vite + TypeScript + shadcn/ui) ├── vite.config.ts Dev proxy → dashboard-server, build to dist/ └── package.json React 19, Tailwind v4, shadcn/ui, recharts -dashboard/ Legacy HTML dashboard (served at /legacy/) -└── index.html Original embedded-JSON dashboard (v1 endpoints) - templates/ Settings and config templates ├── single-skill-settings.json ├── multi-skill-settings.json diff --git a/CHANGELOG.md b/CHANGELOG.md index a821bd7..e3215e8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -25,7 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/). - Onboarding flow: full empty-state guide for first-time users (3-step setup), dismissible welcome banner for returning users (localStorage-persisted) - **SQLite v2 API endpoints** — `GET /api/v2/overview` and `GET /api/v2/skills/:name` backed by materialized SQLite queries (`getOverviewPayload()`, `getSkillReportPayload()`, `getSkillsList()`) - **SQL query optimizations** — Replaced `NOT IN` subqueries with `LEFT JOIN + IS NULL`, moved JS-side dedup to SQL `GROUP BY`, added `LIMIT 200` to unbounded evidence queries -- **SPA serving from dashboard server** — Built SPA served at `/`, legacy HTML dashboard moved to `/legacy/` +- **SPA serving from dashboard server** — Built SPA served at `/` as the supported local dashboard experience - **Source-truth-driven pipeline** — Transcripts and rollouts are now the authoritative source; `sync` rebuilds repaired overlays from source data rather than relying solely on hook-time capture - **Telemetry contract package** — `@selftune/telemetry-contract` workspace package with canonical schema types, validators, versioning, metadata, and golden fixture tests - **Test split** — `make test-fast` / `make test-slow` and `bun run test:fast` 
/ `bun run test:slow` for faster development feedback loop diff --git a/README.md b/README.md index 56de090..ab25e5c 100644 --- a/README.md +++ b/README.md @@ -87,7 +87,7 @@ A continuous feedback loop that makes your skills learn and adapt. Automatically - **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage. - **Auto-activation system** — Hooks detect when selftune should run and suggest actions - **Enforcement guardrails** — Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run -- **React SPA dashboard** — `selftune dashboard` serves a React SPA with skill health grid, per-skill drilldown, evidence viewer, evolution timeline, dark/light theming, and SQLite-backed v2 API (legacy dashboard at `/legacy/`) +- **React SPA dashboard** — `selftune dashboard` serves a React SPA with skill health grid, per-skill drilldown, evidence viewer, evolution timeline, dark/light theming, and SQLite-backed v2 API - **Evolution memory** — Persists context, plans, and decisions across context resets - **4 specialized agents** — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide - **Sandbox test harness** — Comprehensive automated test coverage, including devcontainer-based LLM testing diff --git a/ROADMAP.md b/ROADMAP.md index 40abd7c..d4cf915 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -16,7 +16,7 @@ - Per-skill drilldown with evidence viewer, evolution timeline - SQLite v2 API endpoints (`/api/v2/overview`, `/api/v2/skills/:name`) - Dark/light theme toggle with selftune branding - - SPA served at `/`, legacy HTML dashboard at `/legacy/` + - SPA served at `/` as the supported local dashboard ## In Progress - Multi-agent sandbox expansion diff --git a/apps/local-dashboard/HANDOFF.md b/apps/local-dashboard/HANDOFF.md index 6312396..e5f6ae8 100644 --- a/apps/local-dashboard/HANDOFF.md +++ b/apps/local-dashboard/HANDOFF.md @@ 
-35,7 +35,7 @@ bun run dev # Or run manually: # Terminal 1: Start the dashboard server -selftune dashboard --port 7888 +selftune dashboard --port 7888 --no-open # Terminal 2: Start the SPA dev server (proxies /api to port 7888) cd apps/local-dashboard @@ -47,7 +47,7 @@ bunx vite ## What was rebased / changed - **SPA types**: Rewritten to match `queries.ts` payload shapes (`OverviewResponse`, `SkillReportResponse`, `SkillSummary`, `EvidenceEntry`) -- **API layer**: Now calls `/api/v2/overview` and `/api/v2/skills/:name` instead of `/api/data` + `/api/evaluations/:name` +- **API layer**: Calls `/api/v2/overview` and `/api/v2/skills/:name` - **SSE removed**: Replaced with 15s polling (SQLite reads are cheap, SSE was complex) - **Overview page**: Uses `SkillSummary[]` from `getSkillsList()` for skill cards (pre-aggregated pass rate, check count, sessions) - **Skill report page**: Single fetch to v2 endpoint instead of parallel overview + evaluations fetch. Shows evidence entries, evolution audit history per skill @@ -67,13 +67,12 @@ bunx vite ## What still depends on old dashboard code -- The old v1 endpoints (`/api/data`, `/api/events`, `/api/evaluations/:name`) still work and are used by the legacy `dashboard/index.html` -- Badge endpoints (`/badge/:name`) and report HTML endpoints (`/report/:name`) use the old `computeStatus` + JSONL reader path +- Badge endpoints (`/badge/:name`) and report HTML endpoints (`/report/:name`) still use the status/evidence JSONL path rather than SQLite-backed view models - Action endpoints (`/api/actions/*`) are unchanged ## What remains before this can become default -1. ~~**Serve built SPA from dashboard-server**~~: Done — `/` serves SPA, old dashboard at `/legacy/` +1. ~~**Serve built SPA from dashboard-server**~~: Done — `/` serves the SPA 2. ~~**Production build**~~: Done — `bun run build:dashboard` in root package.json 3. 
**Regression detection**: The SQLite layer doesn't compute regression detection yet — `deriveStatus()` currently only uses pass rate + check count. Add a `regression_detected` column to skill summaries when the monitoring snapshot computation moves to SQLite. 4. **Monitoring snapshot migration**: Move `computeMonitoringSnapshot()` logic into the SQLite materializer or a query helper (window sessions, false negative rate, baseline comparison) diff --git a/apps/local-dashboard/package.json b/apps/local-dashboard/package.json index 06931d8..a6520ec 100644 --- a/apps/local-dashboard/package.json +++ b/apps/local-dashboard/package.json @@ -4,7 +4,7 @@ "version": "0.1.0", "type": "module", "scripts": { - "dev": "concurrently \"cd ../.. && bun run cli/selftune/index.ts dashboard --serve --port 7888\" \"vite\"", + "dev": "concurrently \"cd ../.. && bun run cli/selftune/index.ts dashboard --port 7888 --no-open\" \"vite\"", "build": "vite build", "preview": "vite preview", "typecheck": "tsc --noEmit" diff --git a/cli/selftune/dashboard-server.ts b/cli/selftune/dashboard-server.ts index bcbc97c..fef2c4f 100644 --- a/cli/selftune/dashboard-server.ts +++ b/cli/selftune/dashboard-server.ts @@ -1,16 +1,16 @@ /** - * selftune dashboard server — Live Bun.serve HTTP server with SSE, data API, - * and action endpoints for the interactive dashboard. + * selftune dashboard server — Bun.serve HTTP server for the SPA dashboard, + * skill report HTML, badges, and action endpoints. 
* * Endpoints: - * GET / — Serve dashboard HTML shell + live mode flag - * GET /api/data — JSON endpoint returning current telemetry data - * GET /api/events — SSE stream sending data updates every 5 seconds + * GET / — Serve dashboard SPA shell + * GET /api/v2/overview — SQLite-backed overview payload + * GET /api/v2/skills/:name — SQLite-backed per-skill report * POST /api/actions/watch — Trigger `selftune watch` for a skill * POST /api/actions/evolve — Trigger `selftune evolve` for a skill * POST /api/actions/rollback — Trigger `selftune rollback` for a skill - * GET /api/v2/overview — SQLite-backed overview payload - * GET /api/v2/skills/:name — SQLite-backed per-skill report + * GET /badge/:name — Skill health badge + * GET /report/:name — Skill health report HTML */ import type { Database } from "bun:sqlite"; @@ -21,7 +21,7 @@ import { findSkillBadgeData } from "./badge/badge-data.js"; import type { BadgeFormat } from "./badge/badge-svg.js"; import { formatBadgeOutput, renderBadgeSvg } from "./badge/badge-svg.js"; import { EVOLUTION_AUDIT_LOG, QUERY_LOG, TELEMETRY_LOG } from "./constants.js"; -import { getLastDeployedProposal } from "./evolution/audit.js"; +import type { OverviewResponse, SkillReportResponse } from "./dashboard-contract.js"; import { readEvidenceTrail } from "./evolution/evidence.js"; import { openDb } from "./localdb/db.js"; import { materializeIncremental } from "./localdb/materialize.js"; @@ -31,37 +31,29 @@ import { getSkillReportPayload, getSkillsList, } from "./localdb/queries.js"; -import { readDecisions } from "./memory/writer.js"; -import { computeMonitoringSnapshot } from "./monitoring/watch.js"; import { doctor } from "./observability.js"; import type { StatusResult } from "./status.js"; -import { computeStatus, DEFAULT_WINDOW_SESSIONS } from "./status.js"; +import { computeStatus } from "./status.js"; import type { EvolutionAuditEntry, EvolutionEvidenceEntry, QueryLogRecord, SessionTelemetryRecord, - SkillUsageRecord, } from 
"./types.js"; import { readJsonl } from "./utils/jsonl.js"; -import { - filterActionableQueryRecords, - filterActionableSkillUsageRecords, -} from "./utils/query-filter.js"; import { readEffectiveSkillUsageRecords } from "./utils/skill-log.js"; export interface DashboardServerOptions { port?: number; host?: string; openBrowser?: boolean; - dataLoader?: () => DashboardData; statusLoader?: () => StatusResult; evidenceLoader?: () => EvolutionEvidenceEntry[]; + overviewLoader?: () => OverviewResponse; + skillReportLoader?: (skillName: string) => SkillReportResponse | null; actionRunner?: typeof runAction; } -const LIVE_CACHE_TTL_MS = 30_000; - /** Read selftune version from package.json once at startup */ let selftuneVersion = "unknown"; try { @@ -71,60 +63,6 @@ try { // fallback already set } -interface DashboardData { - telemetry: SessionTelemetryRecord[]; - skills: SkillUsageRecord[]; - queries: QueryLogRecord[]; - evolution: EvolutionAuditEntry[]; - evidence: EvolutionEvidenceEntry[]; - decisions: import("./types.js").DecisionRecord[]; - computed: { - snapshots: Record>; - unmatched: Array<{ timestamp: string; session_id: string; query: string }>; - pendingProposals: EvolutionAuditEntry[]; - }; -} - -interface LiveDashboardPayload { - telemetry: Array< - Pick< - SessionTelemetryRecord, - "timestamp" | "session_id" | "skills_triggered" | "errors_encountered" | "total_tool_calls" - > - >; - skills: Array< - Pick< - SkillUsageRecord, - "timestamp" | "session_id" | "skill_name" | "skill_path" | "query" | "triggered" | "source" - > - >; - queries: Array>; - evolution: Array>; - evidence: Array>; - decisions: DashboardData["decisions"]; - computed: DashboardData["computed"] & { unmatched_count: number }; - counts: { - telemetry: number; - skills: number; - queries: number; - evolution: number; - evidence: number; - decisions: number; - }; -} - -function findViewerHTML(): string { - const candidates = [ - join(dirname(import.meta.dir), "..", "dashboard", "index.html"), - 
join(dirname(import.meta.dir), "dashboard", "index.html"), - resolve("dashboard", "index.html"), - ]; - for (const c of candidates) { - if (existsSync(c)) return c; - } - throw new Error("Could not find dashboard/index.html. Ensure it exists in the selftune repo."); -} - function findSpaDir(): string | null { const candidates = [ join(dirname(import.meta.dir), "..", "apps", "local-dashboard", "dist"), @@ -150,73 +88,6 @@ const MIME_TYPES: Record<string, string> = { ".ico": "image/x-icon", }; -function collectData(): DashboardData { - const telemetry = readJsonl<SessionTelemetryRecord>(TELEMETRY_LOG); - const skills = filterActionableSkillUsageRecords(readEffectiveSkillUsageRecords()); - const queries = readJsonl<QueryLogRecord>(QUERY_LOG); - const actionableQueries = filterActionableQueryRecords(queries); - const evolution = readJsonl<EvolutionAuditEntry>(EVOLUTION_AUDIT_LOG); - const evidence = readEvidenceTrail(); - const decisions = readDecisions(); - - // Compute per-skill monitoring snapshots - const skillNames = [...new Set(skills.map((r) => r.skill_name))]; - const snapshots: Record<string, ReturnType<typeof computeMonitoringSnapshot>> = {}; - for (const name of skillNames) { - const lastDeployed = getLastDeployedProposal(name); - const baselinePassRate = lastDeployed?.eval_snapshot?.pass_rate ??
0.5; - snapshots[name] = computeMonitoringSnapshot( - name, - telemetry, - skills, - actionableQueries, - DEFAULT_WINDOW_SESSIONS, - baselinePassRate, - ); - } - - // Compute unmatched queries - const triggeredQueries = new Set( - skills - .filter((r) => r.triggered && typeof r.query === "string") - .map((r) => r.query.toLowerCase().trim()), - ); - const unmatched = actionableQueries - .filter((q) => !triggeredQueries.has(q.query.toLowerCase().trim())) - .map((q) => ({ - timestamp: q.timestamp, - session_id: q.session_id, - query: q.query, - })); - - // Compute pending proposals (reuse already-loaded evolution entries) - const proposalStatus: Record = {}; - for (const e of evolution) { - if (!proposalStatus[e.proposal_id]) proposalStatus[e.proposal_id] = []; - proposalStatus[e.proposal_id].push(e.action); - } - const terminalActions = new Set(["deployed", "rejected", "rolled_back"]); - const seenProposals = new Set(); - const pendingProposals = evolution.filter((e) => { - if (e.action !== "created" && e.action !== "validated") return false; - if (seenProposals.has(e.proposal_id)) return false; - const actions = proposalStatus[e.proposal_id] || []; - const isPending = !actions.some((a: string) => terminalActions.has(a)); - if (isPending) seenProposals.add(e.proposal_id); - return isPending; - }); - - return { - telemetry, - skills, - queries: actionableQueries, - evolution, - evidence, - decisions, - computed: { snapshots, unmatched, pendingProposals }, - }; -} - function computeStatusFromLogs(): StatusResult { const telemetry = readJsonl(TELEMETRY_LOG); const skillRecords = readEffectiveSkillUsageRecords(); @@ -226,56 +97,6 @@ function computeStatusFromLogs(): StatusResult { return computeStatus(telemetry, skillRecords, queryRecords, auditEntries, doctorResult); } -function buildLivePayload(data: DashboardData): LiveDashboardPayload { - return { - telemetry: data.telemetry.map((record) => ({ - timestamp: record.timestamp, - session_id: record.session_id, - 
skills_triggered: record.skills_triggered, - errors_encountered: record.errors_encountered, - total_tool_calls: record.total_tool_calls, - })), - skills: data.skills.map((record) => ({ - timestamp: record.timestamp, - session_id: record.session_id, - skill_name: record.skill_name, - skill_path: record.skill_path, - query: record.query, - triggered: record.triggered, - source: record.source, - })), - queries: [], - evolution: data.evolution.map((record) => ({ - timestamp: record.timestamp, - proposal_id: record.proposal_id, - action: record.action, - details: record.details, - })), - evidence: [], - decisions: data.decisions, - computed: { - ...data.computed, - unmatched: data.computed.unmatched.slice(0, 500), - unmatched_count: data.computed.unmatched.length, - }, - counts: { - telemetry: data.telemetry.length, - skills: data.skills.length, - queries: data.queries.length, - evolution: data.evolution.length, - evidence: data.evidence.length, - decisions: data.decisions.length, - }, - }; -} - -function buildLiveHTML(): string { - const template = readFileSync(findViewerHTML(), "utf-8"); - const liveFlag = ""; - - return template.replace("", `${liveFlag}\n`); -} - interface MergedEvidenceEntry { proposal_id: string; target: string; @@ -584,9 +405,10 @@ export async function startDashboardServer( const port = options?.port ?? 3141; const hostname = options?.host ?? "localhost"; const openBrowser = options?.openBrowser ?? true; - const getDashboardData = options?.dataLoader ?? collectData; const getStatusResult = options?.statusLoader ?? computeStatusFromLogs; const getEvidenceEntries = options?.evidenceLoader ?? readEvidenceTrail; + const getOverviewResponse = options?.overviewLoader; + const getSkillReportResponse = options?.skillReportLoader; const executeAction = options?.actionRunner ?? 
runAction; // -- SPA serving ------------------------------------------------------------- @@ -594,21 +416,26 @@ export async function startDashboardServer( if (spaDir) { console.log(`SPA found at ${spaDir}, serving as default dashboard`); } else { - console.log("SPA build not found, serving legacy dashboard at /"); + console.warn( + "SPA build not found. Run `bun run build:dashboard` before using `selftune dashboard`.", + ); } // -- SQLite v2 data layer --------------------------------------------------- let db: Database | null = null; let lastV2MaterializedAt = 0; let lastV2RefreshAttemptAt = 0; - try { - db = openDb(); - materializeIncremental(db); - lastV2MaterializedAt = Date.now(); - } catch (error: unknown) { - const message = error instanceof Error ? error.message : String(error); - console.error(`V2 dashboard data unavailable: ${message}`); - // Continue serving; refreshV2Data will retry on demand. + const needsDb = !getOverviewResponse || !getSkillReportResponse; + if (needsDb) { + try { + db = openDb(); + materializeIncremental(db); + lastV2MaterializedAt = Date.now(); + } catch (error: unknown) { + const message = error instanceof Error ? error.message : String(error); + console.error(`V2 dashboard data unavailable: ${message}`); + // Continue serving; refreshV2Data will retry on demand. 
+ } } const V2_MATERIALIZE_TTL_MS = 15_000; @@ -628,38 +455,15 @@ export async function startDashboardServer( } } - const sseClients = new Set<ReadableStreamDefaultController>(); - let cachedDashboardData: DashboardData | null = null; - let cachedLivePayload: LiveDashboardPayload | null = null; let cachedStatusResult: StatusResult | null = null; - let lastDataCacheRefreshAt = 0; let lastStatusCacheRefreshAt = 0; - let dataRefreshPromise: Promise<void> | null = null; let statusRefreshPromise: Promise<void> | null = null; - async function refreshLiveCache(force = false): Promise<void> { - const cacheIsFresh = - cachedDashboardData !== null && Date.now() - lastDataCacheRefreshAt < LIVE_CACHE_TTL_MS; - if (!force && cacheIsFresh) return; - if (dataRefreshPromise) return dataRefreshPromise; - - dataRefreshPromise = (async () => { - const data = getDashboardData(); - cachedDashboardData = data; - cachedLivePayload = buildLivePayload(data); - lastDataCacheRefreshAt = Date.now(); - })(); - - try { - await dataRefreshPromise; - } finally { - dataRefreshPromise = null; - } - } + const STATUS_CACHE_TTL_MS = 30_000; async function refreshStatusCache(force = false): Promise<void> { const cacheIsFresh = - cachedStatusResult !== null && Date.now() - lastStatusCacheRefreshAt < LIVE_CACHE_TTL_MS; + cachedStatusResult !== null && Date.now() - lastStatusCacheRefreshAt < STATUS_CACHE_TTL_MS; if (!force && cacheIsFresh) return; if (statusRefreshPromise) return statusRefreshPromise; @@ -675,15 +479,6 @@ export async function startDashboardServer( } } - async function getCachedLivePayload(): Promise<LiveDashboardPayload> { - if (!cachedLivePayload) { - await refreshLiveCache(true); - } else { - void refreshLiveCache(false); - } - return cachedLivePayload as LiveDashboardPayload; - } - async function getCachedStatusResult(): Promise<StatusResult> { if (!cachedStatusResult) { await refreshStatusCache(true); } else { void refreshStatusCache(false); } @@ -726,7 +521,7 @@ export async function startDashboardServer( return new Response("Not Found", { status: 404, headers: corsHeaders() }); } - // ---- GET / ---- Serve SPA (or
legacy fallback) + // ---- GET / ---- Serve SPA shell if (url.pathname === "/" && req.method === "GET") { if (spaDir) { const html = await Bun.file(join(spaDir, "index.html")).text(); @@ -734,72 +529,9 @@ export async function startDashboardServer( headers: { "Content-Type": "text/html; charset=utf-8", ...corsHeaders() }, }); } - const html = buildLiveHTML(); - return new Response(html, { - headers: { "Content-Type": "text/html; charset=utf-8", ...corsHeaders() }, - }); - } - - // ---- GET /legacy/ ---- Serve old dashboard HTML - if (url.pathname === "/legacy/" && req.method === "GET") { - const html = buildLiveHTML(); - return new Response(html, { - headers: { "Content-Type": "text/html; charset=utf-8", ...corsHeaders() }, - }); - } - - // ---- GET /api/data ---- JSON data endpoint - if (url.pathname === "/api/data" && req.method === "GET") { - const payload = await getCachedLivePayload(); - return Response.json(payload, { headers: corsHeaders() }); - } - - // ---- GET /api/events ---- SSE stream - if (url.pathname === "/api/events" && req.method === "GET") { - const stream = new ReadableStream({ - async start(controller) { - sseClients.add(controller); - - // Send initial data immediately - const initialPayload = await getCachedLivePayload(); - const payload = `event: data\ndata: ${JSON.stringify(initialPayload)}\n\n`; - controller.enqueue(new TextEncoder().encode(payload)); - - // Set up periodic updates every 5 seconds - const interval = setInterval(async () => { - try { - const freshPayload = await getCachedLivePayload(); - const msg = `event: data\ndata: ${JSON.stringify(freshPayload)}\n\n`; - controller.enqueue(new TextEncoder().encode(msg)); - } catch { - clearInterval(interval); - sseClients.delete(controller); - } - }, 5000); - - // Clean up when client disconnects - req.signal.addEventListener("abort", () => { - clearInterval(interval); - sseClients.delete(controller); - try { - controller.close(); - } catch { - // already closed - } - }); - }, - 
cancel() { - // Stream cancelled by client - }, - }); - - return new Response(stream, { - headers: { - "Content-Type": "text/event-stream", - "Cache-Control": "no-cache", - Connection: "keep-alive", - ...corsHeaders(), - }, + return new Response("Dashboard build not found. Run `bun run build:dashboard` first.", { + status: 503, + headers: { "Content-Type": "text/plain; charset=utf-8", ...corsHeaders() }, }); } @@ -946,25 +678,11 @@ export async function startDashboardServer( }); } - // ---- GET /api/evaluations/:skillName ---- - if (url.pathname.startsWith("/api/evaluations/") && req.method === "GET") { - const skillName = decodeURIComponent(url.pathname.slice("/api/evaluations/".length)); - const skills = readEffectiveSkillUsageRecords(); - const filtered = skills - .filter((r) => r.skill_name === skillName) - .map((r) => ({ - timestamp: r.timestamp, - session_id: r.session_id, - query: r.query, - skill_name: r.skill_name, - triggered: r.triggered, - source: r.source ?? null, - })); - return Response.json(filtered, { headers: corsHeaders() }); - } - // ---- GET /api/v2/overview ---- SQLite-backed overview if (url.pathname === "/api/v2/overview" && req.method === "GET") { + if (getOverviewResponse) { + return Response.json(getOverviewResponse(), { headers: corsHeaders() }); + } if (!db) { return Response.json( { error: "V2 data unavailable" }, @@ -982,13 +700,23 @@ export async function startDashboardServer( // ---- GET /api/v2/skills/:name ---- SQLite-backed skill report if (url.pathname.startsWith("/api/v2/skills/") && req.method === "GET") { + const skillName = decodeURIComponent(url.pathname.slice("/api/v2/skills/".length)); + if (getSkillReportResponse) { + const report = getSkillReportResponse(skillName); + if (!report) { + return Response.json( + { error: "Skill not found" }, + { status: 404, headers: corsHeaders() }, + ); + } + return Response.json(report, { headers: corsHeaders() }); + } if (!db) { return Response.json( { error: "V2 data unavailable" }, { 
status: 503, headers: corsHeaders() }, ); } - const skillName = decodeURIComponent(url.pathname.slice("/api/v2/skills/".length)); refreshV2Data(); const report = getSkillReportPayload(db, skillName); @@ -1187,14 +915,6 @@ export async function startDashboardServer( // Graceful shutdown const shutdownHandler = () => { - for (const client of sseClients) { - try { - client.close(); - } catch { - // already closed - } - } - sseClients.clear(); db?.close(); server.stop(); }; diff --git a/cli/selftune/dashboard.ts b/cli/selftune/dashboard.ts index aedc5c1..80ac299 100644 --- a/cli/selftune/dashboard.ts +++ b/cli/selftune/dashboard.ts @@ -1,136 +1,12 @@ /** - * selftune dashboard — Exports JSONL data into a standalone HTML viewer. + * selftune dashboard — Start the local React SPA dashboard server. * * Usage: - * selftune dashboard — Open dashboard in default browser - * selftune dashboard --export — Export data-embedded HTML to stdout - * selftune dashboard --out FILE — Write data-embedded HTML to FILE - * selftune dashboard --serve — Start live dashboard server (default port 3141) - * selftune dashboard --serve --port 8080 — Start on custom port + * selftune dashboard — Start server on port 3141 and open browser + * selftune dashboard --port 8080 — Start on custom port + * selftune dashboard --serve — Deprecated alias for the default behavior */ -import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs"; -import { homedir } from "node:os"; -import { dirname, join, resolve } from "node:path"; -import { EVOLUTION_AUDIT_LOG, QUERY_LOG, SKILL_LOG, TELEMETRY_LOG } from "./constants.js"; -import { getLastDeployedProposal, readAuditTrail } from "./evolution/audit.js"; -import { readEvidenceTrail } from "./evolution/evidence.js"; -import { computeMonitoringSnapshot } from "./monitoring/watch.js"; -import { DEFAULT_WINDOW_SESSIONS } from "./status.js"; -import type { EvolutionAuditEntry, QueryLogRecord, SessionTelemetryRecord } from "./types.js"; -import { 
escapeJsonForHtmlScript } from "./utils/html.js"; -import { readJsonl } from "./utils/jsonl.js"; -import { - filterActionableQueryRecords, - filterActionableSkillUsageRecords, -} from "./utils/query-filter.js"; -import { readEffectiveSkillUsageRecords } from "./utils/skill-log.js"; - -function findViewerHTML(): string { - // Try relative to this module first (works for both dev and installed) - const candidates = [ - join(dirname(import.meta.dir), "..", "dashboard", "index.html"), - join(dirname(import.meta.dir), "dashboard", "index.html"), - resolve("dashboard", "index.html"), - ]; - for (const c of candidates) { - if (existsSync(c)) return c; - } - throw new Error("Could not find dashboard/index.html. Ensure it exists in the selftune repo."); -} - -function buildEmbeddedHTML(): string { - const template = readFileSync(findViewerHTML(), "utf-8"); - - const telemetry = readJsonl(TELEMETRY_LOG); - const skills = filterActionableSkillUsageRecords(readEffectiveSkillUsageRecords()); - const queries = readJsonl(QUERY_LOG); - const actionableQueries = filterActionableQueryRecords(queries); - const evolution = readJsonl(EVOLUTION_AUDIT_LOG); - const evidence = readEvidenceTrail(); - - const totalRecords = - telemetry.length + skills.length + actionableQueries.length + evolution.length; - - if (totalRecords === 0) { - console.error("No log data found. Run some sessions first."); - console.error(` Checked: ${TELEMETRY_LOG}`); - console.error(` ${SKILL_LOG}`); - console.error(` ${QUERY_LOG}`); - console.error(` ${EVOLUTION_AUDIT_LOG}`); - process.exit(1); - } - - // Compute per-skill monitoring snapshots - const skillNames = [...new Set(skills.map((r) => r.skill_name))]; - const snapshots: Record> = {}; - for (const name of skillNames) { - const lastDeployed = getLastDeployedProposal(name); - const baselinePassRate = lastDeployed?.eval_snapshot?.pass_rate ?? 
0.5; - snapshots[name] = computeMonitoringSnapshot( - name, - telemetry, - skills, - actionableQueries, - DEFAULT_WINDOW_SESSIONS, - baselinePassRate, - ); - } - - // Compute unmatched queries - const triggeredQueries = new Set( - skills - .filter((r) => r.triggered && typeof r.query === "string") - .map((r) => r.query.toLowerCase().trim()), - ); - const unmatched = actionableQueries - .filter((q) => !triggeredQueries.has(q.query.toLowerCase().trim())) - .map((q) => ({ - timestamp: q.timestamp, - session_id: q.session_id, - query: q.query, - })); - - // Compute pending proposals - const auditTrail = readAuditTrail(); - const proposalStatus: Record = {}; - for (const e of auditTrail) { - if (!proposalStatus[e.proposal_id]) proposalStatus[e.proposal_id] = []; - proposalStatus[e.proposal_id].push(e.action); - } - // Deduplicate by proposal_id: one entry per pending proposal - const terminalActions = new Set(["deployed", "rejected", "rolled_back"]); - const seenProposals = new Set(); - const pendingProposals = auditTrail.filter((e) => { - if (e.action !== "created" && e.action !== "validated") return false; - if (seenProposals.has(e.proposal_id)) return false; - const actions = proposalStatus[e.proposal_id] || []; - const isPending = !actions.some((a: string) => terminalActions.has(a)); - if (isPending) seenProposals.add(e.proposal_id); - return isPending; - }); - - const data = { - telemetry, - skills, - queries: actionableQueries, - evolution, - evidence, - computed: { - snapshots, - unmatched, - pendingProposals, - }, - }; - - // Inject embedded data right before - // Escape the full JSON payload for safe embedding inside a script tag. 
- const safeJson = escapeJsonForHtmlScript(data); - const encodedJson = Buffer.from(safeJson, "utf8").toString("base64"); - const dataScript = ``; - return template.replace("", `${dataScript}\n`); -} - export async function cliMain(): Promise { const args = process.argv.slice(2); @@ -138,84 +14,50 @@ export async function cliMain(): Promise { console.log(`selftune dashboard — Visual data dashboard Usage: - selftune dashboard Open dashboard in default browser - selftune dashboard --export Export data-embedded HTML to stdout - selftune dashboard --out FILE Write data-embedded HTML to FILE - selftune dashboard --serve Start live dashboard server (port 3141) - selftune dashboard --serve --port 8080 Start on custom port`); + selftune dashboard Start dashboard server (port 3141) + selftune dashboard --port 8080 Start on custom port + selftune dashboard --serve Deprecated alias for default behavior + selftune dashboard --no-open Start server without opening browser`); process.exit(0); } - if (args.includes("--serve")) { - const portIdx = args.indexOf("--port"); - let port: number | undefined; - if (portIdx !== -1) { - const parsed = Number.parseInt(args[portIdx + 1], 10); - if (!Number.isInteger(parsed) || parsed < 1 || parsed > 65535) { - console.error( - `Invalid port "${args[portIdx + 1]}": must be an integer between 1 and 65535.`, - ); - process.exit(1); - } - port = parsed; - } - const { startDashboardServer } = await import("./dashboard-server.js"); - const { stop } = await startDashboardServer({ port, openBrowser: true }); - await new Promise((resolve) => { - let closed = false; - const keepAlive = setInterval(() => {}, 1 << 30); - const shutdown = () => { - if (closed) return; - closed = true; - clearInterval(keepAlive); - stop(); - resolve(); - }; - process.on("SIGINT", shutdown); - process.on("SIGTERM", shutdown); - }); - return; - } - - if (args.includes("--export")) { - process.stdout.write(buildEmbeddedHTML()); - return; + if (args.includes("--export") || 
args.includes("--out")) { + console.error("Legacy dashboard export was removed."); + console.error( + "Use `selftune dashboard` to run the SPA locally, then share a route or screenshot instead.", + ); + process.exit(1); } - const outIdx = args.indexOf("--out"); - if (outIdx !== -1) { - const outPath = args[outIdx + 1]; - if (!outPath) { - console.error("--out requires a file path argument"); + const portIdx = args.indexOf("--port"); + let port: number | undefined; + if (portIdx !== -1) { + const parsed = Number.parseInt(args[portIdx + 1], 10); + if (!Number.isInteger(parsed) || parsed < 1 || parsed > 65535) { + console.error(`Invalid port "${args[portIdx + 1]}": must be an integer between 1 and 65535.`); process.exit(1); } - const html = buildEmbeddedHTML(); - writeFileSync(outPath, html, "utf-8"); - console.log(`Dashboard written to ${outPath}`); - return; - } - - // Default: write to temp file and open in browser - const tmpDir = join(homedir(), ".selftune"); - if (!existsSync(tmpDir)) { - mkdirSync(tmpDir, { recursive: true }); + port = parsed; } - const tmpPath = join(tmpDir, "dashboard.html"); - const html = buildEmbeddedHTML(); - writeFileSync(tmpPath, html, "utf-8"); - console.log(`Dashboard saved to ${tmpPath}`); - console.log("Opening in browser..."); - - try { - const platform = process.platform; - const cmd = platform === "darwin" ? "open" : platform === "linux" ? 
"xdg-open" : null; - const proc = Bun.spawn([cmd, tmpPath], { stdio: ["ignore", "ignore", "ignore"] }); - await proc.exited; - if (proc.exitCode !== 0) throw new Error(`Failed to launch ${cmd}`); - } catch { - console.log(`Open manually: file://${tmpPath}`); - } - process.exit(0); + if (args.includes("--serve")) { + console.warn("`selftune dashboard --serve` is deprecated; use `selftune dashboard` instead."); + } + + const openBrowser = !args.includes("--no-open"); + const { startDashboardServer } = await import("./dashboard-server.js"); + const { stop } = await startDashboardServer({ port, openBrowser }); + await new Promise<void>((resolve) => { + let closed = false; + const keepAlive = setInterval(() => {}, 1 << 30); + const shutdown = () => { + if (closed) return; + closed = true; + clearInterval(keepAlive); + stop(); + resolve(); + }; + process.on("SIGINT", shutdown); + process.on("SIGTERM", shutdown); + }); } diff --git a/dashboard/index.html b/dashboard/index.html deleted file mode 100644 index 15af075..0000000 --- a/dashboard/index.html +++ /dev/null @@ -1,2113 +0,0 @@
-… [2,113 deleted lines of legacy dashboard HTML omitted: the static viewer shell with selftune branding header (v0.1.4), a drag-and-drop JSONL loader fallback, an Overview panel, and Skill Details sections covering the pass-rate-over-time chart, missed-queries table, evolution history, description versions, validation evidence, sessions, evaluation feed, and invocation breakdown] …
diff --git a/docs/design-docs/sandbox-claude-code.md b/docs/design-docs/sandbox-claude-code.md index f9f7145..21afaa8 100644 --- a/docs/design-docs/sandbox-claude-code.md +++ b/docs/design-docs/sandbox-claude-code.md @@ -23,7 +23,7 @@ Claude Code-specific sandbox configuration, tests, and Docker container. See [sa | `evals --skill frontend-design` | 0 positives (correctly identifies undertriggering) | | `status` | Colored table with per-skill health | | `last` | Latest session insight with unmatched queries | -| `dashboard --export` | Standalone HTML with embedded data | +| `dashboard --port <port> --no-open` | Starts the SPA dashboard server and responds on HTTP | | `contribute --preview` | Sanitized contribution bundle | | Hook: prompt-log | Record appended to all_queries_log.jsonl | | Hook: skill-eval | Record appended to skill_usage_log.jsonl | diff --git a/docs/design-docs/sandbox-test-harness.md b/docs/design-docs/sandbox-test-harness.md index 7aba544..f021fb7 100644 --- a/docs/design-docs/sandbox-test-harness.md +++ b/docs/design-docs/sandbox-test-harness.md @@ -30,7 +30,7 @@ selftune had 499 unit tests covering individual functions, but zero integration | `evals --skill frontend-design` | 0 positives (correctly identifies undertriggering) | | `status` | Colored table with per-skill health | | `last` | Latest session insight with unmatched queries | -| `dashboard --export` | Standalone HTML with embedded data | +| `dashboard --port <port> --no-open` | Starts the SPA dashboard server and responds on HTTP | | `contribute --preview` | Sanitized contribution bundle | | Hook: prompt-log | Record appended to all_queries_log.jsonl | | Hook: skill-eval | Record appended to skill_usage_log.jsonl | diff --git a/docs/escalation-policy.md b/docs/escalation-policy.md index ce20be0..30930cc 100644 --- a/docs/escalation-policy.md +++ b/docs/escalation-policy.md @@ -52,8 +52,8 @@ Clear criteria for when agents proceed autonomously vs. when to involve a human.
- Modifying the SKILL.md routing table (affects which workflow agents load) - Changing `computeStatus` logic in `status.ts` (affects skill health reporting) - Changing `computeLastInsight` logic in `last.ts` (affects session insight accuracy) -- Modifying dashboard data schema in `dashboard.ts` (breaks `dashboard/index.html` rendering) -- Changing the `dashboard/index.html` embedded data contract (must match `dashboard.ts` output) +- Modifying the dashboard response contract in `dashboard-contract.ts` +- Changing SQLite-backed dashboard query shapes in `cli/selftune/localdb/queries.ts` - Modifying activation rules configuration - Changing agent assignment logic - Updating dashboard server endpoints or action handlers diff --git a/docs/exec-plans/tech-debt-tracker.md b/docs/exec-plans/tech-debt-tracker.md index 4badcf7..790fa3c 100644 --- a/docs/exec-plans/tech-debt-tracker.md +++ b/docs/exec-plans/tech-debt-tracker.md @@ -17,7 +17,7 @@ Track known technical debt with priority and ownership. 
| TD-009 | Add evolution/monitoring to lint-architecture.ts import rules | Infra | Medium | — | Closed | 2026-02-28 | 2026-02-28 | | TD-010 | `cli/selftune/utils/logging.ts` has no test file — violates golden-principles testing rule | Testing | Medium | — | Open | 2026-03-01 | 2026-03-01 | | TD-011 | `cli/selftune/utils/seeded-random.ts` has no test file — violates golden-principles testing rule | Testing | Medium | — | Open | 2026-03-01 | 2026-03-01 | -| TD-012 | Dashboard server test (`tests/dashboard/dashboard-server.test.ts`) is flaky — `GET /api/events` sends initial data event fails intermittently with `null` response | Testing | Medium | — | Open | 2026-03-03 | 2026-03-03 | +| TD-012 | Dashboard server test (`tests/dashboard/dashboard-server.test.ts`) was flaky around legacy SSE `/api/events` behavior | Testing | Medium | — | Closed | 2026-03-03 | 2026-03-14 | ## Priority Definitions diff --git a/package.json b/package.json index 949820c..2a47a74 100644 --- a/package.json +++ b/package.json @@ -41,7 +41,6 @@ "bin/", "cli/selftune/", "apps/local-dashboard/dist/", - "dashboard/", "packages/telemetry-contract/", "templates/", ".claude/agents/", @@ -51,7 +50,7 @@ ], "scripts": { "dev": "sh -c 'if lsof -iTCP:7888 -sTCP:LISTEN >/dev/null 2>&1; then echo \"Using existing dashboard server on 7888\"; cd apps/local-dashboard && bun install && bunx vite --strictPort; else cd apps/local-dashboard && bun install && bun run dev; fi'", - "dev:dashboard": "bun run dev", + "dev:dashboard": "bun run cli/selftune/index.ts dashboard --port 7888 --no-open", "lint": "bunx @biomejs/biome check .", "lint:fix": "bunx @biomejs/biome check --write .", "lint:arch": "bun run lint-architecture.ts", diff --git a/skill/SKILL.md b/skill/SKILL.md index ac76605..18e6ae9 100644 --- a/skill/SKILL.md +++ b/skill/SKILL.md @@ -32,7 +32,7 @@ selftune [options] ``` Most commands output deterministic JSON. Parse JSON output for machine-readable commands. 
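Since the output is deterministic JSON, a wrapper script can parse it directly. A minimal TypeScript sketch of that pattern (the payload literal below is illustrative, not real selftune output):

```typescript
// Sketch: consuming a command's deterministic JSON output.
// The payload below is illustrative, not actual selftune output.
const stdout = '{"skills":[{"skill_name":"frontend-design","pass_rate":1}]}';

const parsed = JSON.parse(stdout) as {
  skills: Array<{ skill_name: string; pass_rate: number }>;
};

console.log(parsed.skills[0].skill_name); // → frontend-design
```

In practice `stdout` would come from spawning the CLI and capturing its standard output.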
-`selftune dashboard` is an exception: it generates an HTML artifact and may print
+`selftune dashboard` is an exception: it starts a local SPA server and may print
 informational progress lines.
 
 ## Quick Reference
 
@@ -46,7 +46,7 @@ selftune watch --skill <name> --skill-path <path> [--auto-rollback]
 selftune status
 selftune last
 selftune doctor
-selftune dashboard [--export] [--out FILE] [--serve]
+selftune dashboard [--port <port>] [--no-open]
 selftune ingest-codex
 selftune ingest-opencode
 selftune ingest-openclaw [--agents-dir PATH] [--since DATE] [--dry-run] [--force] [--verbose]
@@ -57,7 +57,7 @@ selftune contribute [--skill NAME] [--preview] [--sanitize LEVEL] [--submit]
 selftune cron setup [--dry-run] [--tz <timezone>]
 selftune cron list
 selftune cron remove [--dry-run]
-selftune dashboard --serve [--port <port>]
+selftune dashboard [--port <port>] [--no-open]
 selftune evolve-body --skill <name> --skill-path <path> --target <target> [--dry-run]
 selftune baseline --skill <name> --skill-path <path> [--eval-set <path>] [--agent <agent>]
 selftune badge --skill <name> [--format svg|markdown|url] [--output <file>]
diff --git a/skill/Workflows/Dashboard.md b/skill/Workflows/Dashboard.md
index 2b86070..8ecd252 100644
--- a/skill/Workflows/Dashboard.md
+++ b/skill/Workflows/Dashboard.md
@@ -1,9 +1,7 @@
 # selftune Dashboard Workflow
 
-Visual dashboard for selftune telemetry, skill performance, evolution
-audit, and monitoring data. The default dashboard is a React SPA backed
-by SQLite materialized queries (v2 API). Also supports static HTML
-export, file output, and a legacy HTML dashboard.
+Open and operate the local selftune dashboard. The supported dashboard is the
+React SPA backed by SQLite materialized queries.
 
 ## Default Command
 
 ```bash
 selftune dashboard
 ```
 
-Starts the dashboard server and opens the React SPA in the browser.
-The SPA polls SQLite-backed v2 API endpoints every 15 seconds.
+Starts the dashboard server on `localhost:3141` and opens the SPA in your browser.
 ## Options
 
 | Flag | Description | Default |
 |------|-------------|---------|
-| `--export` | Export data-embedded HTML to stdout (legacy) | Off |
-| `--out FILE` | Write data-embedded HTML to FILE (legacy) | None |
-| `--serve` | Start live dashboard server (implied by default) | Off |
-| `--port <port>` | Custom port for the server | 3141 |
-
-## Modes
-
-### Live Server (Default)
-
-Starts a Bun HTTP server. The React SPA serves at `/` and polls the
-v2 API endpoints backed by SQLite. Data auto-refreshes every 15 seconds.
-
-```bash
-selftune dashboard
-selftune dashboard --port 8080
-```
-
-### Legacy Static
-
-Builds an HTML file with all telemetry data embedded as JSON, saves it
-to `~/.selftune/dashboard.html`, and opens it in the default browser.
-The legacy dashboard is still accessible at `/legacy/` on the live server.
-
-```bash
-selftune dashboard --export > dashboard.html
-selftune dashboard --out /tmp/report.html
-```
+| `--port <port>` | Custom port for the dashboard server | `3141` |
+| `--no-open` | Start the server without opening a browser window | Off |
+| `--serve` | Deprecated alias for the default behavior | Off |
 
 ## Server Architecture
 
@@ -54,176 +27,55 @@ selftune dashboard --out /tmp/report.html
 JSONL logs → materializeIncremental() → SQLite (~/.selftune/selftune.db)
   → getOverviewPayload() / getSkillReportPayload()
   → /api/v2/* endpoints
-  → React SPA (polling every 15s)
+  → React SPA
 ```
 
-### Default Port
-
-The server binds to `localhost:3141` by default. Use `--port` to override.
- ### Endpoints | Method | Path | Description | |--------|------|-------------| -| `GET` | `/` | Serve React SPA (production build) | -| `GET` | `/legacy/` | Serve legacy HTML dashboard | -| `GET` | `/api/v2/overview` | Combined overview payload + skill list (SQLite) | -| `GET` | `/api/v2/skills/:name` | Per-skill report payload (SQLite) | -| `GET` | `/api/data` | Legacy JSON endpoint (v1, JSONL-based) | -| `GET` | `/api/events` | Legacy SSE stream (v1) | +| `GET` | `/` | Serve React SPA | +| `GET` | `/api/v2/overview` | Overview payload + skill list | +| `GET` | `/api/v2/skills/:name` | Per-skill report payload | | `GET` | `/badge/:name` | Skill health badge SVG | -| `GET` | `/report/:name` | Per-skill HTML report | +| `GET` | `/report/:name` | Server-rendered per-skill HTML report | | `POST` | `/api/actions/watch` | Trigger `selftune watch` for a skill | | `POST` | `/api/actions/evolve` | Trigger `selftune evolve` for a skill | | `POST` | `/api/actions/rollback` | Trigger `selftune rollback` for a skill | -### Action Endpoints - -Action buttons in the dashboard trigger selftune commands via POST -requests. Each endpoint spawns a `bun run` subprocess. - -**Watch and Evolve** request body: - -```json -{ - "skill": "skill-name", - "skillPath": "/path/to/SKILL.md" -} -``` - -**Rollback** request body: - -```json -{ - "skill": "skill-name", - "skillPath": "/path/to/SKILL.md", - "proposalId": "proposal-uuid" -} -``` - -All action endpoints return: - -```json -{ - "success": true, - "output": "command stdout", - "error": null -} -``` - -On failure, `success` is `false` and `error` contains the error message. - -### Browser and Shutdown - -The live server auto-opens the dashboard URL in the default browser on -macOS (`open`) and Linux (`xdg-open`). - -Graceful shutdown on `SIGINT` (Ctrl+C) and `SIGTERM`: closes all SSE -client connections and stops the server. 
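A client consuming `/api/v2/overview` can sanity-check the payload before rendering. A minimal structural guard, sketched from the fixture shapes used in the tests in this change (the authoritative shape lives in `dashboard-contract.ts`):

```typescript
// Sketch of a client-side structural check for the /api/v2/overview
// payload. Field names follow the test fixtures in this change; treat
// this as an illustration, not the canonical contract.
interface OverviewLike {
  overview: { telemetry: unknown[]; skills: unknown[] };
  skills: Array<{ skill_name: string }>;
  version: string;
}

function isOverviewResponse(value: unknown): value is OverviewLike {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const overview = v.overview as Record<string, unknown> | undefined;
  return (
    typeof v.version === "string" &&
    Array.isArray(v.skills) &&
    overview !== undefined &&
    Array.isArray(overview.telemetry) &&
    Array.isArray(overview.skills)
  );
}

const sample = {
  overview: { telemetry: [], skills: [] },
  skills: [],
  version: "0.0.0-example",
};
console.log(isOverviewResponse(sample)); // → true
console.log(isOverviewResponse({ version: 42 })); // → false
```

A guard like this keeps the SPA from rendering a half-materialized payload if the server and client ever drift.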
- -## Data Contents - -The SPA dashboard displays data materialized into SQLite from these sources: - -| Data | Source | SQLite Table | Description | -|------|--------|-------------|-------------| -| Telemetry | `session_telemetry_log.jsonl` | `sessions` | Session-level telemetry records | -| Skills | `skill_usage_log.jsonl` | `skill_usages` | Skill activation and usage events | -| Queries | `all_queries_log.jsonl` | `queries` | All user queries across sessions | -| Evolution | `evolution_audit_log.jsonl` | `evolution_entries` | Evolution audit trail (create, deploy, rollback) | -| Evidence | Computed from evals | `evidence_entries` | Per-skill evaluation evidence | -| Snapshots | Computed | `eval_snapshots` | Per-skill monitoring snapshots (pass rate, check count) | -| Unmatched | Computed | Via query | Queries that did not trigger any skill | -| Pending | Computed | Via query | Evolution proposals not yet deployed, rejected, or rolled back | - -If no log data is found, the static modes exit with an error message -listing the checked file paths. - -## Steps - -### 1. Choose Mode - -| Goal | Command | -|------|---------| -| Interactive dashboard | `selftune dashboard` | -| Interactive on custom port | `selftune dashboard --port 8080` | -| Save legacy report to file | `selftune dashboard --out report.html` | -| Pipe legacy report | `selftune dashboard --export` | - -### 2. Run Command - -```bash -# Start server and open React SPA (default) -selftune dashboard - -# Custom port -selftune dashboard --port 8080 -``` - -### 3. Interact with Dashboard - -- **Overview page** (`/`): KPI cards with info tooltips (total skills, - sessions, pass rate, unmatched queries, pending proposals, evidence), - skill health grid with status filters, evolution feed, unmatched queries. - First-time users see an onboarding banner with a 3-step setup guide; - returning users see a dismissible welcome banner. 
-- **Skill report** (`/skills/:name`): Per-skill drilldown with 8 KPI cards - (each with info tooltip), tabbed content (Evidence, Invocations, Prompts, - Sessions, Pending — each tab has a hover description), evolution timeline - sidebar with collapsible lifecycle legend, evidence viewer with context - banner explaining the evidence trail -- **Sidebar**: Collapsible navigation listing all skills by health status -- **Theme**: Dark/light toggle with selftune branding -- **Tooltips**: Hover over the info icon next to any metric label to see - what it measures. Hover over tab names for brief descriptions. - ## Common Patterns **"Show me the dashboard"** -> Run `selftune dashboard`. Opens the React SPA in your browser. +> Run `selftune dashboard`. -**"I want to drill into a specific skill"** -> Click any skill in the sidebar or skill health grid. The skill report -> page shows usage stats, evidence viewer, evolution timeline, and -> pending proposals. +**"Use a different port"** +> Run `selftune dashboard --port 8080`. -**"Export a report"** -> Use `selftune dashboard --out report.html` to save a self-contained -> legacy HTML file. Share it -- no server needed, all data is embedded. +**"Start the dashboard without launching a browser"** +> Run `selftune dashboard --no-open`. -**"The dashboard shows no data"** -> No log files found. Run some sessions first so hooks generate -> telemetry. Check `selftune doctor` to verify hooks are installed. +**"The dashboard won’t load"** +> Ensure the SPA build exists with `bun run build:dashboard` in the repo, then retry. +> If using the published package, verify the install completed correctly and run `selftune doctor`. -**"Use a different port"** -> `selftune dashboard --port 8080`. Port must be 1-65535. - -**"Trigger actions from the dashboard"** -> The dashboard provides buttons to trigger watch, evolve, and rollback -> for each skill. These call the action endpoints which spawn selftune -> subprocesses. 
+**"I want a per-skill deep link"** +> Open `/skills/` in the SPA, or `/report/` for the HTML report view. ## SPA Development -To develop the React SPA locally: - ```bash # From repo root bun run dev -# → if 7888 is free, starts both the dashboard server and the SPA dev server -# → if 7888 is already in use, reuses that dashboard server and starts only the SPA dev server on http://localhost:5199 -# Or run manually: -# Terminal 1: Start the dashboard server -selftune dashboard --port 7888 +# Server only +bun run dev:dashboard -# Terminal 2: Start the Vite dev server (proxies /api to port 7888) +# Or manually: +selftune dashboard --port 7888 --no-open cd apps/local-dashboard bun install bunx vite -# → opens at http://localhost:5199 ``` -Production builds are created with `bun run build:dashboard` from the -repo root and output to `apps/local-dashboard/dist/`. The dashboard -server serves these static files at `/`. +The Vite dev server runs at `http://localhost:5199` and proxies API traffic to +the dashboard server on `http://localhost:7888`. diff --git a/tests/dashboard/dashboard-server.test.ts b/tests/dashboard/dashboard-server.test.ts index 5f09344..b345f99 100644 --- a/tests/dashboard/dashboard-server.test.ts +++ b/tests/dashboard/dashboard-server.test.ts @@ -1,44 +1,93 @@ import { afterAll, beforeAll, describe, expect, it } from "bun:test"; +import type { + OverviewResponse, + SkillReportResponse, +} from "../../cli/selftune/dashboard-contract.js"; -/** - * Dashboard server tests — validates HTTP endpoints, SSE streaming, - * action handlers, and server lifecycle. - * - * Strategy: spawn actual server on port 0 (random), test with fetch, clean up. 
- */ - -// Dynamic import to avoid module-level failures when file doesn't exist yet let startDashboardServer: typeof import("../../cli/selftune/dashboard-server.js").startDashboardServer; -const fakeData = { - telemetry: [{ timestamp: "2026-03-12T10:00:00Z", session_id: "sess-1" }], + +const overviewFixture: OverviewResponse = { + overview: { + telemetry: [ + { + timestamp: "2026-03-12T10:00:00Z", + session_id: "sess-1", + skills_triggered: ["test-skill"], + errors_encountered: 0, + total_tool_calls: 3, + }, + ], + skills: [ + { + timestamp: "2026-03-12T10:00:00Z", + session_id: "sess-1", + skill_name: "test-skill", + skill_path: "/tmp/test-skill/SKILL.md", + query: "test prompt", + triggered: true, + source: "claude_code_repair", + }, + ], + evolution: [], + counts: { + telemetry: 1, + skills: 1, + evolution: 0, + evidence: 1, + sessions: 1, + prompts: 1, + }, + unmatched_queries: [], + pending_proposals: [], + }, skills: [ + { + skill_name: "test-skill", + skill_scope: "global", + total_checks: 1, + triggered_count: 1, + pass_rate: 1, + unique_sessions: 1, + last_seen: "2026-03-12T10:00:00Z", + has_evidence: true, + }, + ], + version: "0.2.1-test", +}; + +const skillReportFixture: SkillReportResponse = { + skill_name: "test-skill", + usage: { + total_checks: 1, + triggered_count: 1, + pass_rate: 1, + }, + recent_invocations: [ { timestamp: "2026-03-12T10:00:00Z", session_id: "sess-1", - skill_name: "test-skill", - skill_path: "/tmp/test-skill/SKILL.md", query: "test prompt", triggered: true, + source: "claude_code_repair", }, ], - queries: [{ timestamp: "2026-03-12T10:00:00Z", session_id: "sess-1", query: "test prompt" }], - evolution: [], evidence: [], - decisions: [], - computed: { - snapshots: { - "test-skill": { - window_sessions: 1, - pass_rate: 1, - false_negative_rate: 0, - regression_detected: false, - baseline_pass_rate: 0.5, - skill_checks: 1, - }, - }, - unmatched: [], - pendingProposals: [], + sessions_with_skill: 1, + evolution: [], + 
pending_proposals: [], + token_usage: { + total_input_tokens: 10, + total_output_tokens: 20, }, + canonical_invocations: [], + duration_stats: { + avg_duration_ms: 50, + total_duration_ms: 50, + execution_count: 1, + total_errors: 0, + }, + prompt_samples: [], + session_metadata: [], }; beforeAll(async () => { @@ -47,17 +96,16 @@ beforeAll(async () => { }); describe("dashboard-server", () => { - let serverPromise: - | Promise<{ server: unknown; stop: () => void; port: number }> - | null = null; + let serverPromise: Promise<{ server: unknown; stop: () => void; port: number }> | null = null; async function getServer(): Promise<{ server: unknown; stop: () => void; port: number }> { if (!serverPromise) { serverPromise = startDashboardServer({ - port: 0, // random port + port: 0, host: "127.0.0.1", openBrowser: false, - dataLoader: () => fakeData, + overviewLoader: () => overviewFixture, + skillReportLoader: (skillName) => (skillName === "test-skill" ? skillReportFixture : null), statusLoader: () => ({ skills: [ { @@ -79,6 +127,7 @@ describe("dashboard-server", () => { warn: 0, }, }), + evidenceLoader: () => [], actionRunner: async (command) => ({ success: command !== "rollback", output: `${command} ok`, @@ -96,11 +145,6 @@ describe("dashboard-server", () => { return res.text(); } - async function servesSpaShell(): Promise { - const html = await readRootHtml(); - return html.includes("
") && html.includes("/assets/"); - } - afterAll(async () => { if (serverPromise) { const server = await serverPromise; @@ -108,134 +152,82 @@ describe("dashboard-server", () => { } }); - // ---- GET / ---- describe("GET /", () => { it("returns 200 with HTML content", async () => { const server = await getServer(); const res = await fetch(`http://127.0.0.1:${server.port}/`); expect(res.status).toBe(200); expect(res.headers.get("content-type")).toContain("text/html"); - }, 15000); - - it("contains the selftune title", async () => { - const html = await readRootHtml(); - expect(html).toContain("selftune"); }); - it("serves either the SPA shell or the legacy live shell", async () => { + it("serves the SPA shell", async () => { const html = await readRootHtml(); - const isSpa = await servesSpaShell(); - if (isSpa) { - expect(html).toContain("
"); - expect(html).toContain("/assets/"); - } else { - expect(html).toContain("__SELFTUNE_LIVE__"); - } - }); - - it("keeps the legacy dashboard available at /legacy/ when SPA is active", async () => { - if (!(await servesSpaShell())) return; - - const server = await getServer(); - const res = await fetch(`http://127.0.0.1:${server.port}/legacy/`); - expect(res.status).toBe(200); - const html = await res.text(); - expect(html).toContain("__SELFTUNE_LIVE__"); + expect(html).toContain('
'); + expect(html).toContain("/assets/"); }); }); - // ---- GET /api/data ---- - describe("GET /api/data", () => { + describe("GET /api/v2/overview", () => { it("returns 200 with JSON", async () => { const server = await getServer(); - const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); + const res = await fetch(`http://127.0.0.1:${server.port}/api/v2/overview`); expect(res.status).toBe(200); expect(res.headers.get("content-type")).toContain("application/json"); }); - it("returns expected data shape", async () => { + it("returns the overview payload contract", async () => { const server = await getServer(); - const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); + const res = await fetch(`http://127.0.0.1:${server.port}/api/v2/overview`); const data = await res.json(); - expect(data).toHaveProperty("telemetry"); + expect(data).toHaveProperty("overview"); expect(data).toHaveProperty("skills"); - expect(data).toHaveProperty("queries"); - expect(data).toHaveProperty("evolution"); - expect(data).toHaveProperty("evidence"); - expect(data).toHaveProperty("computed"); - expect(Array.isArray(data.telemetry)).toBe(true); + expect(data).toHaveProperty("version"); + expect(Array.isArray(data.overview.telemetry)).toBe(true); expect(Array.isArray(data.skills)).toBe(true); - expect(Array.isArray(data.queries)).toBe(true); - expect(Array.isArray(data.evolution)).toBe(true); - expect(Array.isArray(data.evidence)).toBe(true); + expect(data.skills[0]?.skill_name).toBe("test-skill"); }); - it("includes decisions in the data", async () => { + it("includes CORS headers", async () => { const server = await getServer(); - const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); - const data = await res.json(); - expect(data).toHaveProperty("decisions"); - expect(Array.isArray(data.decisions)).toBe(true); + const res = await fetch(`http://127.0.0.1:${server.port}/api/v2/overview`); + 
expect(res.headers.get("access-control-allow-origin")).toBe("*"); }); }); - // ---- GET /api/events (SSE) ---- - describe("GET /api/events", () => { - it("returns SSE content type", async () => { + describe("GET /api/v2/skills/:name", () => { + it("returns 200 with JSON", async () => { const server = await getServer(); - const controller = new AbortController(); - const res = await fetch(`http://127.0.0.1:${server.port}/api/events`, { - signal: controller.signal, - }); + const res = await fetch( + `http://127.0.0.1:${server.port}/api/v2/skills/${encodeURIComponent("test-skill")}`, + ); expect(res.status).toBe(200); - expect(res.headers.get("content-type")).toContain("text/event-stream"); - controller.abort(); + expect(res.headers.get("content-type")).toContain("application/json"); }); - it("sends initial data event", async () => { + it("returns the skill report payload contract", async () => { const server = await getServer(); - const controller = new AbortController(); - const timeout = setTimeout(() => controller.abort(), 3000); - - const res = await fetch(`http://127.0.0.1:${server.port}/api/events`, { - signal: controller.signal, - }); - - const reader = res.body?.getReader(); - expect(reader).toBeDefined(); - if (!reader) throw new Error("Response body reader is null"); - const decoder = new TextDecoder(); - let accumulated = ""; - - try { - while (true) { - const { done, value } = await reader.read(); - if (done) break; - accumulated += decoder.decode(value, { stream: true }); - // Wait for a complete SSE event (double newline terminates an event) - if (accumulated.includes("\n\n")) break; - } - } catch { - // abort expected - } finally { - clearTimeout(timeout); - controller.abort(); - } + const res = await fetch( + `http://127.0.0.1:${server.port}/api/v2/skills/${encodeURIComponent("test-skill")}`, + ); + const data = await res.json(); + expect(data.skill_name).toBe("test-skill"); + expect(data.usage.pass_rate).toBe(1); + 
expect(Array.isArray(data.recent_invocations)).toBe(true); + expect(Array.isArray(data.evolution)).toBe(true); + expect(Array.isArray(data.pending_proposals)).toBe(true); + }); - expect(accumulated).toContain("event: data"); - // The data line should be parseable JSON - const dataMatch = accumulated.match(/data: (.+)/); - expect(dataMatch).not.toBeNull(); - if (dataMatch) { - const parsed = JSON.parse(dataMatch[1]); - expect(parsed).toHaveProperty("telemetry"); - } + it("returns 404 for an unknown skill", async () => { + const server = await getServer(); + const res = await fetch( + `http://127.0.0.1:${server.port}/api/v2/skills/${encodeURIComponent("missing")}`, + ); + expect(res.status).toBe(404); }); }); - // ---- POST /api/actions/watch ---- - describe("POST /api/actions/watch", () => { - it("returns JSON response", async () => { + describe("POST /api/actions/*", () => { + it("watch returns JSON response", async () => { const server = await getServer(); const res = await fetch(`http://127.0.0.1:${server.port}/api/actions/watch`, { method: "POST", @@ -244,17 +236,10 @@ describe("dashboard-server", () => { }); expect(res.status).toBe(200); const data = await res.json(); - expect(data).toHaveProperty("success"); - // May fail since skill doesn't exist, but shape should be correct - expect(typeof data.success).toBe("boolean"); - expect(data).toHaveProperty("output"); - expect(data).toHaveProperty("error"); + expect(data.success).toBe(true); }); - }); - // ---- POST /api/actions/evolve ---- - describe("POST /api/actions/evolve", () => { - it("returns JSON response", async () => { + it("evolve returns JSON response", async () => { const server = await getServer(); const res = await fetch(`http://127.0.0.1:${server.port}/api/actions/evolve`, { method: "POST", @@ -263,14 +248,10 @@ describe("dashboard-server", () => { }); expect(res.status).toBe(200); const data = await res.json(); - expect(data).toHaveProperty("success"); - expect(typeof data.success).toBe("boolean"); 
+ expect(data.success).toBe(true); }); - }); - // ---- POST /api/actions/rollback ---- - describe("POST /api/actions/rollback", () => { - it("returns JSON response with proposalId validation", async () => { + it("rollback validates proposalId", async () => { const server = await getServer(); const res = await fetch(`http://127.0.0.1:${server.port}/api/actions/rollback`, { method: "POST", @@ -278,157 +259,76 @@ describe("dashboard-server", () => { body: JSON.stringify({ skill: "test-skill", skillPath: "/tmp/test-skill", - proposalId: "test-proposal-123", + proposalId: "proposal-123", }), }); expect(res.status).toBe(200); const data = await res.json(); - expect(data).toHaveProperty("success"); - expect(typeof data.success).toBe("boolean"); - }); - }); - - // ---- GET /api/evaluations/:skillName ---- - describe("GET /api/evaluations/:skillName", () => { - it("returns 200 with JSON array", async () => { - const server = await getServer(); - const res = await fetch( - `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, - ); - expect(res.status).toBe(200); - expect(res.headers.get("content-type")).toContain("application/json"); - const data = await res.json(); - expect(Array.isArray(data)).toBe(true); - }); - - it("returns entries with expected shape when data exists", async () => { - const server = await getServer(); - const res = await fetch( - `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, - ); - const data = await res.json(); - // May be empty if no skill_usage_log.jsonl entries match, but shape is still an array - expect(Array.isArray(data)).toBe(true); - if (data.length > 0) { - expect(data[0]).toHaveProperty("timestamp"); - expect(data[0]).toHaveProperty("session_id"); - expect(data[0]).toHaveProperty("query"); - expect(data[0]).toHaveProperty("skill_name"); - expect(data[0]).toHaveProperty("triggered"); - } - }); - - it("returns empty array for unknown skill", async () => { - const server 
= await getServer(); - const res = await fetch( - `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("nonexistent-skill-xyz")}`, - ); - expect(res.status).toBe(200); - const data = await res.json(); - expect(data).toEqual([]); - }); - - it("includes CORS headers", async () => { - const server = await getServer(); - const res = await fetch( - `http://127.0.0.1:${server.port}/api/evaluations/${encodeURIComponent("test-skill")}`, - ); - expect(res.headers.get("access-control-allow-origin")).toBe("*"); + expect(data.success).toBe(false); }); }); - // ---- 404 for unknown routes ---- describe("unknown routes", () => { - it("returns SPA fallback or 404 depending on served mode", async () => { + it("returns SPA fallback for client-side routes", async () => { const server = await getServer(); - const res = await fetch(`http://127.0.0.1:${server.port}/nonexistent`); - if (await servesSpaShell()) { - expect(res.status).toBe(200); - const html = await res.text(); - expect(html).toContain("
"); - } else { - expect(res.status).toBe(404); - } - }); - }); - - // ---- CORS headers ---- - describe("CORS", () => { - it("includes CORS headers on API responses", async () => { - const server = await getServer(); - const res = await fetch(`http://127.0.0.1:${server.port}/api/data`); - expect(res.headers.get("access-control-allow-origin")).toBe("*"); + const res = await fetch(`http://127.0.0.1:${server.port}/skills/test-skill`); + expect(res.status).toBe(200); + const html = await res.text(); + expect(html).toContain('
'); }); }); }); -// ---- Server lifecycle ---- describe("server lifecycle", () => { + const statusLoader = () => ({ + skills: [], + unmatchedQueries: 0, + pendingProposals: 0, + lastSession: null, + system: { healthy: true, pass: 0, fail: 0, warn: 0 }, + }); + it("can start and stop cleanly", async () => { const s = await startDashboardServer({ port: 0, host: "127.0.0.1", openBrowser: false, - dataLoader: () => fakeData, - statusLoader: () => ({ - skills: [], - unmatchedQueries: 0, - pendingProposals: 0, - lastSession: null, - system: { healthy: true, pass: 0, fail: 0, warn: 0 }, - }), + overviewLoader: () => overviewFixture, + skillReportLoader: () => null, + statusLoader, }); - expect(s).toHaveProperty("stop"); - expect(s).toHaveProperty("port"); expect(typeof s.port).toBe("number"); expect(s.port).toBeGreaterThan(0); s.stop(); - }, 30000); + }); - it("exposes port after binding", async () => { + it("exposes v2 overview after binding", async () => { const s = await startDashboardServer({ port: 0, host: "127.0.0.1", openBrowser: false, - dataLoader: () => fakeData, - statusLoader: () => ({ - skills: [], - unmatchedQueries: 0, - pendingProposals: 0, - lastSession: null, - system: { healthy: true, pass: 0, fail: 0, warn: 0 }, - }), + overviewLoader: () => overviewFixture, + skillReportLoader: () => null, + statusLoader, }); - // Verify the server is actually responding - const res = await fetch(`http://127.0.0.1:${s.port}/api/data`); + const res = await fetch(`http://127.0.0.1:${s.port}/api/v2/overview`); expect(res.status).toBe(200); s.stop(); - }, 15000); + }); }); -describe("live shell loading", () => { - it("serves / without eagerly loading dashboard data", async () => { - let dataLoaderCalls = 0; +describe("SPA shell loading", () => { + it("serves / without eagerly loading the overview payload", async () => { + let overviewLoaderCalls = 0; const server = await startDashboardServer({ port: 0, host: "127.0.0.1", openBrowser: false, - dataLoader: () => { - 
dataLoaderCalls++; - return { - telemetry: [], - skills: [], - queries: [], - evolution: [], - evidence: [], - decisions: [], - computed: { - snapshots: {}, - unmatched: [], - pendingProposals: [], - }, - }; + overviewLoader: () => { + overviewLoaderCalls++; + return overviewFixture; }, + skillReportLoader: () => skillReportFixture, statusLoader: () => ({ skills: [], unmatchedQueries: 0, @@ -443,53 +343,35 @@ describe("live shell loading", () => { }), }); - const callsBefore = dataLoaderCalls; try { const res = await fetch(`http://127.0.0.1:${server.port}/`); const html = await res.text(); expect(res.status).toBe(200); - const isSpa = html.includes("
") && html.includes("/assets/"); - if (isSpa) { - expect(html).toContain("
"); - } else { - expect(html).toContain("__SELFTUNE_LIVE__"); - expect(html).not.toContain('id="embedded-data"'); - } - expect(dataLoaderCalls).toBe(callsBefore); + expect(html).toContain('
'); + expect(overviewLoaderCalls).toBe(0); - const dataRes = await fetch(`http://127.0.0.1:${server.port}/api/data`); + const dataRes = await fetch(`http://127.0.0.1:${server.port}/api/v2/overview`); expect(dataRes.status).toBe(200); - expect(dataLoaderCalls).toBe(1); + expect(overviewLoaderCalls).toBe(1); } finally { server.stop(); } - }, 15000); + }); }); describe("report loading", () => { - it("loads report data without touching the full dashboard loader", async () => { - let dataLoaderCalls = 0; + it("loads report data without touching the v2 skill-report loader", async () => { + let skillReportLoaderCalls = 0; let evidenceLoaderCalls = 0; const server = await startDashboardServer({ port: 0, host: "127.0.0.1", openBrowser: false, - dataLoader: () => { - dataLoaderCalls++; - return { - telemetry: [], - skills: [], - queries: [], - evolution: [], - evidence: [], - decisions: [], - computed: { - snapshots: {}, - unmatched: [], - pendingProposals: [], - }, - }; + overviewLoader: () => overviewFixture, + skillReportLoader: () => { + skillReportLoaderCalls++; + return skillReportFixture; }, statusLoader: () => ({ skills: [ @@ -521,10 +403,10 @@ describe("report loading", () => { try { const res = await fetch(`http://127.0.0.1:${server.port}/report/test-skill`); expect(res.status).toBe(200); - expect(dataLoaderCalls).toBe(0); + expect(skillReportLoaderCalls).toBe(0); expect(evidenceLoaderCalls).toBe(1); } finally { server.stop(); } - }, 15000); + }); }); diff --git a/tests/dashboard/dashboard.test.ts b/tests/dashboard/dashboard.test.ts index 271f9a4..a643b07 100644 --- a/tests/dashboard/dashboard.test.ts +++ b/tests/dashboard/dashboard.test.ts @@ -2,113 +2,23 @@ import { describe, expect, it } from "bun:test"; import { existsSync, readFileSync } from "node:fs"; import { join } from "node:path"; -const DASHBOARD_PATH = join(import.meta.dir, "..", "..", "dashboard", "index.html"); - -describe("dashboard/index.html", () => { - it("exists", () => { - 
expect(existsSync(DASHBOARD_PATH)).toBe(true); - }); - - it("contains required elements", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("selftune"); - expect(html).toContain("dropZone"); - expect(html).toContain("session_telemetry_log.jsonl"); - expect(html).toContain("skill_usage_log.jsonl"); - expect(html).toContain("all_queries_log.jsonl"); - expect(html).toContain("evolution_audit_log.jsonl"); - expect(html).toContain("evolution_evidence_log.jsonl"); - }); - - it("loads Chart.js from CDN", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("chart.js"); - }); - - it("supports embedded data loading", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("embedded-data"); - expect(html).toContain("loadEmbeddedData"); - }); - - it("waits for DOM content before trying to load embedded data", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("window.addEventListener('DOMContentLoaded'"); - }); - - it("has skill health grid element", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("skill-health-grid"); - }); - - it("handles computed data field", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("computed"); - }); - - it("has drill-down panel element", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html.includes("drill-down") || html.includes("drillDown")).toBe(true); - }); - - it("has skill search input", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("skillSearchInput"); - }); - - it("has evaluation feed table", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("drillEvalFeed"); - }); - - it("has evidence drill-down sections", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("drillVersionHistory"); - 
expect(html).toContain("drillEvidenceTable"); - }); - - it("has invocation breakdown chart", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("chartInvocationBreakdown"); - }); - - it("has time period selector buttons", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("period-btn"); - }); - - it("has 4-state badge classes", () => { - const html = readFileSync(DASHBOARD_PATH, "utf-8"); - expect(html).toContain("badge-warning"); - expect(html).toContain("badge-critical"); - expect(html).toContain("badge-healthy"); - expect(html).toContain("badge-unknown"); - }); -}); +const DASHBOARD_CLI_PATH = join(import.meta.dir, "..", "..", "cli", "selftune", "dashboard.ts"); describe("cli/selftune/dashboard.ts", () => { it("module exists", () => { - const modPath = join(import.meta.dir, "..", "..", "cli", "selftune", "dashboard.ts"); - expect(existsSync(modPath)).toBe(true); - }); - - it("imports from constants (shared layer)", () => { - const modPath = join(import.meta.dir, "..", "..", "cli", "selftune", "dashboard.ts"); - const src = readFileSync(modPath, "utf-8"); - expect(src).toContain("./constants"); + expect(existsSync(DASHBOARD_CLI_PATH)).toBe(true); }); - it("imports from monitoring for snapshot computation", () => { - const modPath = join(import.meta.dir, "..", "..", "cli", "selftune", "dashboard.ts"); - const src = readFileSync(modPath, "utf-8"); - expect(src).toContain("computeMonitoringSnapshot"); + it("documents the SPA server workflow", () => { + const src = readFileSync(DASHBOARD_CLI_PATH, "utf-8"); + expect(src).toContain("Start the local React SPA dashboard server"); + expect(src).toContain("--no-open"); + expect(src).not.toContain("buildEmbeddedHTML"); + expect(src).not.toContain("dashboard/index.html"); }); - it("imports from evolution for audit trail", () => { - const modPath = join(import.meta.dir, "..", "..", "cli", "selftune", "dashboard.ts"); - const src = readFileSync(modPath, 
"utf-8"); - expect(src).toContain("getLastDeployedProposal"); - expect(src).toContain("readAuditTrail"); - expect(src).toContain("readEvidenceTrail"); + it("rejects the removed legacy export mode explicitly", () => { + const src = readFileSync(DASHBOARD_CLI_PATH, "utf-8"); + expect(src).toContain("Legacy dashboard export was removed."); }); }); diff --git a/tests/sandbox/run-sandbox.ts b/tests/sandbox/run-sandbox.ts index 5702a08..6cb9311 100644 --- a/tests/sandbox/run-sandbox.ts +++ b/tests/sandbox/run-sandbox.ts @@ -189,6 +189,80 @@ async function runCliCommand(name: string, args: string[]): Promise { + const name = "dashboard"; + const command = `bun run ${CLI_PATH} dashboard --port ${port} --no-open`; + const start = performance.now(); + const baseUrl = `http://localhost:${port}`; + const proc = Bun.spawn( + ["bun", "run", CLI_PATH, "dashboard", "--port", String(port), "--no-open"], + { + env: sandboxEnv, + stdout: "pipe", + stderr: "pipe", + cwd: PROJECT_ROOT, + }, + ); + + let ready = false; + let failureReason = "Dashboard server did not become ready"; + + try { + for (let attempt = 0; attempt < 40; attempt++) { + await Bun.sleep(250); + try { + const rootRes = await fetch(`${baseUrl}/`); + if (rootRes.status !== 200) { + failureReason = `Expected 200 from dashboard root, got ${rootRes.status}`; + continue; + } + const html = await rootRes.text(); + if (!html.includes('
')) { + failureReason = "Expected SPA shell from dashboard root"; + continue; + } + + const overviewRes = await fetch(`${baseUrl}/api/v2/overview`); + if (overviewRes.status !== 200) { + failureReason = `Expected 200 from /api/v2/overview, got ${overviewRes.status}`; + continue; + } + const overview = await overviewRes.json(); + if (!overview?.overview || !Array.isArray(overview?.skills)) { + failureReason = "Expected overview payload from /api/v2/overview"; + continue; + } + + ready = true; + break; + } catch (error) { + failureReason = error instanceof Error ? error.message : String(error); + } + } + } finally { + proc.kill("SIGTERM"); + } + + const [stdout, stderr] = await Promise.all([ + new Response(proc.stdout).text(), + new Response(proc.stderr).text(), + ]); + const exitCode = await proc.exited; + const durationMs = Math.round(performance.now() - start); + + return { + name, + command, + exitCode, + passed: ready, + durationMs, + stdout: stdout.slice(0, 2000), + stderr: stderr.slice(0, 2000), + fullStdout: stdout, + error: ready ? undefined : failureReason, + }; +} + // --------------------------------------------------------------------------- // Hook runner // --------------------------------------------------------------------------- @@ -324,17 +398,8 @@ async function main(): Promise { const lastResult = await runCliCommand("last", ["last"]); results.push(lastResult); - // f. 
dashboard --export - const dashboardResult = await runCliCommand("dashboard --export", ["dashboard", "--export"]); - // Dashboard --export writes HTML to stdout; verify it contains HTML - if ( - dashboardResult.passed && - !dashboardResult.fullStdout.includes(" Date: Sat, 14 Mar 2026 17:07:36 +0300 Subject: [PATCH 05/14] Refresh execution plans after dashboard cutover --- docs/exec-plans/active/grader-prompt-evals.md | 10 +++- .../active/local-sqlite-materialization.md | 24 ++++++-- .../active/mcp-tool-descriptions.md | 10 +++- docs/exec-plans/active/multi-agent-sandbox.md | 8 ++- .../active/product-reset-and-shipping.md | 8 ++- docs/exec-plans/active/telemetry-field-map.md | 2 +- .../completed/dashboard-spa-cutover.md | 58 +++++++++++++++++++ 7 files changed, 106 insertions(+), 14 deletions(-) create mode 100644 docs/exec-plans/completed/dashboard-spa-cutover.md diff --git a/docs/exec-plans/active/grader-prompt-evals.md b/docs/exec-plans/active/grader-prompt-evals.md index 9abfdb6..aab5eb4 100644 --- a/docs/exec-plans/active/grader-prompt-evals.md +++ b/docs/exec-plans/active/grader-prompt-evals.md @@ -2,7 +2,7 @@ -**Status:** Active +**Status:** Deferred **Created:** 2026-03-14 **Goal:** Evaluate and improve the grader prompts and grading agents so selftune’s session/skill judgments are trustworthy, stable, and measurable. @@ -26,6 +26,14 @@ Current risks: - we do not yet have a tight eval loop for the graders themselves - users can lose trust quickly if the grader feels arbitrary +## Priority Note + +This remains important, but it is not the shortest path to the next release. 
It should resume once: + +- the local app/dashboard path is stable +- the orchestrated improvement loop is demoable end to end +- the published package proof is done + --- ## Goals diff --git a/docs/exec-plans/active/local-sqlite-materialization.md b/docs/exec-plans/active/local-sqlite-materialization.md index 7708063..c4d4535 100644 --- a/docs/exec-plans/active/local-sqlite-materialization.md +++ b/docs/exec-plans/active/local-sqlite-materialization.md @@ -1,6 +1,6 @@ # Execution Plan: Local SQLite Materialization and App Data Layer - + **Status:** Active **Created:** 2026-03-12 @@ -54,13 +54,19 @@ This is not a move to “database-first telemetry.” It is a local query/materi `#42` introduced the first SQLite local materialization layer. +Since then: + +- `#39` made the SPA the real local dashboard UI +- `#44` removed the legacy embedded-HTML runtime and v1 dashboard routes +- the shared dashboard payload contract now lives in `cli/selftune/dashboard-contract.ts` + That means the work now is not “decide whether to use SQLite.” The work now is: 1. stabilize the local DB schema and materialization flow 2. make overview/report queries first-class 3. move the local app to those queries -4. retire the old heavy dashboard path as the primary UX +4. finish migrating the remaining dashboard-adjacent surfaces onto the same v2 contracts --- @@ -126,9 +132,15 @@ The local data layer should explicitly support: The React local app should stop depending primarily on the old dashboard server’s heavy data path. -### 3. Keep the old dashboard path only as compatibility +### 3. Remove remaining non-v2 dashboard paths + +The legacy HTML runtime is gone. The remaining follow-through is to keep migrating: + +- report HTML +- badge/status projections +- any leftover JSONL-only dashboard helpers -Do not optimize it indefinitely. Keep it as fallback until the new path is trustworthy. +onto the same SQLite-backed payload semantics where appropriate. ### 4. 
Keep source-truth sync first @@ -152,11 +164,11 @@ Later: Short term: -- enough to support the new app and compatibility mode +- enough to serve the SPA, report HTML, badges, and action endpoints Long term: -- the new local app should be the default experience +- only the SPA/v2 contract, plus explicitly supported adjunct routes like badges and reports --- diff --git a/docs/exec-plans/active/mcp-tool-descriptions.md b/docs/exec-plans/active/mcp-tool-descriptions.md index 242dab5..ba71397 100644 --- a/docs/exec-plans/active/mcp-tool-descriptions.md +++ b/docs/exec-plans/active/mcp-tool-descriptions.md @@ -2,7 +2,7 @@ -**Status:** Active +**Status:** Deferred **Created:** 2026-03-14 **Goal:** Improve selftune’s MCP/tool descriptions so agent runtimes can understand and select the right tools more reliably, with less ambiguity and less prompt burden. @@ -25,6 +25,14 @@ This is especially important for: - Paperclip / Claude Code / other autonomous agent runtimes - future cloud/local parity in product semantics +## Priority Note + +This is intentionally not in the current release-critical path. It should stay deferred until: + +- the SPA/local app path is fully credible +- the autonomous loop is clearer +- the published install proof is complete + --- ## Goals diff --git a/docs/exec-plans/active/multi-agent-sandbox.md b/docs/exec-plans/active/multi-agent-sandbox.md index 5c9d8cd..c5004df 100644 --- a/docs/exec-plans/active/multi-agent-sandbox.md +++ b/docs/exec-plans/active/multi-agent-sandbox.md @@ -1,11 +1,15 @@ # Execution Plan: Multi-Agent Sandbox Expansion - + -**Status:** Active +**Status:** Deferred **Created:** 2026-03-02 **Goal:** Expand the sandbox test harness from Claude Code-only to cover all three agents (Claude Code, Codex, OpenCode) with shared fixtures, per-agent Layer 1 tests, and per-agent Layer 2 Docker containers. +## Priority Note + +This is no longer on the immediate shipping path. 
Keep the current Claude/sandbox coverage working, but defer the broader multi-agent expansion until after the next release candidate is shipped and validated. + --- ## Problem Statement diff --git a/docs/exec-plans/active/product-reset-and-shipping.md b/docs/exec-plans/active/product-reset-and-shipping.md index 00dd125..05d1fb3 100644 --- a/docs/exec-plans/active/product-reset-and-shipping.md +++ b/docs/exec-plans/active/product-reset-and-shipping.md @@ -1,6 +1,6 @@ # Execution Plan: Product Reset and Shipping Priorities - + **Status:** Active **Created:** 2026-03-12 @@ -15,10 +15,12 @@ selftune is no longer blocked by telemetry architecture. It is now blocked by ** Recent merged work changed the baseline: - `#38` hardened source-truth telemetry and repair paths +- `#39` merged the local dashboard SPA - `#40` added the first orchestrator core loop - `#41` made generic scheduling the primary posture and OpenClaw cron optional - `#42` added a local SQLite materialization layer - `#43` improved sync progress and tightened noisy query filtering +- `#44` removed the legacy dashboard runtime and made the SPA/server path authoritative That means the next phase should optimize for: @@ -140,9 +142,9 @@ Paperclip should accelerate iteration, not become the product priority. These are the highest-confidence gaps still blocking adoption and confident shipping: -### 1. The local UX is still not good enough +### 1. The local UX still needs product polish -The old dashboard path remains too slow and awkward, and the SQLite + SPA path is not yet the obvious default experience. +The SPA + SQLite path is now the supported default, but the experience still needs latency work, drilldown polish, and stronger route/report coherence before it feels fully ready for broader adoption. ### 2. 
The autonomous loop is not yet obvious and trustworthy diff --git a/docs/exec-plans/active/telemetry-field-map.md b/docs/exec-plans/active/telemetry-field-map.md index 68d1218..83b4ac2 100644 --- a/docs/exec-plans/active/telemetry-field-map.md +++ b/docs/exec-plans/active/telemetry-field-map.md @@ -2,7 +2,7 @@ -**Status:** Active +**Status:** Reference **Purpose:** Define the canonical telemetry contract that all platform adapters must emit before any downstream projection or analytics. **Audience:** Adapter implementers, reviewers, and anyone building the shared local/cloud telemetry pipeline. diff --git a/docs/exec-plans/completed/dashboard-spa-cutover.md b/docs/exec-plans/completed/dashboard-spa-cutover.md new file mode 100644 index 0000000..f65029a --- /dev/null +++ b/docs/exec-plans/completed/dashboard-spa-cutover.md @@ -0,0 +1,58 @@ +# Execution Plan: Dashboard SPA Cutover + + + +**Status:** Completed +**Completed:** 2026-03-14 +**Goal:** Retire the legacy embedded-HTML dashboard runtime and make the SPA + v2 dashboard server path the supported local experience. + +--- + +## What Landed + +- The React SPA became the supported local dashboard UI. +- `selftune dashboard` now starts the SPA-backed dashboard server directly. +- The legacy `dashboard/index.html` runtime was removed. +- Legacy v1 dashboard routes were removed from `cli/selftune/dashboard-server.ts`: + - `/legacy/` + - `/api/data` + - `/api/events` + - `/api/evaluations/:name` +- The shared dashboard payload contract was centralized in `cli/selftune/dashboard-contract.ts`. +- Dashboard docs and sandbox coverage were updated to the SPA/server model. 
+ +## Resulting Product Shape + +The supported dashboard path is now: + +```text +selftune dashboard + -> dashboard server + -> /api/v2/overview + -> /api/v2/skills/:name + -> SPA at / +``` + +Supporting routes that still remain on the server: + +- `/badge/:name` +- `/report/:name` +- `/api/actions/*` + +## Follow-Through That Is Still Separate + +This cutover did not complete every dashboard-adjacent migration. Remaining follow-up belongs to other active plans: + +- move more report/badge/status semantics onto the same v2 data model +- continue improving SPA latency and UX polish +- finish the release/install proof against the published package + +## Verification + +The cutover was validated with: + +- focused dashboard server tests +- badge/report route tests +- sandbox dashboard HTTP smoke coverage + +The only remaining sandbox failure at completion time was the unrelated pre-existing `hook: skill-eval` issue. From bee281c1332b396fe97bbcebeed8ed0c0cc189a0 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 17:28:45 +0300 Subject: [PATCH 06/14] Build dashboard SPA in CI and publish --- .github/workflows/ci.yml | 10 ++++++++++ .github/workflows/publish.yml | 3 +++ 2 files changed, 13 insertions(+) diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 5084e7b..4a7ed14 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -20,6 +20,16 @@ jobs: - run: bunx @biomejs/biome check . 
- run: bun run lint-architecture.ts + build-dashboard: + runs-on: ubuntu-latest + permissions: + contents: read + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6 + - uses: oven-sh/setup-bun@ecf28ddc73e819eb6fa29df6b34ef8921c743461 # v2 + - run: bun install + - run: bun run build:dashboard + test: runs-on: ubuntu-latest permissions: diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index 2d06499..f4f7139 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -60,6 +60,9 @@ jobs: - name: Install dependencies run: bun install + - name: Build dashboard SPA + run: bun run build:dashboard + - name: Verify npm version for trusted publishing run: npm --version From 23e53801ae749b83c689e23acf3b896daee46413 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 17:33:27 +0300 Subject: [PATCH 07/14] Refresh README for SPA release path --- README.md | 67 ++++++++++++++++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 30 deletions(-) diff --git a/README.md b/README.md index ab25e5c..0028a5e 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ [![Zero Dependencies](https://img.shields.io/badge/dependencies-0-brightgreen)](https://www.npmjs.com/package/selftune?activeTab=dependencies) [![Bun](https://img.shields.io/badge/runtime-bun%20%7C%20node-black)](https://bun.sh) -Your agent skills learn how you work. Detect what's broken. Fix it automatically. +Your agent skills learn how you work. Detect what's broken. Improve low-risk skill behavior automatically. **[Install](#install)** · **[Use Cases](#built-for-how-you-actually-work)** · **[How It Works](#how-it-works)** · **[Commands](#commands)** · **[Platforms](#platforms)** · **[Docs](docs/integration-guide.md)** @@ -23,7 +23,7 @@ Your agent skills learn how you work. Detect what's broken. Fix it automatically --- -Your skills don't understand how you talk. 
You say "make me a slide deck" and nothing happens — no error, no log, no signal. selftune watches your real sessions, learns how you actually speak, and rewrites skill descriptions to match. Automatically. +Your skills do not understand how you talk. You say "make me a slide deck" and nothing happens: no error, no signal, no clue why the right skill never fired. selftune reads the transcripts and telemetry your agent already saves, learns how you actually speak, and improves skill descriptions to match. It validates changes before deployment, watches for regressions after, and rolls back when needed. Built for **Claude Code**. Also works with Codex, OpenCode, and OpenClaw. Zero runtime dependencies. @@ -35,9 +35,18 @@ npx skills add selftune-dev/selftune Then tell your agent: **"initialize selftune"** -Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. Within minutes you'll see which skills are undertriggering. +Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. -**CLI only** (no skill, just the CLI): +Quick proof path: + +```bash +npx selftune@latest doctor +npx selftune@latest sync --force +npx selftune@latest status +npx selftune@latest dashboard +``` + +**CLI only** (no installed skill): ```bash npx selftune@latest doctor @@ -68,51 +77,49 @@ combinations repeat, which ones help, and where the friction is. Observe → Detect → Evolve → Watch

-A continuous feedback loop that makes your skills learn and adapt. Automatically. +A continuous feedback loop that makes your skills learn and adapt from real work. -**Observe** — Hooks capture every user query and which skills fired. On Claude Code, hooks install automatically. Use `selftune replay` to backfill existing transcripts. This is how your skills start learning. +**Observe** — selftune reads the transcripts and telemetry your agents already save. On Claude Code, hooks can add low-latency hints, but transcripts and logs are the source of truth. Use `selftune sync` to ingest current activity and `selftune replay` to backfill older Claude Code sessions. -**Detect** — selftune finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch. +**Detect** — selftune finds the gap between how you talk and how your skills are described. It spots missed triggers, underperforming descriptions, noisy environments, and regressions in real usage. -**Evolve** — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Batched validation with per-stage model control (`--cheap-loop` uses haiku for the loop, sonnet for the gate). Teacher-student body evolution with 3-gate validation. Baseline comparison gates on measurable lift. Automatic backup. +**Evolve** — For low-risk changes, selftune can autonomously rewrite skill descriptions to match how you actually work. Every proposal is validated before deploy. Full skill-body or routing changes stay available for higher-touch workflows. -**Watch** — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically. Your skills keep improving without you touching them. +**Watch** — After deploying changes, selftune monitors trigger quality and post-deploy evidence. If something regresses, it can roll back automatically. 
The goal is autonomous improvement with safeguards, not blind self-editing. -## What's New in v0.2.0 +## What's New in v0.2.x -- **Full skill body evolution** — Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates -- **Synthetic eval generation** — `selftune evals --synthetic` generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately. -- **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction. -- **Batch trigger validation** — Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops. -- **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` flags give fine-grained control over which model runs each evolution stage. -- **Auto-activation system** — Hooks detect when selftune should run and suggest actions -- **Enforcement guardrails** — Blocks SKILL.md edits on monitored skills unless `selftune watch` has been run -- **React SPA dashboard** — `selftune dashboard` serves a React SPA with skill health grid, per-skill drilldown, evidence viewer, evolution timeline, dark/light theming, and SQLite-backed v2 API -- **Evolution memory** — Persists context, plans, and decisions across context resets -- **4 specialized agents** — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide -- **Sandbox test harness** — Comprehensive automated test coverage, including devcontainer-based LLM testing -- **Workflow discovery + codification** — `selftune workflows` finds repeated - multi-skill sequences from telemetry, and `selftune workflows save - ` appends them to `## Workflows` in SKILL.md +- **Source-truth sync** — `selftune sync` now leads the product loop, using transcripts/logs as truth and hooks as hints +- **SQLite-backed 
local app** — `selftune dashboard` now serves the React SPA by default with faster overview/report routes on top of materialized local data +- **Autonomous low-risk evolution** — description evolution is autonomous by default, with explicit review-required mode for stricter policies +- **Full skill body evolution** — evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates +- **Synthetic eval generation** — `selftune evals --synthetic` generates eval sets from `SKILL.md` for cold-start skills +- **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate +- **Per-stage model control** — `--validation-model`, `--proposal-model`, and `--gate-model` give fine-grained control over each evolution stage +- **Sandbox test harness** — automated coverage, including devcontainer-based LLM testing +- **Workflow discovery + codification** — `selftune workflows` finds repeated multi-skill sequences from telemetry and can append them to `## Workflows` in `SKILL.md` ## Commands | Command | What it does | |---|---| +| `selftune doctor` | Health check: logs, config, permissions, dashboard build/runtime expectations | +| `selftune sync` | Ingest source-truth activity from supported agents and rebuild local state | | `selftune status` | See which skills are undertriggering and why | +| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) | +| `selftune orchestrate` | Run the core loop: sync, inspect candidates, evolve, and watch | | `selftune evals --skill ` | Generate eval sets from real session data (`--synthetic` for cold-start) | | `selftune evolve --skill ` | Propose, validate, and deploy improved descriptions (`--cheap-loop`, `--with-baseline`) | | `selftune evolve-body --skill ` | Evolve full skill body or routing table (teacher-student, 3-gate validation) | +| `selftune watch --skill ` | Monitor after deploy. 
Auto-rollback on regression. | +| `selftune replay` | Backfill data from existing Claude Code transcripts | | `selftune baseline --skill ` | Measure skill value vs no-skill baseline | | `selftune unit-test --skill ` | Run or generate skill-level unit tests | | `selftune composability --skill ` | Measure synergy and conflicts between co-occurring skills, with workflow-candidate hints | | `selftune workflows` | Discover repeated multi-skill workflows and save a discovered workflow into `SKILL.md` | | `selftune import-skillsbench` | Import external eval corpus from [SkillsBench](https://github.com/benchflow-ai/skillsbench) | | `selftune badge --skill ` | Generate skill health badge SVG | -| `selftune watch --skill ` | Monitor after deploy. Auto-rollback on regression. | -| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) | -| `selftune replay` | Backfill data from existing Claude Code transcripts | -| `selftune doctor` | Health check: logs, hooks, config, permissions | +| `selftune cron setup` | Optional scheduler helper for OpenClaw-oriented automation | Full command reference: `selftune --help` @@ -141,13 +148,13 @@ Observability tools trace LLM calls. Skill authoring tools help you write skills ## Platforms -**Claude Code** (primary) — Hooks install automatically. `selftune replay` backfills existing transcripts. Full feature support. +**Claude Code** (primary) — Reads saved transcripts and telemetry directly. Hooks install automatically and add low-latency hints. `selftune replay` backfills older Claude Code sessions. Full feature support. **Codex** — `selftune wrap-codex -- ` or `selftune ingest-codex` **OpenCode** — `selftune ingest-opencode` -**OpenClaw** — `selftune ingest-openclaw` + `selftune cron setup` for autonomous evolution +**OpenClaw** — `selftune ingest-openclaw`. `selftune cron setup` remains available as an optional OpenClaw-oriented scheduler helper, but the main product loop is agent-agnostic. 
Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys. From 68f99579a26d39dcc7ed370814432bdecab53483 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 17:39:58 +0300 Subject: [PATCH 08/14] Address dashboard release review comments --- cli/selftune/dashboard-server.ts | 49 ++++- docs/escalation-policy.md | 2 +- .../active/product-reset-and-shipping.md | 2 - .../active/telemetry-normalization.md | 2 +- .../telemetry-field-map.md | 2 +- package.json | 2 +- tests/dashboard/badge-routes.test.ts | 196 ++++++++++++------ tests/dashboard/dashboard-server.test.ts | 30 ++- 8 files changed, 207 insertions(+), 78 deletions(-) rename docs/exec-plans/{active => reference}/telemetry-field-map.md (99%) diff --git a/cli/selftune/dashboard-server.ts b/cli/selftune/dashboard-server.ts index fef2c4f..8ee671a 100644 --- a/cli/selftune/dashboard-server.ts +++ b/cli/selftune/dashboard-server.ts @@ -4,6 +4,7 @@ * * Endpoints: * GET / — Serve dashboard SPA shell + * GET /api/health — Dashboard server health probe * GET /api/v2/overview — SQLite-backed overview payload * GET /api/v2/skills/:name — SQLite-backed per-skill report * POST /api/actions/watch — Trigger `selftune watch` for a skill @@ -46,6 +47,7 @@ import { readEffectiveSkillUsageRecords } from "./utils/skill-log.js"; export interface DashboardServerOptions { port?: number; host?: string; + spaDir?: string; openBrowser?: boolean; statusLoader?: () => StatusResult; evidenceLoader?: () => EvolutionEvidenceEntry[]; @@ -75,6 +77,14 @@ function findSpaDir(): string | null { return null; } +function decodePathSegment(segment: string): string | null { + try { + return decodeURIComponent(segment); + } catch { + return null; + } +} + const MIME_TYPES: Record = { ".html": "text/html; charset=utf-8", ".js": "application/javascript; charset=utf-8", @@ -412,7 +422,7 @@ export async function startDashboardServer( const executeAction = options?.actionRunner ?? 
runAction; // -- SPA serving ------------------------------------------------------------- - const spaDir = findSpaDir(); + const spaDir = options?.spaDir ?? findSpaDir(); if (spaDir) { console.log(`SPA found at ${spaDir}, serving as default dashboard`); } else { @@ -499,6 +509,19 @@ export async function startDashboardServer( return new Response(null, { status: 204, headers: corsHeaders() }); } + if (url.pathname === "/api/health" && req.method === "GET") { + return Response.json( + { + ok: true, + service: "selftune-dashboard", + version: selftuneVersion, + spa: Boolean(spaDir), + v2_data_available: Boolean(getOverviewResponse || db), + }, + { headers: corsHeaders() }, + ); + } + // ---- SPA static assets ---- Serve from dist/assets/ if (spaDir && req.method === "GET" && url.pathname.startsWith("/assets/")) { const filePath = resolve(spaDir, `.${url.pathname}`); @@ -590,7 +613,13 @@ export async function startDashboardServer( // ---- GET /badge/:skillName ---- Badge SVG if (url.pathname.startsWith("/badge/") && req.method === "GET") { - const skillName = decodeURIComponent(url.pathname.slice("/badge/".length)); + const skillName = decodePathSegment(url.pathname.slice("/badge/".length)); + if (skillName === null) { + return Response.json( + { error: "Malformed skill name" }, + { status: 400, headers: corsHeaders() }, + ); + } const formatParam = url.searchParams.get("format"); const validFormats = new Set(["svg", "markdown", "url"]); const format: BadgeFormat = @@ -654,7 +683,13 @@ export async function startDashboardServer( // ---- GET /report/:skillName ---- Skill health report if (url.pathname.startsWith("/report/") && req.method === "GET") { - const skillName = decodeURIComponent(url.pathname.slice("/report/".length)); + const skillName = decodePathSegment(url.pathname.slice("/report/".length)); + if (skillName === null) { + return Response.json( + { error: "Malformed skill name" }, + { status: 400, headers: corsHeaders() }, + ); + } const statusResult = await 
getCachedStatusResult(); const skill = statusResult.skills.find((s) => s.name === skillName); const evidenceEntries = getEvidenceEntries().filter( @@ -700,7 +735,13 @@ export async function startDashboardServer( // ---- GET /api/v2/skills/:name ---- SQLite-backed skill report if (url.pathname.startsWith("/api/v2/skills/") && req.method === "GET") { - const skillName = decodeURIComponent(url.pathname.slice("/api/v2/skills/".length)); + const skillName = decodePathSegment(url.pathname.slice("/api/v2/skills/".length)); + if (skillName === null) { + return Response.json( + { error: "Malformed skill name" }, + { status: 400, headers: corsHeaders() }, + ); + } if (getSkillReportResponse) { const report = getSkillReportResponse(skillName); if (!report) { diff --git a/docs/escalation-policy.md b/docs/escalation-policy.md index 30930cc..25f4e2c 100644 --- a/docs/escalation-policy.md +++ b/docs/escalation-policy.md @@ -52,7 +52,7 @@ Clear criteria for when agents proceed autonomously vs. when to involve a human. - Modifying the SKILL.md routing table (affects which workflow agents load) - Changing `computeStatus` logic in `status.ts` (affects skill health reporting) - Changing `computeLastInsight` logic in `last.ts` (affects session insight accuracy) -- Modifying the dashboard response contract in `dashboard-contract.ts` +- Modifying the dashboard response contract in `cli/selftune/dashboard-contract.ts` - Changing SQLite-backed dashboard query shapes in `cli/selftune/localdb/queries.ts` - Modifying activation rules configuration - Changing agent assignment logic diff --git a/docs/exec-plans/active/product-reset-and-shipping.md b/docs/exec-plans/active/product-reset-and-shipping.md index 05d1fb3..1b54ce3 100644 --- a/docs/exec-plans/active/product-reset-and-shipping.md +++ b/docs/exec-plans/active/product-reset-and-shipping.md @@ -136,8 +136,6 @@ Paperclip should accelerate iteration, not become the product priority. 
--- -## Current Recommendations - ## Remaining Product Gaps These are the highest-confidence gaps still blocking adoption and confident shipping: diff --git a/docs/exec-plans/active/telemetry-normalization.md b/docs/exec-plans/active/telemetry-normalization.md index 1ae2d3a..61b9c56 100644 --- a/docs/exec-plans/active/telemetry-normalization.md +++ b/docs/exec-plans/active/telemetry-normalization.md @@ -201,7 +201,7 @@ That means the normalization layer should be grounded in verified platform contracts. ### Track 0 Verification Snapshot (2026-03-10) The implementation contract derived from this snapshot lives in -[`telemetry-field-map.md`](./telemetry-field-map.md). +[`telemetry-field-map.md`](../reference/telemetry-field-map.md). **Official source references** diff --git a/docs/exec-plans/active/telemetry-field-map.md b/docs/exec-plans/reference/telemetry-field-map.md similarity index 99% rename from docs/exec-plans/active/telemetry-field-map.md rename to docs/exec-plans/reference/telemetry-field-map.md index 83b4ac2..154a286 100644 --- a/docs/exec-plans/active/telemetry-field-map.md +++ b/docs/exec-plans/reference/telemetry-field-map.md @@ -1,6 +1,6 @@ # Telemetry Source-to-Canonical Field Map - + **Status:** Reference **Purpose:** Define the canonical telemetry contract that all platform adapters must emit before any downstream projection or analytics.
diff --git a/package.json b/package.json index 2a47a74..792dff0 100644 --- a/package.json +++ b/package.json @@ -49,7 +49,7 @@ "CHANGELOG.md" ], "scripts": { - "dev": "sh -c 'if lsof -iTCP:7888 -sTCP:LISTEN >/dev/null 2>&1; then echo \"Using existing dashboard server on 7888\"; cd apps/local-dashboard && bun install && bunx vite --strictPort; else cd apps/local-dashboard && bun install && bun run dev; fi'", + "dev": "sh -c 'if lsof -iTCP:7888 -sTCP:LISTEN >/dev/null 2>&1; then if curl -fsS http://127.0.0.1:7888/api/health | grep -q selftune-dashboard; then echo \"Using existing dashboard server on 7888\"; cd apps/local-dashboard && bun install && bunx vite --strictPort; else echo \"Port 7888 is occupied by a non-selftune service\"; exit 1; fi; else cd apps/local-dashboard && bun install && bun run dev; fi'", "dev:dashboard": "bun run cli/selftune/index.ts dashboard --port 7888 --no-open", "lint": "bunx @biomejs/biome check .", "lint:fix": "bunx @biomejs/biome check --write .", diff --git a/tests/dashboard/badge-routes.test.ts b/tests/dashboard/badge-routes.test.ts index 4a45e58..e2e8306 100644 --- a/tests/dashboard/badge-routes.test.ts +++ b/tests/dashboard/badge-routes.test.ts @@ -1,10 +1,14 @@ import { afterAll, beforeAll, describe, expect, it } from "bun:test"; +import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import type { + OverviewResponse, + SkillReportResponse, +} from "../../cli/selftune/dashboard-contract.js"; import type { StatusResult } from "../../cli/selftune/status.js"; import type { - EvolutionAuditEntry, EvolutionEvidenceEntry, - QueryLogRecord, - SessionTelemetryRecord, SkillUsageRecord, } from "../../cli/selftune/types.js"; @@ -16,69 +20,116 @@ import type { */ let startDashboardServer: typeof import("../../cli/selftune/dashboard-server.js").startDashboardServer; +let testSpaDir: string; const reportSkillName = "test-skill"; -const dashboardFixture = { 
- telemetry: [] as SessionTelemetryRecord[], +const overviewFixture: OverviewResponse = { + overview: { + telemetry: [], + skills: [ + { + timestamp: "2026-03-10T10:00:00.000Z", + session_id: "sess-report-1", + skill_name: reportSkillName, + skill_path: "/tmp/test-skill/SKILL.md", + query: "Use the test skill", + triggered: true, + }, + ] as SkillUsageRecord[], + evolution: [], + counts: { + telemetry: 0, + skills: 1, + evolution: 0, + evidence: 1, + sessions: 1, + prompts: 1, + }, + unmatched_queries: [], + pending_proposals: [], + }, skills: [ { - timestamp: "2026-03-10T10:00:00.000Z", - session_id: "sess-report-1", skill_name: reportSkillName, - skill_path: "/tmp/test-skill/SKILL.md", - query: "Use the test skill", - triggered: true, + skill_scope: "global", + total_checks: 1, + triggered_count: 1, + pass_rate: 1, + unique_sessions: 1, + last_seen: "2026-03-10T10:00:00.000Z", + has_evidence: true, }, - ] as SkillUsageRecord[], - queries: [ + ], + version: "0.2.1-test", +}; +const evidenceFixture: EvolutionEvidenceEntry[] = [ + { + timestamp: "2026-03-10T10:00:00.000Z", + proposal_id: "proposal-test-skill-1", + skill_name: reportSkillName, + skill_path: "/tmp/test-skill/SKILL.md", + stage: "validated", + target: "description", + original_text: "Original description", + proposed_text: "Proposed description", + details: "Validation completed", + validation: { + before_pass_rate: 0.5, + after_pass_rate: 1, + improved: true, + regressions: [], + new_passes: [ + { + query: "Use the test skill", + should_trigger: true, + }, + ], + per_entry_results: [ + { + entry: { + query: "Use the test skill", + should_trigger: true, + }, + before_pass: false, + after_pass: true, + }, + ], + }, + }, +] as EvolutionEvidenceEntry[]; +const skillReportFixture: SkillReportResponse = { + skill_name: reportSkillName, + usage: { + total_checks: 1, + triggered_count: 1, + pass_rate: 1, + }, + recent_invocations: [ { timestamp: "2026-03-10T10:00:00.000Z", session_id: "sess-report-1", query: 
"Use the test skill", + triggered: true, + source: "claude_code_repair", }, - ] as QueryLogRecord[], - evolution: [] as EvolutionAuditEntry[], - evidence: [ - { - timestamp: "2026-03-10T10:00:00.000Z", - proposal_id: "proposal-test-skill-1", - skill_name: reportSkillName, - skill_path: "/tmp/test-skill/SKILL.md", - stage: "validated", - target: "description", - original_text: "Original description", - proposed_text: "Proposed description", - details: "Validation completed", - validation: { - before_pass_rate: 0.5, - after_pass_rate: 1, - improved: true, - regressions: [], - new_passes: [ - { - query: "Use the test skill", - should_trigger: true, - }, - ], - per_entry_results: [ - { - entry: { - query: "Use the test skill", - should_trigger: true, - }, - before_pass: false, - after_pass: true, - }, - ], - }, - }, - ] as EvolutionEvidenceEntry[], - decisions: [], - computed: { - snapshots: {}, - unmatched: [], - pendingProposals: [], + ], + evidence: [], + sessions_with_skill: 1, + evolution: [], + pending_proposals: [], + token_usage: { + total_input_tokens: 0, + total_output_tokens: 0, }, + canonical_invocations: [], + duration_stats: { + avg_duration_ms: 0, + total_duration_ms: 0, + execution_count: 0, + total_errors: 0, + }, + prompt_samples: [], + session_metadata: [], }; const statusFixture: StatusResult = { skills: [ @@ -105,6 +156,13 @@ const statusFixture: StatusResult = { beforeAll(async () => { const mod = await import("../../cli/selftune/dashboard-server.js"); startDashboardServer = mod.startDashboardServer; + testSpaDir = mkdtempSync(join(tmpdir(), "selftune-badge-test-")); + mkdirSync(join(testSpaDir, "assets"), { recursive: true }); + writeFileSync( + join(testSpaDir, "index.html"), + `
`, + ); + writeFileSync(join(testSpaDir, "assets", "app.js"), "console.log('selftune badge test spa');\n"); }); describe("badge routes", () => { @@ -113,11 +171,13 @@ describe("badge routes", () => { beforeAll(async () => { server = await startDashboardServer({ port: 0, - host: "localhost", + host: "127.0.0.1", + spaDir: testSpaDir, openBrowser: false, - dataLoader: () => dashboardFixture, + overviewLoader: () => overviewFixture, + skillReportLoader: (skillName) => (skillName === reportSkillName ? skillReportFixture : null), statusLoader: () => statusFixture, - evidenceLoader: () => dashboardFixture.evidence, + evidenceLoader: () => evidenceFixture, }); }); @@ -127,7 +187,7 @@ describe("badge routes", () => { describe("GET /badge/:skillName", () => { it("returns SVG content type for unknown skill", async () => { - const res = await fetch(`http://localhost:${server.port}/badge/nonexistent-skill`); + const res = await fetch(`http://127.0.0.1:${server.port}/badge/nonexistent-skill`); expect(res.status).toBe(404); expect(res.headers.get("content-type")).toContain("image/svg+xml"); const body = await res.text(); @@ -136,24 +196,24 @@ describe("badge routes", () => { }); it("returns valid SVG badge (not JSON error)", async () => { - const res = await fetch(`http://localhost:${server.port}/badge/nonexistent-skill`); + const res = await fetch(`http://127.0.0.1:${server.port}/badge/nonexistent-skill`); const body = await res.text(); // Should be valid SVG, not a JSON error expect(body.startsWith(" { - const res = await fetch(`http://localhost:${server.port}/badge/test-skill`); + const res = await fetch(`http://127.0.0.1:${server.port}/badge/test-skill`); expect(res.headers.get("cache-control")).toBe("no-cache, no-store"); }); it("includes CORS headers", async () => { - const res = await fetch(`http://localhost:${server.port}/badge/test-skill`); + const res = await fetch(`http://127.0.0.1:${server.port}/badge/test-skill`); 
expect(res.headers.get("access-control-allow-origin")).toBe("*"); }); it("returns text/plain for ?format=markdown", async () => { - const res = await fetch(`http://localhost:${server.port}/badge/nonexistent?format=markdown`); + const res = await fetch(`http://127.0.0.1:${server.port}/badge/nonexistent?format=markdown`); // For unknown skills, still returns SVG 404 (badge not found) // But for known skills would return text/plain expect(res.status).toBe(404); @@ -162,18 +222,18 @@ describe("badge routes", () => { describe("GET /report/:skillName", () => { it("returns 404 for unknown skill", async () => { - const res = await fetch(`http://localhost:${server.port}/report/nonexistent-skill`); + const res = await fetch(`http://127.0.0.1:${server.port}/report/nonexistent-skill`); expect(res.status).toBe(404); }); it("includes CORS headers", async () => { - const res = await fetch(`http://localhost:${server.port}/report/test-skill`); + const res = await fetch(`http://127.0.0.1:${server.port}/report/test-skill`); expect(res.headers.get("access-control-allow-origin")).toBe("*"); }); it("renders evidence sections for a real skill report", async () => { const res = await fetch( - `http://localhost:${server.port}/report/${encodeURIComponent(reportSkillName)}`, + `http://127.0.0.1:${server.port}/report/${encodeURIComponent(reportSkillName)}`, ); expect(res.status).toBe(200); const html = await res.text(); @@ -182,8 +242,12 @@ describe("badge routes", () => { }); it("returns text/plain for missing skill", async () => { - const res = await fetch(`http://localhost:${server.port}/report/nonexistent`); + const res = await fetch(`http://127.0.0.1:${server.port}/report/nonexistent`); expect(res.headers.get("content-type")).toContain("text/plain"); }); }); }); + +afterAll(() => { + rmSync(testSpaDir, { recursive: true, force: true }); +}); diff --git a/tests/dashboard/dashboard-server.test.ts b/tests/dashboard/dashboard-server.test.ts index b345f99..698394a 100644 --- 
a/tests/dashboard/dashboard-server.test.ts +++ b/tests/dashboard/dashboard-server.test.ts @@ -1,10 +1,14 @@ import { afterAll, beforeAll, describe, expect, it } from "bun:test"; +import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; import type { OverviewResponse, SkillReportResponse, } from "../../cli/selftune/dashboard-contract.js"; let startDashboardServer: typeof import("../../cli/selftune/dashboard-server.js").startDashboardServer; +let testSpaDir: string; const overviewFixture: OverviewResponse = { overview: { @@ -93,16 +97,24 @@ const skillReportFixture: SkillReportResponse = { beforeAll(async () => { const mod = await import("../../cli/selftune/dashboard-server.js"); startDashboardServer = mod.startDashboardServer; + testSpaDir = mkdtempSync(join(tmpdir(), "selftune-dashboard-test-")); + mkdirSync(join(testSpaDir, "assets"), { recursive: true }); + writeFileSync( + join(testSpaDir, "index.html"), + `
`, + ); + writeFileSync(join(testSpaDir, "assets", "app.js"), "console.log('selftune test spa');\n"); }); describe("dashboard-server", () => { - let serverPromise: Promise<{ server: unknown; stop: () => void; port: number }> | null = null; + let serverPromise: ReturnType<typeof startDashboardServer> | null = null; - async function getServer(): Promise<{ server: unknown; stop: () => void; port: number }> { + async function getServer(): Promise<Awaited<ReturnType<typeof startDashboardServer>>> { if (!serverPromise) { serverPromise = startDashboardServer({ port: 0, host: "127.0.0.1", + spaDir: testSpaDir, openBrowser: false, overviewLoader: () => overviewFixture, skillReportLoader: (skillName) => (skillName === "test-skill" ? skillReportFixture : null), @@ -224,6 +236,12 @@ describe("dashboard-server", () => { ); expect(res.status).toBe(404); }); + + it("returns 400 for malformed skill-name encoding", async () => { + const server = await getServer(); + const res = await fetch(`http://127.0.0.1:${server.port}/api/v2/skills/%E0%A4%A`); + expect(res.status).toBe(400); + }); }); describe("POST /api/actions/*", () => { @@ -292,6 +310,7 @@ describe("server lifecycle", () => { const s = await startDashboardServer({ port: 0, host: "127.0.0.1", + spaDir: testSpaDir, openBrowser: false, overviewLoader: () => overviewFixture, skillReportLoader: () => null, @@ -306,6 +325,7 @@ const s = await startDashboardServer({ port: 0, host: "127.0.0.1", + spaDir: testSpaDir, openBrowser: false, overviewLoader: () => overviewFixture, skillReportLoader: () => null, @@ -323,6 +343,7 @@ describe("SPA shell loading", () => { const server = await startDashboardServer({ port: 0, host: "127.0.0.1", + spaDir: testSpaDir, openBrowser: false, overviewLoader: () => { overviewLoaderCalls++; @@ -367,6 +388,7 @@ describe("report loading", () => { const server = await startDashboardServer({ port: 0, host: "127.0.0.1", + spaDir: testSpaDir, openBrowser: false, overviewLoader: () => overviewFixture, skillReportLoader: () => { @@ -410,3 +432,7 @@
describe("report loading", () => { } }); }); + +afterAll(() => { + rmSync(testSpaDir, { recursive: true, force: true }); +}); From 789ebef95c744bc3efe563e8294809f1cf50e89e Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 17:46:52 +0300 Subject: [PATCH 09/14] Fix biome lint errors in dashboard tests Co-Authored-By: Claude Opus 4.6 --- tests/dashboard/badge-routes.test.ts | 7 ++----- tests/dashboard/dashboard-server.test.ts | 2 +- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/tests/dashboard/badge-routes.test.ts b/tests/dashboard/badge-routes.test.ts index e2e8306..3915117 100644 --- a/tests/dashboard/badge-routes.test.ts +++ b/tests/dashboard/badge-routes.test.ts @@ -1,5 +1,5 @@ import { afterAll, beforeAll, describe, expect, it } from "bun:test"; -import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from "node:fs"; +import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs"; import { tmpdir } from "node:os"; import { join } from "node:path"; import type { @@ -7,10 +7,7 @@ import type { SkillReportResponse, } from "../../cli/selftune/dashboard-contract.js"; import type { StatusResult } from "../../cli/selftune/status.js"; -import type { - EvolutionEvidenceEntry, - SkillUsageRecord, -} from "../../cli/selftune/types.js"; +import type { EvolutionEvidenceEntry, SkillUsageRecord } from "../../cli/selftune/types.js"; /** * Badge route tests — validates /badge/:skillName and /report/:skillName diff --git a/tests/dashboard/dashboard-server.test.ts b/tests/dashboard/dashboard-server.test.ts index 698394a..ebc1fda 100644 --- a/tests/dashboard/dashboard-server.test.ts +++ b/tests/dashboard/dashboard-server.test.ts @@ -1,5 +1,5 @@ import { afterAll, beforeAll, describe, expect, it } from "bun:test"; -import { mkdtempSync, mkdirSync, rmSync, writeFileSync } from "node:fs"; +import { mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs"; import { tmpdir } from "node:os"; 
import { join } from "node:path"; import type { From 61c7deca68475ceb9a53657ad6d171e8b1e1b91c Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 17:58:28 +0300 Subject: [PATCH 10/14] Make autonomous loop the default scheduler path --- cli/selftune/cron/setup.ts | 15 +- cli/selftune/init.ts | 19 +++ cli/selftune/orchestrate.ts | 13 +- cli/selftune/schedule.ts | 245 +++++++++++++++++++++++++++----- docs/integration-guide.md | 49 +++---- skill/Workflows/Cron.md | 18 ++- skill/Workflows/Initialize.md | 7 +- skill/Workflows/Schedule.md | 22 +-- tests/cron/setup.test.ts | 7 +- tests/orchestrate.test.ts | 5 +- tests/schedule/schedule.test.ts | 65 ++++++--- 11 files changed, 340 insertions(+), 125 deletions(-) diff --git a/cli/selftune/cron/setup.ts b/cli/selftune/cron/setup.ts index ef9a4fb..0c91080 100644 --- a/cli/selftune/cron/setup.ts +++ b/cli/selftune/cron/setup.ts @@ -45,18 +45,11 @@ export const DEFAULT_CRON_JOBS: CronJobConfig[] = [ description: "Daily health check after source sync", }, { - name: "selftune-evolve", - cron: "0 3 * * 0", - message: - "Run selftune sync, review source-truth status, and run selftune evolve --sync-first for any skills with enough negative evidence or clear undertriggering patterns. Report proposed changes and validation results.", - description: "Weekly evolution at 3am Sunday", - }, - { - name: "selftune-watch", + name: "selftune-orchestrate", cron: "0 */6 * * *", message: - "Run selftune sync first, then run selftune watch --sync-first on all recently evolved skills to detect regressions against the latest source-truth telemetry.", - description: "Monitor regressions every 6 hours after source sync", + "Run selftune orchestrate --max-skills 3. 
This performs source-truth sync, selects candidate skills, evolves validated low-risk descriptions autonomously, and watches recent deployments for regressions.", + description: "Autonomous improvement loop every 6 hours", }, ]; @@ -123,7 +116,7 @@ export function loadCronJobs(jobsPath: string): CronJobConfig[] { /** Register default cron jobs with OpenClaw. */ export async function setupCronJobs(tz: string, dryRun: boolean): Promise { const openclawPath = Bun.which("openclaw"); - if (!openclawPath) { + if (!dryRun && !openclawPath) { console.error("Error: openclaw is not installed or not in PATH."); console.error(""); console.error("Install OpenClaw:"); diff --git a/cli/selftune/init.ts b/cli/selftune/init.ts index e4e4f07..a1b2735 100644 --- a/cli/selftune/init.ts +++ b/cli/selftune/init.ts @@ -8,6 +8,7 @@ * * Usage: * selftune init [--agent ] [--cli-path ] [--force] + * selftune init --enable-autonomy [--schedule-format cron|launchd|systemd] */ import { @@ -407,6 +408,8 @@ export async function cliMain(): Promise { agent: { type: "string" }, "cli-path": { type: "string" }, force: { type: "boolean", default: false }, + "enable-autonomy": { type: "boolean", default: false }, + "schedule-format": { type: "string" }, }, strict: true, }); @@ -466,6 +469,22 @@ export async function cliMain(): Promise { total: doctorResult.summary.total, }), ); + + if (values["enable-autonomy"]) { + const { installSchedule } = await import("./schedule.js"); + const scheduleResult = installSchedule({ + format: values["schedule-format"], + }); + console.log( + JSON.stringify({ + level: "info", + code: "autonomy_enabled", + format: scheduleResult.format, + activated: scheduleResult.activated, + files: scheduleResult.artifacts.map((artifact) => artifact.path), + }), + ); + } } // Guard: only run when invoked directly diff --git a/cli/selftune/orchestrate.ts b/cli/selftune/orchestrate.ts index 092156d..131d10a 100644 --- a/cli/selftune/orchestrate.ts +++ b/cli/selftune/orchestrate.ts @@ 
-79,6 +79,14 @@ export interface OrchestrateResult { /** Candidate selection criteria. */ const CANDIDATE_STATUSES = new Set(["CRITICAL", "WARNING", "UNGRADED"]); +function candidatePriority(skill: SkillStatus): number { + const statusWeight = + skill.status === "CRITICAL" ? 300 : skill.status === "WARNING" ? 200 : 100; + const missedWeight = Math.min(skill.missedQueries, 50); + const passPenalty = skill.passRate === null ? 0 : Math.round((1 - skill.passRate) * 100); + return statusWeight + missedWeight + passPenalty; +} + /** * Injectable dependencies for orchestrate(). Pass overrides in tests. */ @@ -126,8 +134,9 @@ export function selectCandidates( options: Pick, ): SkillAction[] { const actions: SkillAction[] = []; + const orderedSkills = [...skills].sort((a, b) => candidatePriority(b) - candidatePriority(a)); - for (const skill of skills) { + for (const skill of orderedSkills) { // Apply skill filter if (options.skillFilter && skill.name !== options.skillFilter) { actions.push({ @@ -370,7 +379,7 @@ export async function orchestrate( skillPath, windowSessions: 20, regressionThreshold: 0.1, - autoRollback: false, + autoRollback: true, syncFirst: false, }); diff --git a/cli/selftune/schedule.ts b/cli/selftune/schedule.ts index 74d401e..820958e 100644 --- a/cli/selftune/schedule.ts +++ b/cli/selftune/schedule.ts @@ -8,9 +8,13 @@ * For OpenClaw-specific scheduling, see `selftune cron`. 
* * Usage: - * selftune schedule [--format cron|launchd|systemd] + * selftune schedule [--format cron|launchd|systemd] [--install] [--dry-run] */ +import { spawnSync } from "node:child_process"; +import { mkdirSync, writeFileSync } from "node:fs"; +import { homedir } from "node:os"; +import { dirname, join } from "node:path"; import { parseArgs } from "node:util"; import { DEFAULT_CRON_JOBS } from "./cron/setup.js"; @@ -33,10 +37,8 @@ function commandForJob(jobName: string): string { return "selftune sync"; case "selftune-status": return "selftune sync && selftune status"; - case "selftune-evolve": - return "selftune evolve --sync-first --skill --skill-path "; - case "selftune-watch": - return "selftune watch --sync-first --skill --skill-path "; + case "selftune-orchestrate": + return "selftune orchestrate --max-skills 3"; default: return `selftune ${jobName.replace("selftune-", "")}`; } @@ -49,6 +51,19 @@ export const SCHEDULE_ENTRIES: ScheduleEntry[] = DEFAULT_CRON_JOBS.map((job) => description: job.description, })); +export interface ScheduleInstallArtifact { + path: string; + content: string; +} + +export interface ScheduleInstallResult { + format: ScheduleFormat; + artifacts: ScheduleInstallArtifact[]; + activationCommands: string[]; + activated: boolean; + dryRun: boolean; +} + // --------------------------------------------------------------------------- // Helpers for launchd/systemd generation // --------------------------------------------------------------------------- @@ -91,7 +106,6 @@ function cronToLaunchdSchedule(cron: string): string { function cronToOnCalendar(cron: string): string { if (cron === "*/30 * * * *") return "*:0/30"; if (cron === "0 8 * * *") return "*-*-* 08:00:00"; - if (cron === "0 3 * * 0") return "Sun *-*-* 03:00:00"; if (cron === "0 */6 * * *") return "*-*-* 0/6:00:00"; return cron; } @@ -123,8 +137,9 @@ export function generateCrontab(): string { const lines = [ "# selftune automation — add to your crontab with: crontab -e", 
"#", - "# The core loop: sync → status → evolve → watch", - "# Adjust paths and skill names for your setup.", + "# The core loop: sync → orchestrate", + "# status remains a reporting job; orchestrate handles sync, candidate", + "# selection, low-risk description evolution, and watch/rollback follow-up.", "#", ]; for (const entry of SCHEDULE_ENTRIES) { @@ -135,15 +150,14 @@ export function generateCrontab(): string { return lines.join("\n"); } -export function generateLaunchd(): string { - const plists: string[] = []; - - for (const entry of SCHEDULE_ENTRIES) { - const label = `com.selftune.${entry.name.replace("selftune-", "")}`; - const args = toLaunchdArgs(entry.command); - const schedule = cronToLaunchdSchedule(entry.schedule); +function buildLaunchdDefinition(entry: ScheduleEntry): { label: string; content: string } { + const label = `com.selftune.${entry.name.replace("selftune-", "")}`; + const args = toLaunchdArgs(entry.command); + const schedule = cronToLaunchdSchedule(entry.schedule); - plists.push(` + return { + label, + content: ` Detect --> Diagnose --> Propose --> Validate --> Deploy --> Watch +--------------------------------------------------------------------+ ``` -1. **Observe** -- Hooks capture every session (queries, triggers, metrics) -2. **Detect** -- `evals` finds missed triggers across invocation types +1. **Observe** -- source-truth transcripts and telemetry are replayed into the shared logs +2. **Detect** -- `sync`, `status`, and `evals` surface missed triggers and weak routing 3. **Diagnose** -- `grade` evaluates session quality with evidence -4. **Propose** -- `evolve` generates description improvements -5. **Validate** -- Evolution is tested against the eval set -6. **Deploy** -- Updated description replaces the original (with backup) -7. **Watch** -- `watch` monitors for regressions post-deploy +4. **Propose** -- `evolve` generates low-risk description improvements +5. **Validate** -- proposals are checked before deploy +6. 
**Deploy** -- validated descriptions can ship autonomously +7. **Watch** -- `watch` monitors recent changes and rolls back regressions ## Resource Index @@ -170,6 +173,7 @@ Observe --> Detect --> Diagnose --> Propose --> Validate --> Deploy --> Watch | `Workflows/Grade.md` | Grade a session with expectations and evidence | | `Workflows/Evals.md` | Generate eval sets, list skills, show stats | | `Workflows/Evolve.md` | Evolve a skill description from failure patterns | +| `Workflows/Orchestrate.md` | Run the autonomy-first sync → evolve → watch loop | | `Workflows/Rollback.md` | Undo an evolution, restore previous description | | `Workflows/Watch.md` | Post-deploy regression monitoring | | `Workflows/Doctor.md` | Health checks on logs, hooks, schema | @@ -177,9 +181,10 @@ Observe --> Detect --> Diagnose --> Propose --> Validate --> Deploy --> Watch | `Workflows/Replay.md` | Backfill logs from Claude Code transcripts | | `Workflows/Sync.md` | Source-truth sync across supported agents + repaired overlay rebuild | | `Workflows/Contribute.md` | Export anonymized data for community contribution | +| `Workflows/Schedule.md` | Install platform-native scheduling for the autonomous loop | | `Workflows/Cron.md` | Manage OpenClaw cron jobs for autonomous evolution | | `Workflows/AutoActivation.md` | Auto-activation hook behavior and rules | -| `Workflows/Dashboard.md` | Dashboard modes: static, export, live server | +| `Workflows/Dashboard.md` | Run the SPA dashboard and per-skill report views | | `Workflows/EvolutionMemory.md` | Evolution memory system for session continuity | | `Workflows/EvolveBody.md` | Full body and routing table evolution | | `Workflows/Baseline.md` | No-skill baseline comparison and lift measurement | @@ -203,6 +208,7 @@ them. - "What skills are undertriggering?" 
- "Generate evals for the pptx skill" - "Evolve the pptx skill to catch more queries" +- "Run the autonomous selftune loop" - "Rollback the last evolution" - "Is the skill performing well after the change?" - "Check selftune health" @@ -221,8 +227,8 @@ them. - "Rebuild the repaired skill overlay" - "Contribute my selftune data to the community" - "Share anonymized skill data" -- "Set up cron jobs for autonomous evolution" -- "Schedule selftune to run automatically" +- "Install autonomous scheduling for this machine" +- "Set up OpenClaw cron jobs for selftune" - "Ingest my OpenClaw sessions" - "Why is selftune suggesting things?" - "Customize activation rules" diff --git a/skill/Workflows/Orchestrate.md b/skill/Workflows/Orchestrate.md new file mode 100644 index 0000000..8e66dbd --- /dev/null +++ b/skill/Workflows/Orchestrate.md @@ -0,0 +1,70 @@ +# selftune Orchestrate Workflow + +Run the autonomy-first selftune loop in one command. + +`selftune orchestrate` is the primary closed-loop entrypoint. It runs +source-truth sync, computes current skill health, selects candidates, +deploys validated low-risk description changes autonomously, and watches +recent changes with auto-rollback enabled. 
+ +## When to Use + +- You want the full autonomous loop, not isolated subcommands +- You want to improve skills without manually chaining `sync`, `status`, `evolve`, and `watch` +- You want a dry-run of what selftune would change next +- You want a stricter review policy for a single run + +## Default Command + +```bash +selftune orchestrate +``` + +## Flags + +| Flag | Description | Default | +|------|-------------|---------| +| `--dry-run` | Plan and validate without deploying changes | Off | +| `--review-required` | Keep validated changes in review mode instead of deploying | Off | +| `--skill <skill>` | Limit the loop to one skill | All skills | +| `--max-skills <count>` | Cap how many candidates are processed in one run | `3` | +| `--recent-window <hours>` | Window for post-deploy watch/rollback checks | `24` | +| `--sync-force` | Force a full source replay before candidate selection | Off | + +## Default Behavior + +- Sync source-truth telemetry first +- Prioritize critical/warning/ungraded skills with real missed-query signal +- Deploy validated low-risk description changes automatically +- Watch recent deployments and roll back regressions automatically + +Use `--review-required` only when you want a stricter policy for a specific run. + +## Common Patterns + +**"Run the full loop now"** +> Run `selftune orchestrate`. + +**"Show me what would change first"** +> Run `selftune orchestrate --dry-run`. + +**"Only work on one skill"** +> Run `selftune orchestrate --skill selftune`. + +**"Keep review in the loop for this run"** +> Run `selftune orchestrate --review-required`. + +**"Force a full replay before acting"** +> Run `selftune orchestrate --sync-force`. + +## Output + +The command prints: + +- sync results +- candidate-selection reasoning +- evolve/watch actions taken +- skipped skills and why +- a final summary with counts and elapsed time + +This is the recommended runtime for recurring autonomous scheduling.
diff --git a/skill/references/setup-patterns.md b/skill/references/setup-patterns.md index 8f26fc3..05757c3 100644 --- a/skill/references/setup-patterns.md +++ b/skill/references/setup-patterns.md @@ -45,7 +45,7 @@ the repo root so hook paths and telemetry cover the whole workspace. - Run `selftune init --agent openclaw` - Use `selftune ingest-openclaw` for ingestion - Use `selftune doctor` to verify the shared logs are healthy -- Use `selftune cron setup` if the user wants autonomous recurring runs +- Use `selftune cron setup` if the user specifically wants OpenClaw-managed recurring runs ## Mixed-Agent Setup @@ -54,6 +54,7 @@ combined. - Initialize each platform against the same `~/.selftune/` data directory - Ingest platform-specific logs into the shared JSONL schema +- Use `selftune schedule --install` for the default autonomous scheduler path - Use `selftune status`, `selftune dashboard`, and `selftune workflows` on the merged dataset From 4ea63c3d188ae55366a8cf6f5b988dfd3ebe40d2 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 19:30:41 +0300 Subject: [PATCH 12/14] Document autonomy-first setup path --- README.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 0028a5e..4b5d022 100644 --- a/README.md +++ b/README.md @@ -46,6 +46,14 @@ npx selftune@latest status npx selftune@latest dashboard ``` +Autonomy quick start: + +```bash +npx selftune@latest init --enable-autonomy +npx selftune@latest orchestrate --dry-run +npx selftune@latest schedule --install --dry-run +``` + **CLI only** (no installed skill): ```bash @@ -92,6 +100,7 @@ A continuous feedback loop that makes your skills learn and adapt from real work - **Source-truth sync** — `selftune sync` now leads the product loop, using transcripts/logs as truth and hooks as hints - **SQLite-backed local app** — `selftune dashboard` now serves the React SPA by default with faster 
overview/report routes on top of materialized local data
- **Autonomous low-risk evolution** — description evolution is autonomous by default, with explicit review-required mode for stricter policies
+- **Autonomous scheduling** — `selftune init --enable-autonomy` and `selftune schedule --install` make the orchestrated loop the default recurring runtime
- **Full skill body evolution** — evolve routing tables and entire skill bodies using a teacher-student model with structural, trigger, and quality gates
- **Synthetic eval generation** — `selftune evals --synthetic` generates eval sets from `SKILL.md` for cold-start skills
- **Cheap-loop evolution** — `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate
@@ -108,6 +117,7 @@ A continuous feedback loop that makes your skills learn and adapt from real work
| `selftune status` | See which skills are undertriggering and why |
| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) |
| `selftune orchestrate` | Run the core loop: sync, inspect candidates, evolve, and watch |
+| `selftune schedule --install` | Install platform-native scheduling for the autonomous loop |
| `selftune evals --skill <name>` | Generate eval sets from real session data (`--synthetic` for cold-start) |
| `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions (`--cheap-loop`, `--with-baseline`) |
| `selftune evolve-body --skill <name>` | Evolve full skill body or routing table (teacher-student, 3-gate validation) |
@@ -154,7 +164,7 @@ Observability tools trace LLM calls. Skill authoring tools help you write skills

**OpenCode** — `selftune ingest-opencode`

-**OpenClaw** — `selftune ingest-openclaw`. `selftune cron setup` remains available as an optional OpenClaw-oriented scheduler helper, but the main product loop is agent-agnostic.
+**OpenClaw** — `selftune ingest-openclaw`. 
`selftune cron setup` remains available as an optional OpenClaw-oriented scheduler helper, but the main product loop is still `selftune orchestrate` plus generic scheduling. Requires [Bun](https://bun.sh) or Node.js 18+. No extra API keys. From f0ed1527fd4f0b2f9be498bbc60355102697730b Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 19:38:41 +0300 Subject: [PATCH 13/14] Harden autonomous scheduler install paths --- cli/selftune/dashboard-server.ts | 7 +- cli/selftune/init.ts | 46 ++++--- cli/selftune/orchestrate.ts | 3 +- cli/selftune/schedule.ts | 165 ++++++++++++++++++----- docs/integration-guide.md | 2 +- skill/Workflows/Schedule.md | 4 +- tests/dashboard/dashboard-server.test.ts | 34 +++++ tests/schedule/schedule.test.ts | 44 +++++- 8 files changed, 250 insertions(+), 55 deletions(-) diff --git a/cli/selftune/dashboard-server.ts b/cli/selftune/dashboard-server.ts index 8ee671a..d47671c 100644 --- a/cli/selftune/dashboard-server.ts +++ b/cli/selftune/dashboard-server.ts @@ -422,10 +422,15 @@ export async function startDashboardServer( const executeAction = options?.actionRunner ?? runAction; // -- SPA serving ------------------------------------------------------------- - const spaDir = options?.spaDir ?? findSpaDir(); + const requestedSpaDir = options?.spaDir ?? findSpaDir(); + const spaDir = + requestedSpaDir && existsSync(join(requestedSpaDir, "index.html")) ? requestedSpaDir : null; if (spaDir) { console.log(`SPA found at ${spaDir}, serving as default dashboard`); } else { + if (options?.spaDir) { + console.warn(`Configured spaDir is missing index.html: ${options.spaDir}`); + } console.warn( "SPA build not found. 
Run `bun run build:dashboard` before using `selftune dashboard`.", ); diff --git a/cli/selftune/init.ts b/cli/selftune/init.ts index a1b2735..ee1d5be 100644 --- a/cli/selftune/init.ts +++ b/cli/selftune/init.ts @@ -417,9 +417,10 @@ export async function cliMain(): Promise { const configDir = SELFTUNE_CONFIG_DIR; const configPath = SELFTUNE_CONFIG_PATH; const force = values.force ?? false; + const enableAutonomy = values["enable-autonomy"] ?? false; // Check for existing config without force - if (!force && existsSync(configPath)) { + if (!force && !enableAutonomy && existsSync(configPath)) { try { const raw = readFileSync(configPath, "utf-8"); const existing = JSON.parse(raw) as SelftuneConfig; @@ -470,20 +471,35 @@ export async function cliMain(): Promise { }), ); - if (values["enable-autonomy"]) { - const { installSchedule } = await import("./schedule.js"); - const scheduleResult = installSchedule({ - format: values["schedule-format"], - }); - console.log( - JSON.stringify({ - level: "info", - code: "autonomy_enabled", - format: scheduleResult.format, - activated: scheduleResult.activated, - files: scheduleResult.artifacts.map((artifact) => artifact.path), - }), - ); + if (enableAutonomy) { + try { + const { installSchedule } = await import("./schedule.js"); + const scheduleResult = installSchedule({ + format: values["schedule-format"], + }); + + if (!scheduleResult.activated) { + console.error( + "Failed to activate the autonomous scheduler. Re-run with --schedule-format or use `selftune schedule --install --dry-run` to inspect the generated artifacts first.", + ); + process.exit(1); + } + + console.log( + JSON.stringify({ + level: "info", + code: "autonomy_enabled", + format: scheduleResult.format, + activated: scheduleResult.activated, + files: scheduleResult.artifacts.map((artifact) => artifact.path), + }), + ); + } catch (err) { + console.error( + `Failed to enable autonomy: ${err instanceof Error ? 
err.message : String(err)}`, + ); + process.exit(1); + } } } diff --git a/cli/selftune/orchestrate.ts b/cli/selftune/orchestrate.ts index 131d10a..cd8d69b 100644 --- a/cli/selftune/orchestrate.ts +++ b/cli/selftune/orchestrate.ts @@ -80,8 +80,7 @@ export interface OrchestrateResult { const CANDIDATE_STATUSES = new Set(["CRITICAL", "WARNING", "UNGRADED"]); function candidatePriority(skill: SkillStatus): number { - const statusWeight = - skill.status === "CRITICAL" ? 300 : skill.status === "WARNING" ? 200 : 100; + const statusWeight = skill.status === "CRITICAL" ? 300 : skill.status === "WARNING" ? 200 : 100; const missedWeight = Math.min(skill.missedQueries, 50); const passPenalty = skill.passRate === null ? 0 : Math.round((1 - skill.passRate) * 100); return statusWeight + missedWeight + passPenalty; diff --git a/cli/selftune/schedule.ts b/cli/selftune/schedule.ts index 820958e..49052c6 100644 --- a/cli/selftune/schedule.ts +++ b/cli/selftune/schedule.ts @@ -12,7 +12,7 @@ */ import { spawnSync } from "node:child_process"; -import { mkdirSync, writeFileSync } from "node:fs"; +import { mkdirSync, readFileSync, writeFileSync } from "node:fs"; import { homedir } from "node:os"; import { dirname, join } from "node:path"; import { parseArgs } from "node:util"; @@ -64,6 +64,9 @@ export interface ScheduleInstallResult { dryRun: boolean; } +const CRON_BEGIN_MARKER = "# BEGIN SELFTUNE"; +const CRON_END_MARKER = "# END SELFTUNE"; + // --------------------------------------------------------------------------- // Helpers for launchd/systemd generation // --------------------------------------------------------------------------- @@ -150,6 +153,30 @@ export function generateCrontab(): string { return lines.join("\n"); } +function escapeRegex(value: string): string { + return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); +} + +export function wrapManagedCrontabBlock(content: string): string { + return `${CRON_BEGIN_MARKER}\n${content.trim()}\n${CRON_END_MARKER}\n`; +} + +export 
function mergeManagedCrontab(existing: string, managedContent: string): string { + const managedBlock = wrapManagedCrontabBlock(managedContent); + const normalizedExisting = existing.replace(/\r\n/g, "\n"); + const markerPattern = new RegExp( + `${escapeRegex(CRON_BEGIN_MARKER)}[\\s\\S]*?${escapeRegex(CRON_END_MARKER)}\\n?`, + "g", + ); + const withoutExistingBlock = normalizedExisting.replace(markerPattern, "").trimEnd(); + + if (!withoutExistingBlock) { + return managedBlock; + } + + return `${withoutExistingBlock}\n\n${managedBlock}`; +} + function buildLaunchdDefinition(entry: ScheduleEntry): { label: string; content: string } { const label = `com.selftune.${entry.name.replace("selftune-", "")}`; const args = toLaunchdArgs(entry.command); @@ -195,9 +222,11 @@ export function generateLaunchd(): string { return plists.join("\n\n"); } -function buildSystemdDefinition( - entry: ScheduleEntry, -): { baseName: string; timerContent: string; serviceContent: string } { +function buildSystemdDefinition(entry: ScheduleEntry): { + baseName: string; + timerContent: string; + serviceContent: string; +} { const unitName = entry.name; const calendar = cronToOnCalendar(entry.schedule); const execStart = toSystemdExecStart(entry.command); @@ -272,7 +301,7 @@ export function buildInstallPlan( const path = join(homeDir, ".selftune", "schedule", "selftune.crontab"); return { artifacts: [{ path, content: generateCrontab() }], - activationCommands: [`crontab ${path}`], + activationCommands: [`selftune schedule --apply-cron-artifact ${path}`], }; } @@ -295,6 +324,10 @@ export function buildInstallPlan( }; } + if (format !== "systemd") { + throw new Error(`Unknown format "${format}". 
Valid formats: ${VALID_FORMATS.join(", ")}`); + } + const systemdDir = join(homeDir, ".config", "systemd", "user"); const definitions = SCHEDULE_ENTRIES.map(buildSystemdDefinition); return { @@ -307,7 +340,9 @@ export function buildInstallPlan( ]), activationCommands: [ "systemctl --user daemon-reload", - ...definitions.map((definition) => `systemctl --user enable --now ${definition.baseName}.timer`), + ...definitions.map( + (definition) => `systemctl --user enable --now ${definition.baseName}.timer`, + ), ], }; } @@ -317,13 +352,44 @@ function runShellCommand(command: string): number { return result.status ?? 1; } -export function installSchedule(options: { - format?: string; - dryRun?: boolean; - homeDir?: string; - platform?: NodeJS.Platform; - runCommand?: (command: string) => number; -} = {}): ScheduleInstallResult { +function readCurrentCrontab(): string { + const result = spawnSync("crontab", ["-l"], { encoding: "utf8" }); + + if (result.status === 0) { + return result.stdout; + } + + const stderr = (result.stderr ?? "").trim(); + if (stderr.includes("no crontab for")) { + return ""; + } + + throw new Error(stderr || `crontab -l failed with exit code ${result.status ?? 1}`); +} + +export function applyCronArtifact(artifactPath: string): void { + const artifactContent = readFileSync(artifactPath, "utf-8"); + const mergedPath = artifactPath.replace(/\.crontab$/, ".merged.crontab"); + const mergedContent = mergeManagedCrontab(readCurrentCrontab(), artifactContent); + + mkdirSync(dirname(mergedPath), { recursive: true }); + writeFileSync(mergedPath, mergedContent, "utf-8"); + + const result = spawnSync("crontab", [mergedPath], { stdio: "inherit" }); + if ((result.status ?? 
1) !== 0) { + throw new Error(`Failed to install merged crontab from ${mergedPath}`); + } +} + +export function installSchedule( + options: { + format?: string; + dryRun?: boolean; + homeDir?: string; + platform?: NodeJS.Platform; + runCommand?: (command: string) => number; + } = {}, +): ScheduleInstallResult { const formatResult = selectInstallFormat(options.format, options.platform); if (!formatResult.ok) { throw new Error(formatResult.error); @@ -340,8 +406,17 @@ export function installSchedule(options: { let activated = false; if (!dryRun) { - const runCommand = options.runCommand ?? runShellCommand; - activated = plan.activationCommands.every((command) => runCommand(command) === 0); + if (formatResult.format === "cron") { + const cronArtifact = plan.artifacts[0]; + if (!cronArtifact) { + throw new Error("Cron install plan is missing the selftune crontab artifact."); + } + applyCronArtifact(cronArtifact.path); + activated = true; + } else { + const runCommand = options.runCommand ?? runShellCommand; + activated = plan.activationCommands.every((command) => runCommand(command) === 0); + } } return { @@ -400,12 +475,25 @@ export function cliMain(): void { format: { type: "string", short: "f" }, install: { type: "boolean", default: false }, "dry-run": { type: "boolean", default: false }, + "apply-cron-artifact": { type: "string" }, help: { type: "boolean", default: false }, }, strict: false, allowPositionals: true, }); + if (values["apply-cron-artifact"]) { + try { + applyCronArtifact(values["apply-cron-artifact"]); + return; + } catch (err) { + console.error( + `Failed to apply selftune cron artifact: ${err instanceof Error ? 
err.message : String(err)}`, + ); + process.exit(1); + } + } + if (values.help) { console.log(`selftune schedule — Generate scheduling examples for automation @@ -429,24 +517,35 @@ For OpenClaw-specific scheduling, see: selftune cron`); } if (values.install) { - const result = installSchedule({ - format: values.format, - dryRun: values["dry-run"] ?? false, - }); - console.log( - JSON.stringify( - { - format: result.format, - installed: !result.dryRun, - activated: result.activated, - files: result.artifacts.map((artifact) => artifact.path), - activationCommands: result.activationCommands, - }, - null, - 2, - ), - ); - return; + try { + const result = installSchedule({ + format: values.format, + dryRun: values["dry-run"] ?? false, + }); + if (!result.dryRun && !result.activated) { + console.error("Failed to activate installed schedule artifacts."); + process.exit(1); + } + console.log( + JSON.stringify( + { + format: result.format, + installed: !result.dryRun, + activated: result.activated, + files: result.artifacts.map((artifact) => artifact.path), + activationCommands: result.activationCommands, + }, + null, + 2, + ), + ); + return; + } catch (err) { + console.error( + `Failed to install schedule artifacts: ${err instanceof Error ? err.message : String(err)}`, + ); + process.exit(1); + } } const result = formatOutput(values.format); diff --git a/docs/integration-guide.md b/docs/integration-guide.md index 3ecf595..bf916b3 100644 --- a/docs/integration-guide.md +++ b/docs/integration-guide.md @@ -391,7 +391,7 @@ selftune is designed to run unattended on any machine. 
The core automation loop is centered on one command: ```text -orchestrate +selftune orchestrate ``` `selftune orchestrate` runs source-truth sync first, selects candidate skills, diff --git a/skill/Workflows/Schedule.md b/skill/Workflows/Schedule.md index 17c6201..309a877 100644 --- a/skill/Workflows/Schedule.md +++ b/skill/Workflows/Schedule.md @@ -18,8 +18,8 @@ For OpenClaw-specific scheduling, see `Workflows/Cron.md`. The core selftune automation loop is one command: -``` -orchestrate +```bash +selftune orchestrate ``` `selftune orchestrate` runs source-truth sync first, selects candidate skills, diff --git a/tests/dashboard/dashboard-server.test.ts b/tests/dashboard/dashboard-server.test.ts index ebc1fda..33e433f 100644 --- a/tests/dashboard/dashboard-server.test.ts +++ b/tests/dashboard/dashboard-server.test.ts @@ -378,6 +378,40 @@ describe("SPA shell loading", () => { server.stop(); } }); + + it("returns 503 when a configured spaDir is missing index.html", async () => { + const brokenSpaDir = mkdtempSync(join(tmpdir(), "selftune-dashboard-broken-")); + mkdirSync(join(brokenSpaDir, "assets"), { recursive: true }); + + const server = await startDashboardServer({ + port: 0, + host: "127.0.0.1", + spaDir: brokenSpaDir, + openBrowser: false, + overviewLoader: () => overviewFixture, + skillReportLoader: () => skillReportFixture, + statusLoader: () => ({ + skills: [], + unmatchedQueries: 0, + pendingProposals: 0, + lastSession: null, + system: { + healthy: true, + pass: 0, + fail: 0, + warn: 0, + }, + }), + }); + + try { + const res = await fetch(`http://127.0.0.1:${server.port}/`); + expect(res.status).toBe(503); + } finally { + server.stop(); + rmSync(brokenSpaDir, { recursive: true, force: true }); + } + }); }); describe("report loading", () => { diff --git a/tests/schedule/schedule.test.ts b/tests/schedule/schedule.test.ts index 65b7589..d5c5e96 100644 --- a/tests/schedule/schedule.test.ts +++ b/tests/schedule/schedule.test.ts @@ -1,14 +1,17 @@ import { 
describe, expect, test } from "bun:test"; import { + applyCronArtifact, buildInstallPlan, formatOutput, generateCrontab, generateLaunchd, generateSystemd, installSchedule, + mergeManagedCrontab, SCHEDULE_ENTRIES, selectInstallFormat, + wrapManagedCrontabBlock, } from "../../cli/selftune/schedule.js"; // --------------------------------------------------------------------------- @@ -152,12 +155,23 @@ describe("generateSystemd", () => { }); describe("install helpers", () => { + test("selectInstallFormat rejects unknown format", () => { + expect(selectInstallFormat("docker")).toEqual({ + ok: false, + error: 'Unknown format "docker". Valid formats: cron, launchd, systemd', + }); + }); + test("selectInstallFormat defaults by platform", () => { expect(selectInstallFormat(undefined, "darwin")).toEqual({ ok: true, format: "launchd" }); expect(selectInstallFormat(undefined, "linux")).toEqual({ ok: true, format: "systemd" }); expect(selectInstallFormat(undefined, "win32")).toEqual({ ok: true, format: "cron" }); }); + test("buildInstallPlan rejects unknown format at runtime", () => { + expect(() => buildInstallPlan("docker" as never, "/tmp/test-home")).toThrow(/Unknown format/); + }); + test("buildInstallPlan returns launchd artifacts and activation commands", () => { const plan = buildInstallPlan("launchd", "/tmp/test-home"); expect(plan.artifacts.some((artifact) => artifact.path.includes("LaunchAgents"))).toBe(true); @@ -181,7 +195,35 @@ describe("install helpers", () => { expect(result.dryRun).toBe(true); expect(result.activated).toBe(false); expect(commandsRun).toBe(0); - expect(result.artifacts[0]?.path).toContain(".selftune/schedule/selftune.crontab"); + expect(result.artifacts[0]?.path).toMatch( + /[\\/]\.selftune[\\/]schedule[\\/]selftune\.crontab$/, + ); + }); + + test("installSchedule throws for unknown format", () => { + expect(() => installSchedule({ format: "docker" })).toThrow(/Unknown format/); + }); + + test("mergeManagedCrontab preserves unrelated jobs and 
replaces the selftune block", () => { + const existing = [ + "MAILTO=user@example.com", + "0 1 * * * backup-job", + wrapManagedCrontabBlock("old-selftune-job"), + "15 3 * * * analytics-job", + ].join("\n"); + + const merged = mergeManagedCrontab(existing, "0 */6 * * * selftune orchestrate --max-skills 3"); + + expect(merged).toContain("MAILTO=user@example.com"); + expect(merged).toContain("0 1 * * * backup-job"); + expect(merged).toContain("15 3 * * * analytics-job"); + expect(merged).toContain("# BEGIN SELFTUNE"); + expect(merged).toContain("0 */6 * * * selftune orchestrate --max-skills 3"); + expect(merged).not.toContain("old-selftune-job"); + }); + + test("applyCronArtifact throws when the artifact is missing", () => { + expect(() => applyCronArtifact("/tmp/does-not-exist/selftune.crontab")).toThrow(); }); }); From 8c5db2e39683d2776c52a0d1a944ede6df1b4848 Mon Sep 17 00:00:00 2001 From: WellDunDun <45949032+WellDunDun@users.noreply.github.com> Date: Sat, 14 Mar 2026 19:46:20 +0300 Subject: [PATCH 14/14] Clarify sync force usage in README --- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4b5d022..86b84c4 100644 --- a/README.md +++ b/README.md @@ -41,11 +41,13 @@ Quick proof path: ```bash npx selftune@latest doctor -npx selftune@latest sync --force +npx selftune@latest sync npx selftune@latest status npx selftune@latest dashboard ``` +Use `--force` only when you explicitly need to rebuild local state from scratch. + Autonomy quick start: ```bash
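The managed-crontab handling hardened in patch 13 can be reduced to a self-contained sketch. This is simplified from `mergeManagedCrontab` in `cli/selftune/schedule.ts` — the real version builds its regex from escaped marker constants — but the behavior is the same: replace only the selftune-owned marker block and leave every other crontab line untouched.

```typescript
// Marker-block merge: selftune owns only the lines between BEGIN/END.
const BEGIN = "# BEGIN SELFTUNE";
const END = "# END SELFTUNE";

function mergeManagedBlock(existing: string, managed: string): string {
  const block = `${BEGIN}\n${managed.trim()}\n${END}\n`;
  // Drop any previously installed selftune block (non-greedy, multi-line)
  const withoutOld = existing
    .replace(/\r\n/g, "\n")
    .replace(/# BEGIN SELFTUNE[\s\S]*?# END SELFTUNE\n?/g, "")
    .trimEnd();
  return withoutOld ? `${withoutOld}\n\n${block}` : block;
}

const merged = mergeManagedBlock(
  [
    "0 1 * * * backup-job",
    BEGIN,
    "old-selftune-job",
    END,
    "15 3 * * * analytics-job",
    "",
  ].join("\n"),
  "0 */6 * * * selftune orchestrate --max-skills 3",
);
// Unrelated jobs survive; the stale selftune block is replaced.
```

This is why `applyCronArtifact` writes a merged file and reinstalls it via `crontab <file>` instead of the earlier `crontab ${path}` activation, which would have clobbered the user's existing jobs.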