Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
- **SQLite v2 API endpoints** — `GET /api/v2/overview` and `GET /api/v2/skills/:name` backed by materialized SQLite queries (`getOverviewPayload()`, `getSkillReportPayload()`, `getSkillsList()`)
- **SQL query optimizations** — Replaced `NOT IN` subqueries with `LEFT JOIN + IS NULL`, moved JS-side dedup to SQL `GROUP BY`, added `LIMIT 200` to unbounded evidence queries
- **SPA serving from dashboard server** — Built SPA served at `/` as the supported local dashboard experience
- **Orchestrate decision report** — `selftune orchestrate` now prints a 5-phase human-readable decision report (sync, status, decisions, evolution results, watch) to stderr, and enriched JSON with a per-skill `decisions` array to stdout
- **Source-truth-driven pipeline** — Transcripts and rollouts are now the authoritative source; `sync` rebuilds repaired overlays from source data rather than relying solely on hook-time capture
- **Telemetry contract package** — `@selftune/telemetry-contract` workspace package with canonical schema types, validators, versioning, metadata, and golden fixture tests
- **Test split** — `make test-fast` / `make test-slow` and `bun run test:fast` / `bun run test:slow` for faster development feedback loop
Expand Down
221 changes: 201 additions & 20 deletions cli/selftune/orchestrate.ts
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,174 @@ export interface OrchestrateResult {
};
}

// ---------------------------------------------------------------------------
// Human-readable decision report
// ---------------------------------------------------------------------------

function formatSyncPhase(syncResult: SyncResult): string[] {
const lines: string[] = ["Phase 1: Sync"];
const sources: [string, keyof SyncResult["sources"]][] = [
["Claude", "claude"],
["Codex", "codex"],
["OpenCode", "opencode"],
["OpenClaw", "openclaw"],
];

for (const [label, key] of sources) {
const s = syncResult.sources[key];
if (!s.available) {
lines.push(` ${label.padEnd(12)}not available`);
} else if (s.synced > 0) {
lines.push(` ${label.padEnd(12)}scanned ${s.scanned}, synced ${s.synced}`);
} else {
lines.push(` ${label.padEnd(12)}scanned ${s.scanned}, up to date`);
}
}

if (syncResult.repair.ran && syncResult.repair.repaired_records > 0) {
lines.push(
` Repair ${syncResult.repair.repaired_records} records across ${syncResult.repair.repaired_sessions} sessions`,
);
}

return lines;
}

function formatStatusPhase(statusResult: StatusResult): string[] {
const lines: string[] = ["Phase 2: Status"];
const byStatus: Record<string, number> = {};
for (const skill of statusResult.skills) {
byStatus[skill.status] = (byStatus[skill.status] ?? 0) + 1;
}
const healthLabel = statusResult.system.healthy ? "healthy" : "UNHEALTHY";
lines.push(` ${statusResult.skills.length} skills found, system ${healthLabel}`);

const parts: string[] = [];
for (const s of ["CRITICAL", "WARNING", "HEALTHY", "UNGRADED", "UNKNOWN"]) {
if (byStatus[s]) parts.push(`${byStatus[s]} ${s}`);
}
if (parts.length > 0) lines.push(` ${parts.join(", ")}`);

return lines;
}

function formatDecisionPhase(candidates: SkillAction[]): string[] {
const lines: string[] = ["Phase 3: Skill Decisions"];
if (candidates.length === 0) {
lines.push(" (no skills to evaluate)");
return lines;
}

for (const c of candidates) {
const icon = c.action === "skip" ? "⊘" : c.action === "watch" ? "○" : "→";
const actionLabel = c.action.toUpperCase().padEnd(7);
lines.push(` ${icon} ${c.skill.padEnd(20)} ${actionLabel} ${c.reason}`);
}

return lines;
}

function formatEvolutionPhase(candidates: SkillAction[]): string[] {
const evolved = candidates.filter((c) => c.action === "evolve" && c.evolveResult !== undefined);
if (evolved.length === 0) return [];

const lines: string[] = ["Phase 4: Evolution Results"];
for (const c of evolved) {
const r = c.evolveResult as NonNullable<typeof c.evolveResult>;
const status = r.deployed ? "deployed" : "not deployed";
const detail = r.reason;
const validation = r.validation
? ` (${(r.validation.before_pass_rate * 100).toFixed(0)}% → ${(r.validation.after_pass_rate * 100).toFixed(0)}%)`
: "";
lines.push(` ${c.skill.padEnd(20)} ${status}${validation}`);
lines.push(` ${"".padEnd(20)} ${detail}`);
}

return lines;
}

function formatWatchPhase(candidates: SkillAction[]): string[] {
const watched = candidates.filter((c) => c.action === "watch");
if (watched.length === 0) return [];

const lines: string[] = ["Phase 5: Watch"];
for (const c of watched) {
const snap = c.watchResult?.snapshot;
const metrics = snap
? ` (pass_rate=${snap.pass_rate.toFixed(2)}, baseline=${snap.baseline_pass_rate.toFixed(2)})`
: "";
const alertTag = c.watchResult?.alert ? " [ALERT]" : "";
const rollbackTag = c.watchResult?.rolledBack ? " [ROLLED BACK]" : "";
lines.push(` ${c.skill.padEnd(20)} ${c.reason}${alertTag}${rollbackTag}${metrics}`);
}

return lines;
}

export function formatOrchestrateReport(result: OrchestrateResult): string {
const sep = "═".repeat(48);
const lines: string[] = [];

lines.push(sep);
lines.push("selftune orchestrate — decision report");
lines.push(sep);
lines.push("");

// Mode banner
if (result.summary.dryRun) {
lines.push("Mode: DRY RUN (no mutations applied)");
} else if (result.summary.approvalMode === "review") {
lines.push("Mode: REVIEW (proposals validated but not deployed)");
} else {
lines.push("Mode: AUTONOMOUS (validated changes deployed automatically)");
}
lines.push("");

// Phase 1: Sync
lines.push(...formatSyncPhase(result.syncResult));
lines.push("");

// Phase 2: Status
lines.push(...formatStatusPhase(result.statusResult));
lines.push("");

// Phase 3: Skill decisions
lines.push(...formatDecisionPhase(result.candidates));
lines.push("");

// Phase 4: Evolution results (only if any evolve ran)
const evoLines = formatEvolutionPhase(result.candidates);
if (evoLines.length > 0) {
lines.push(...evoLines);
lines.push("");
}

// Phase 5: Watch (only if any watched)
const watchLines = formatWatchPhase(result.candidates);
if (watchLines.length > 0) {
lines.push(...watchLines);
lines.push("");
}

// Final summary
lines.push("Summary");
lines.push(` Evaluated: ${result.summary.evaluated} skills`);
lines.push(` Deployed: ${result.summary.deployed}`);
lines.push(` Watched: ${result.summary.watched}`);
lines.push(` Skipped: ${result.summary.skipped}`);
lines.push(` Elapsed: ${(result.summary.elapsedMs / 1000).toFixed(1)}s`);

if (result.summary.dryRun && result.summary.evaluated > 0) {
lines.push("");
lines.push(" Rerun without --dry-run to allow validated deployments.");
} else if (result.summary.approvalMode === "review" && result.summary.evaluated > 0) {
lines.push("");
lines.push(" Rerun without --review-required to allow validated deployments.");
}

return lines.join("\n");
}

/** Candidate selection criteria. */
const CANDIDATE_STATUSES = new Set(["CRITICAL", "WARNING", "UNGRADED"]);

Expand Down Expand Up @@ -506,27 +674,40 @@ Examples:
syncForce: values["sync-force"] ?? false,
});

// Print JSON summary to stdout
console.log(JSON.stringify(result.summary, null, 2));

// Print human-readable recap to stderr
console.error(`\n${"═".repeat(40)}`);
console.error("selftune orchestrate — summary");
console.error("═".repeat(40));
console.error(` Total skills: ${result.summary.totalSkills}`);
console.error(` Evaluated: ${result.summary.evaluated}`);
console.error(` Deployed: ${result.summary.deployed}`);
console.error(` Watched: ${result.summary.watched}`);
console.error(` Skipped: ${result.summary.skipped}`);
console.error(` Dry run: ${result.summary.dryRun}`);
console.error(` Approval mode: ${result.summary.approvalMode}`);
console.error(` Elapsed: ${(result.summary.elapsedMs / 1000).toFixed(1)}s`);
// JSON output: include per-skill decisions for machine consumption
const jsonOutput = {
...result.summary,
decisions: result.candidates.map((c) => ({
skill: c.skill,
action: c.action,
reason: c.reason,
...(c.evolveResult
? {
deployed: c.evolveResult.deployed,
evolveReason: c.evolveResult.reason,
validation: c.evolveResult.validation
? {
before: c.evolveResult.validation.before_pass_rate,
after: c.evolveResult.validation.after_pass_rate,
improved: c.evolveResult.validation.improved,
}
: null,
}
: {}),
...(c.watchResult
? {
alert: c.watchResult.alert,
rolledBack: c.watchResult.rolledBack,
passRate: c.watchResult.snapshot?.pass_rate ?? null,
recommendation: c.watchResult.recommendation,
}
: {}),
})),
};
console.log(JSON.stringify(jsonOutput, null, 2));

if (result.summary.dryRun && result.summary.evaluated > 0) {
console.error("\n Rerun without --dry-run to allow validated deployments.");
} else if (result.summary.approvalMode === "review" && result.summary.evaluated > 0) {
console.error("\n Rerun without --review-required to allow validated deployments.");
}
// Print human-readable decision report to stderr
console.error(`\n${formatOrchestrateReport(result)}`);

process.exit(0);
}
Expand Down
25 changes: 19 additions & 6 deletions skill/Workflows/Orchestrate.md
Original file line number Diff line number Diff line change
Expand Up @@ -59,12 +59,25 @@ Use `--review-required` only when you want a stricter policy for a specific run.

## Output

The command prints:
### Human-readable report (stderr)

- sync results
- candidate-selection reasoning
- evolve/watch actions taken
- skipped skills and why
- a final summary with counts and elapsed time
A phased decision report printed to stderr so you can see exactly what happened and why:

1. **Phase 1: Sync** — which sources were scanned, how many records synced, repair counts
2. **Phase 2: Status** — skill count, system health, breakdown by status category
3. **Phase 3: Skill Decisions** — each skill with its action (EVOLVE / WATCH / SKIP) and reason
4. **Phase 4: Evolution Results** — validation pass-rate changes (before → after), deployment status
5. **Phase 5: Watch** — post-deploy monitoring with alert and rollback indicators
6. **Summary** — evaluated/deployed/watched/skipped counts and elapsed time

A mode banner at the top shows DRY RUN, REVIEW, or AUTONOMOUS with rerun hints when applicable.

### JSON output (stdout)

Machine-readable JSON with the summary fields plus a `decisions` array containing per-skill:

- `skill`, `action`, `reason`
- `deployed`, `evolveReason`, `validation` (before/after pass rates, improved flag) — when evolved
- `alert`, `rolledBack`, `passRate`, `recommendation` — when watched

This is the recommended runtime for recurring autonomous scheduling.
Loading
Loading