Self-improving skills for AI agents.
Your agent skills learn how you work. Detect what's broken. Improve low-risk skill behavior automatically.
Your skills do not understand how you talk. You say "make me a slide deck" and nothing happens: no error, no signal, no clue why the right skill never fired. selftune reads the transcripts and telemetry your agent already saves, learns how you actually speak, and improves skill descriptions to match. It validates changes before deployment, watches for regressions after, and rolls back when needed.
Built for Claude Code. Also works with Codex, OpenCode, and OpenClaw. Zero runtime dependencies.
npx skills add selftune-dev/selftune
Then tell your agent: "initialize selftune"
Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription.
Quick proof path:
npx selftune@latest doctor
npx selftune@latest sync
npx selftune@latest status
npx selftune@latest dashboard
Use --force only when you explicitly need to rebuild local state from scratch.
Autonomy quick start:
npx selftune@latest init --enable-autonomy
npx selftune@latest orchestrate --dry-run
npx selftune@latest schedule --install --dry-run
CLI only (no installed skill):
npx selftune@latest doctor
selftune learned that real users say "slides", "deck", "presentation for Monday", none of which matched the original skill description. It rewrote the description to match how people actually talk. Validated against the eval set. Deployed with a backup. Done.
I write and use my own skills: You built skills for your workflow, but your descriptions don't match how you actually talk. selftune learns your language from real sessions and evolves descriptions to match, so you no longer tune them by hand. selftune status · selftune evolve · selftune baseline
I publish skills others install: Your skill works for you, but every user talks differently. With selftune, the skills you ship get better for every user automatically, adapting descriptions to how each person actually works. selftune status · selftune evals · selftune badge
I manage an agent setup with many skills: You have 15+ skills installed. Some work. Some chain together. Some conflict. selftune shows which combinations repeat, which ones help, and where the friction is. selftune dashboard · selftune composability · selftune workflows
A continuous feedback loop that makes your skills learn and adapt from real work.
Observe — selftune reads the transcripts and telemetry your agents already save. On Claude Code, hooks can add low-latency hints, but transcripts and logs are the source of truth. Use selftune sync to ingest current activity and selftune replay to backfill older Claude Code sessions.
Detect — selftune finds the gap between how you talk and how your skills are described. It spots missed triggers, underperforming descriptions, noisy environments, and regressions in real usage.
Evolve — For low-risk changes, selftune can autonomously rewrite skill descriptions to match how you actually work. Every proposal is validated before deploy. Full skill-body or routing changes stay available for higher-touch workflows.
Watch — After deploying changes, selftune monitors trigger quality and post-deploy evidence. If something regresses, it can roll back automatically. The goal is autonomous improvement with safeguards, not blind self-editing.
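The four stages above map onto commands listed in this README. The sketch below spells them out as a single pass, assuming a placeholder skill name (`my-skill`) and a hypothetical `step` helper that only prints each command unless `RUN=1` is set. In practice `selftune orchestrate` is described as running this loop for you, so treat this as an illustration of the order of operations, not as the tool's internal implementation.

```shell
#!/bin/sh
# Illustrative only: one pass through observe -> detect -> evolve -> watch.
# SKILL is a placeholder name (assumption); replace it with a real skill.
SKILL="${SKILL:-my-skill}"

# Hypothetical helper: print the command unless RUN=1 is set.
step() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

loop_once() {
  step npx selftune@latest sync                      # Observe: ingest transcripts and telemetry
  step npx selftune@latest status                    # Detect: see which skills undertrigger
  step npx selftune@latest evolve --skill "$SKILL"   # Evolve: propose, validate, deploy
  step npx selftune@latest watch --skill "$SKILL"    # Watch: monitor after deploy
}

loop_once
```

Run it as-is to review the sequence, then set `RUN=1` once the printed commands look right for your setup.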
- Source-truth sync: `selftune sync` now leads the product loop, using transcripts/logs as truth and hooks as hints
- SQLite-backed local app: `selftune dashboard` now serves the React SPA by default with faster overview/report routes on top of materialized local data
- Autonomous low-risk evolution: description evolution is autonomous by default, with explicit review-required mode for stricter policies
- Autonomous scheduling: `selftune init --enable-autonomy` and `selftune schedule --install` make the orchestrated loop the default recurring runtime
- Full skill body evolution: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
- Synthetic eval generation: `selftune evals --synthetic` generates eval sets from `SKILL.md` for cold-start skills
- Cheap-loop evolution: `selftune evolve --cheap-loop` uses haiku for proposal generation and validation, sonnet only for the final deployment gate
- Per-stage model control: `--validation-model`, `--proposal-model`, and `--gate-model` give fine-grained control over each evolution stage
- Sandbox test harness: automated coverage, including devcontainer-based LLM testing
- Workflow discovery + codification: `selftune workflows` finds repeated multi-skill sequences from telemetry and can append them to `## Workflows` in `SKILL.md`
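As a rough illustration of per-stage model control, the sketch below assembles an evolve command with all three model flags. Several things here are assumptions: that the flags attach to `selftune evolve` (the list above does not say which command accepts them), that `haiku` and `sonnet` are valid values as written, and that `my-skill` is a placeholder. Check `selftune --help` before relying on any of it. The function only prints the command.

```shell
#!/bin/sh
# Assumptions: flag placement on `evolve` and the literal model values are
# not confirmed by this README; SKILL is a placeholder name.
SKILL="my-skill"
PROPOSAL_MODEL="haiku"     # cheaper model for generating proposals
VALIDATION_MODEL="haiku"   # cheaper model for validating them
GATE_MODEL="sonnet"        # stronger model for the final deployment gate

# Print the assembled command so it can be reviewed before running.
evolve_cmd() {
  printf '%s\n' "npx selftune@latest evolve --skill $SKILL --proposal-model $PROPOSAL_MODEL --validation-model $VALIDATION_MODEL --gate-model $GATE_MODEL"
}

evolve_cmd
```

This mirrors the split that `--cheap-loop` is described as applying: a cheaper model for proposals and validation, a stronger one only at the gate.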
| Command | What it does |
|---|---|
| `selftune doctor` | Health check: logs, config, permissions, dashboard build/runtime expectations |
| `selftune sync` | Ingest source-truth activity from supported agents and rebuild local state |
| `selftune status` | See which skills are undertriggering and why |
| `selftune dashboard` | Open the React SPA dashboard (SQLite-backed) |
| `selftune orchestrate` | Run the core loop: sync, inspect candidates, evolve, and watch |
| `selftune schedule --install` | Install platform-native scheduling for the autonomous loop |
| `selftune evals --skill <name>` | Generate eval sets from real session data (`--synthetic` for cold-start) |
| `selftune evolve --skill <name>` | Propose, validate, and deploy improved descriptions (`--cheap-loop`, `--with-baseline`) |
| `selftune evolve-body --skill <name>` | Evolve full skill body or routing table (teacher-student, 3-gate validation) |
| `selftune watch --skill <name>` | Monitor after deploy. Auto-rollback on regression. |
| `selftune replay` | Backfill data from existing Claude Code transcripts |
| `selftune baseline --skill <name>` | Measure skill value vs no-skill baseline |
| `selftune unit-test --skill <name>` | Run or generate skill-level unit tests |
| `selftune composability --skill <name>` | Measure synergy and conflicts between co-occurring skills, with workflow-candidate hints |
| `selftune workflows` | Discover repeated multi-skill workflows and save a discovered workflow into `SKILL.md` |
| `selftune import-skillsbench` | Import external eval corpus from SkillsBench |
| `selftune badge --skill <name>` | Generate skill health badge SVG |
| `selftune cron setup` | Optional scheduler helper for OpenClaw-oriented automation |
Full command reference: selftune --help
| Approach | Problem |
|---|---|
| Rewrite the description yourself | No data on how users actually talk. No validation. No regression detection. |
| Add "ALWAYS invoke when..." directives | Brittle. One agent rewrite away from breaking. |
| Force-load skills on every prompt | Doesn't fix the description. Expensive band-aid. |
| selftune | Learns from real usage, rewrites descriptions to match how you work, validates against eval sets, and rolls back automatically on regressions. |
Observability tools trace LLM calls. Skill authoring tools help you write skills. Neither knows whether the right skill fired for the right person. selftune does — and fixes it automatically.
| Dimension | selftune | Braintrust / Langfuse | skill-creator / SkillForge |
|---|---|---|---|
| Layer | Skill-specific | LLM call / agent trace | Skill authoring |
| When | Runtime (real sessions) | Runtime (traces) | Authoring time (manual) |
| Detects | Missed triggers, false negatives, conflicts | Token usage, latency, chain failures | — |
| Improves | Descriptions, body, routing — automatically | — | Helps you write better manually |
| Closed loop | Yes — observe → evolve → watch → repeat | No | No |
| Setup | Zero deps, zero API keys | Self-host or cloud | Included with agent |
| Price | Free (MIT) | Freemium / Paid | Free |
Claude Code (primary) — Reads saved transcripts and telemetry directly. Hooks install automatically and add low-latency hints. selftune replay backfills older Claude Code sessions. Full feature support.
Codex — selftune wrap-codex -- <args> or selftune ingest-codex
OpenCode — selftune ingest-opencode
OpenClaw — selftune ingest-openclaw. selftune cron setup remains available as an optional OpenClaw-oriented scheduler helper, but the main product loop is still selftune orchestrate plus generic scheduling.
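The platform notes above can be summarized as a small lookup from platform to ingest command. The sketch below is a hypothetical helper, not part of selftune: the platform keys (`claude-code`, `codex`, `opencode`, `openclaw`) are labels chosen for this example, and the function only prints the command named in this README for each one.

```shell
#!/bin/sh
# Hypothetical helper: map a platform label (assumed names) to the ingest
# command this README lists for it. Prints the command; does not run it.
ingest_cmd() {
  case "$1" in
    claude-code) echo "selftune sync" ;;
    codex)       echo "selftune ingest-codex" ;;
    opencode)    echo "selftune ingest-opencode" ;;
    openclaw)    echo "selftune ingest-openclaw" ;;
    *)           echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

ingest_cmd codex
```

For Codex, `selftune wrap-codex -- <args>` is listed as an alternative to the ingest command; the sketch leaves it out to keep one command per platform.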
Requires Bun or Node.js 18+. No extra API keys.
MIT licensed. Free forever. Built for Claude Code.

