Labels: enhancement (New feature or request), horizon (Project Horizon initiative)
Phase C: Token Analytics & Intelligence
Parent: #17 (Project Horizon)
Priority: Medium — proves ROI, drives retention
Depends on: Phase A (#18) for hook infrastructure, Phase B (#19) for command data
Context
Users want to see proof that their token optimization tool is working. A persistent tracking system with a dashboard transforms skim from a "trust me, it's saving tokens" tool into a "here's exactly how much you saved this week" tool. This drives retention and creates shareable proof for social/marketing.
The key differentiator: skim uses tiktoken for accurate token counting, not the industry-common chars/4 heuristic (which is ~25% off).
Deliverables
1. SQLite Token Tracking
- Track every skim invocation: command, input tokens, output tokens, savings %, exec time, project path
- Store at `~/.local/share/skim/tracking.db` (XDG-compliant)
- Use tiktoken (`cl100k_base`) for accurate counts — this is our accuracy advantage
- Automatic 90-day retention with cleanup
- Per-project scoping via canonical project path
- Opt-out via `--no-track` flag or environment variable
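A minimal sketch of the tracking write path, using the schema from this issue (the function name and in-memory connection are illustrative; the real database lives at `~/.local/share/skim/tracking.db`):

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS invocations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,        -- RFC 3339
    command_type TEXT NOT NULL,
    original_cmd TEXT NOT NULL,
    skim_cmd TEXT NOT NULL,
    input_tokens INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    savings_pct REAL NOT NULL,
    exec_time_ms INTEGER NOT NULL,
    project_path TEXT DEFAULT '',
    file_path TEXT DEFAULT '',
    language TEXT DEFAULT '',
    mode TEXT DEFAULT ''
)
"""

def record_invocation(db, command_type, original_cmd, skim_cmd,
                      input_tokens, output_tokens, exec_time_ms,
                      project_path="", file_path="", language="", mode=""):
    # savings_pct = (saved / input) * 100, matching the schema comment
    savings_pct = (input_tokens - output_tokens) / input_tokens * 100 if input_tokens else 0.0
    db.execute(
        "INSERT INTO invocations (timestamp, command_type, original_cmd, skim_cmd,"
        " input_tokens, output_tokens, savings_pct, exec_time_ms, project_path,"
        " file_path, language, mode) VALUES (?,?,?,?,?,?,?,?,?,?,?,?)",
        (datetime.now(timezone.utc).isoformat(), command_type, original_cmd,
         skim_cmd, input_tokens, output_tokens, savings_pct, exec_time_ms,
         project_path, file_path, language, mode))
    # 90-day retention: compares RFC 3339 strings by date prefix, which is
    # close enough for a periodic cleanup pass
    db.execute("DELETE FROM invocations WHERE timestamp < datetime('now', '-90 days')")
    db.commit()

db = sqlite3.connect(":memory:")  # real path: ~/.local/share/skim/tracking.db
db.execute(SCHEMA)
record_invocation(db, "read", "cat file.rs", "skim file.rs --mode=structure",
                  input_tokens=1200, output_tokens=300, exec_time_ms=4)
```

Folding the retention DELETE into the write path keeps cleanup automatic without a separate scheduled job.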
2. skim stats — Savings Dashboard
- Summary view: total commands, tokens saved, average savings %, total time
- Per-command breakdown: which commands save the most tokens
- Time series: daily/weekly/monthly savings trends
- Top files by token savings (for code reading)
- Export formats: text (default), JSON, CSV
- Project scoping: `skim stats --project /path/to/repo`
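The per-command breakdown could fall out of a single GROUP BY over the invocations table; a sketch with made-up sample rows and a trimmed-down table:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE invocations (
    command_type TEXT, input_tokens INTEGER, output_tokens INTEGER,
    savings_pct REAL, exec_time_ms INTEGER)""")
rows = [("read", 1200, 300, 75.0, 4),
        ("read",  800, 400, 50.0, 3),
        ("test", 5000, 1000, 80.0, 20)]
db.executemany("INSERT INTO invocations VALUES (?,?,?,?,?)", rows)

# Per-command breakdown: which commands save the most tokens.
breakdown = db.execute("""
    SELECT command_type,
           COUNT(*)                          AS runs,
           SUM(input_tokens - output_tokens) AS tokens_saved,
           AVG(savings_pct)                  AS avg_savings_pct
    FROM invocations
    GROUP BY command_type
    ORDER BY tokens_saved DESC
""").fetchall()
for cmd, runs, saved, pct in breakdown:
    print(f"{cmd:6} {runs:3} runs  {saved:6} tokens saved  avg {pct:.1f}%")
```

The same shape extends to the time-series view by grouping on a `strftime` bucket of `timestamp` instead of `command_type`.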
3. skim discover — Missed Savings Finder
- Scan Claude Code session history (`~/.claude/projects/**/*.jsonl`)
- Identify `cat`, `head`, `tail`, and `Read` tool calls that could have been `skim` invocations
- Estimate potential token savings based on file sizes and typical reduction rates
- Report: "In your last 10 sessions, you could have saved ~X tokens by routing file reads through skim"
- Suggest `skim init` if hooks aren't installed
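The scan itself is a filter over session events; a hedged sketch (the `{"command": ...}` event shape is an assumption about the transcript format, and the real command walks `~/.claude/projects/**/*.jsonl`):

```python
import json
import shlex

READ_TOOLS = {"cat", "head", "tail"}

def find_missed_reads(jsonl_lines):
    """Return (tool, file) pairs that could have been skim invocations."""
    missed = []
    for line in jsonl_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed transcript lines
        parts = shlex.split(event.get("command", ""))
        # take the last token as the file path so flags like `head -n 50` are skipped
        if parts and parts[0] in READ_TOOLS:
            missed.append((parts[0], parts[-1]))
    return missed

session = ['{"command": "cat src/main.rs"}',
           '{"command": "ls -la"}',
           '{"command": "head -n 50 Cargo.toml"}']
print(find_missed_reads(session))
# → [('cat', 'src/main.rs'), ('head', 'Cargo.toml')]
```

Each hit would then be priced by the file's size and a typical per-mode reduction rate to produce the "you could have saved ~X tokens" report.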
4. Accuracy Marketing Data
- Built-in benchmark mode: `skim bench file.rs` — show exact token counts for each mode
- Compare tiktoken vs the chars/4 estimate to demonstrate the accuracy difference
- Generate shareable savings report: `skim stats --export-report`
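Since the chars/4 comparison is central to the bench mode, here is the heuristic side as a sketch; the accurate side would come from tiktoken's `cl100k_base` encoding, left out to keep the snippet dependency-free:

```python
def chars4_estimate(text: str) -> int:
    """The industry-common chars/4 heuristic that skim bench compares against."""
    return len(text) // 4

source = 'fn main() { println!("hello"); }\n' * 100
est = chars4_estimate(source)
print(f"chars/4 estimate: {est} tokens")
# In the real bench mode, the reported error would be
# abs(est - accurate) / accurate * 100, with `accurate` counted via cl100k_base.
```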
Schema Design
CREATE TABLE invocations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
timestamp TEXT NOT NULL, -- RFC 3339
command_type TEXT NOT NULL, -- "read", "test", "git", "build"
original_cmd TEXT NOT NULL, -- "cat file.rs"
skim_cmd TEXT NOT NULL, -- "skim file.rs --mode=structure"
input_tokens INTEGER NOT NULL, -- tiktoken count of raw output
output_tokens INTEGER NOT NULL, -- tiktoken count of skim output
savings_pct REAL NOT NULL, -- (saved/input) * 100
exec_time_ms INTEGER NOT NULL, -- wall clock time
project_path TEXT DEFAULT '', -- canonical cwd
file_path TEXT DEFAULT '', -- for read commands
language TEXT DEFAULT '', -- detected language
mode TEXT DEFAULT '' -- structure/signatures/types/minimal/full
);
CREATE INDEX idx_timestamp ON invocations(timestamp);
CREATE INDEX idx_project ON invocations(project_path, timestamp);
CREATE INDEX idx_command_type ON invocations(command_type);

Acceptance Criteria
- Token tracking adds <2ms overhead per invocation
- `skim stats` renders a useful dashboard in the terminal
- `skim discover` correctly identifies missed savings from session history
- All token counts use tiktoken (`cl100k_base`), not heuristics
- Tracking is opt-out, not opt-in (default on, `--no-track` to disable)
- Database file size stays under 10MB for 90 days of heavy usage
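The <2ms overhead criterion can be spot-checked with a micro-benchmark around a single tracked insert; a sketch (in-memory database and trimmed table shape are simplifications of the real write path):

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")  # on disk this would be tracking.db
db.execute("CREATE TABLE invocations (timestamp TEXT, input_tokens INTEGER, output_tokens INTEGER)")

# Time the operation bounded by the <2ms acceptance criterion.
start = time.perf_counter()
db.execute("INSERT INTO invocations VALUES (?, ?, ?)",
           ("2025-01-01T00:00:00Z", 1200, 300))
db.commit()
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"tracked insert: {elapsed_ms:.3f} ms")
```

On disk the commit cost is dominated by fsync, so WAL mode (`PRAGMA journal_mode=WAL`) is worth measuring when validating the budget against the real database file.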
Open Questions
- Should `skim stats` support an interactive TUI mode or just static output?
- Should we track per-file stats to show "your most-read files" insights?
- Should the discover command support other agents besides Claude Code?