Phase C: Token Analytics & Intelligence — Tracking, Stats & Discovery #20

@dean0x

Phase C: Token Analytics & Intelligence

Parent: #17 (Project Horizon)
Priority: Medium — proves ROI, drives retention
Depends on: Phase A (#18) for hook infrastructure, Phase B (#19) for command data

Context

Users want to see proof that their token optimization tool is working. A persistent tracking system with a dashboard transforms skim from a "trust me, it's saving tokens" tool into a "here's exactly how much you saved this week" tool. This drives retention and creates shareable proof for social/marketing.

The key differentiator: skim counts tokens with tiktoken rather than the widely used chars/4 heuristic, which can be off by roughly 25%.

Deliverables

1. SQLite Token Tracking

  • Track every skim invocation: command, input tokens, output tokens, savings %, exec time, project path
  • Store at $XDG_DATA_HOME/skim/tracking.db, which defaults to ~/.local/share/skim/tracking.db (XDG-compliant)
  • Use tiktoken (cl100k_base) for accurate counts — this is our accuracy advantage
  • Automatic 90-day retention with cleanup
  • Per-project scoping via canonical project path
  • Opt-out via --no-track flag or environment variable
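A minimal sketch of the storage location and retention rules above, in Python for illustration only (it is not skim's implementation). The helper names tracking_db_path and purge_expired are hypothetical, and the cleanup assumes every timestamp is stored as a UTC ISO 8601 / RFC 3339 string in one consistent format, so that plain string comparison matches chronological order.

```python
import os
import sqlite3
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION_DAYS = 90  # retention window stated in the deliverable


def tracking_db_path(env=None):
    """Resolve the tracking database path per the XDG Base Directory spec."""
    env = os.environ if env is None else env
    data_home = env.get("XDG_DATA_HOME", "")
    # The spec requires an absolute path; otherwise use the default location.
    if os.path.isabs(data_home):
        base = Path(data_home)
    else:
        base = Path.home() / ".local" / "share"
    return base / "skim" / "tracking.db"


def purge_expired(conn, now=None):
    """Delete rows older than the retention window; return how many were removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=RETENTION_DAYS)).isoformat()
    # Assumption: timestamps are UTC strings in the same format as `cutoff`.
    cur = conn.execute("DELETE FROM invocations WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Running the purge opportunistically (for example, at most once per day) rather than on every invocation would help keep tracking overhead low, though that scheduling choice is not specified here.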

2. skim stats — Savings Dashboard

  • Summary view: total commands, tokens saved, average savings %, total time
  • Per-command breakdown: which commands save the most tokens
  • Time series: daily/weekly/monthly savings trends
  • Top files by token savings (for code reading)
  • Export formats: text (default), JSON, CSV
  • Project scoping: skim stats --project /path/to/repo
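The summary view and per-command breakdown could be driven by two aggregate queries over the invocations table. The sketch below is illustrative Python, not the planned implementation; the function name summarize and the shape of the returned dictionary are assumptions.

```python
import sqlite3


def summarize(conn, project=None):
    """Return overall totals and a per-command breakdown, optionally for one project."""
    where, params = ("WHERE project_path = ?", (project,)) if project else ("", ())
    total = conn.execute(
        f"""SELECT COUNT(*),
                   COALESCE(SUM(input_tokens - output_tokens), 0),
                   COALESCE(AVG(savings_pct), 0.0),
                   COALESCE(SUM(exec_time_ms), 0)
            FROM invocations {where}""",
        params,
    ).fetchone()
    # Rank command types by total tokens saved, highest first.
    by_command = conn.execute(
        f"""SELECT command_type, COUNT(*), SUM(input_tokens - output_tokens) AS saved
            FROM invocations {where}
            GROUP BY command_type
            ORDER BY saved DESC""",
        params,
    ).fetchall()
    return {
        "commands": total[0],
        "tokens_saved": total[1],
        "avg_savings_pct": total[2],
        "total_time_ms": total[3],
        "by_command": by_command,
    }
```

The project filter is passed as a bound parameter, so a path supplied via --project is never spliced into the SQL text.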

3. skim discover — Missed Savings Finder

  • Scan Claude Code session history (~/.claude/projects/**/*.jsonl)
  • Identify cat, head, tail, and Read tool calls that could have been routed through skim
  • Estimate potential token savings based on file sizes and typical reduction rates
  • Report: "In your last 10 sessions, you could have saved ~X tokens by routing file reads through skim"
  • Suggest running skim init if hooks aren't installed
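One piece of the discovery step, deciding whether a shell command string is a cat/head/tail file read, can be sketched without assuming anything about the session log layout. The Python below is a deliberately simplistic illustration: the names missed_read_targets, READ_COMMANDS and SHELL_OPERATORS are hypothetical, only the -n and -c options are treated as taking a value, and shell operators are recognized only when separated by whitespace. How commands are extracted from the JSONL records is left out, since that format is not described here.

```python
import shlex

# Commands the deliverable lists as candidates for routing through skim.
READ_COMMANDS = {"cat", "head", "tail"}
# Standalone tokens that end the argument list of the first command.
SHELL_OPERATORS = {"|", "||", "&&", ";", ">", ">>", "<"}


def missed_read_targets(command):
    """Return the file arguments of a cat/head/tail command, or [] otherwise."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return []
    if not argv or argv[0] not in READ_COMMANDS:
        return []
    targets, skip_next = [], False
    for arg in argv[1:]:
        if arg in SHELL_OPERATORS:
            break  # anything after a pipe or redirect belongs elsewhere
        if skip_next:
            skip_next = False  # this token is the value of -n / -c
        elif arg in ("-n", "-c"):
            skip_next = True
        elif not arg.startswith("-"):
            targets.append(arg)
    return targets
```

For example, missed_read_targets("head -n 20 src/main.rs") yields ["src/main.rs"], which could then be sized on disk to estimate the potential savings.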

4. Accuracy Marketing Data

  • Built-in benchmark mode: skim bench file.rs — show exact token counts for each mode
  • Compare tiktoken vs chars/4 estimate to demonstrate accuracy difference
  • Generate shareable savings report: skim stats --export-report
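To make the accuracy comparison concrete, here is a small Python sketch of the chars/4 estimate and its signed error against an exact count. The function names are hypothetical, and rounding the estimate up is an assumption, since the heuristic is usually quoted without a rounding rule. The exact count is passed in as a parameter so the sketch has no third-party dependency.

```python
def chars_div_4_estimate(text):
    """The chars/4 heuristic: one token per four characters, rounded up."""
    return -(-len(text) // 4)  # ceiling division


def estimate_error_pct(text, exact_tokens):
    """Signed error of the heuristic relative to an exact token count, in percent."""
    if exact_tokens <= 0:
        raise ValueError("exact_tokens must be positive")
    return (chars_div_4_estimate(text) - exact_tokens) / exact_tokens * 100
```

In practice the exact count would come from tiktoken, for instance len(tiktoken.get_encoding("cl100k_base").encode(text)). A positive result means the heuristic overestimates, a negative one that it underestimates.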

Schema Design

CREATE TABLE invocations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,          -- RFC 3339
    command_type TEXT NOT NULL,       -- "read", "test", "git", "build"
    original_cmd TEXT NOT NULL,       -- "cat file.rs"
    skim_cmd TEXT NOT NULL,           -- "skim file.rs --mode=structure"
    input_tokens INTEGER NOT NULL,    -- tiktoken count of raw output
    output_tokens INTEGER NOT NULL,   -- tiktoken count of skim output
    savings_pct REAL NOT NULL,        -- (input_tokens - output_tokens) / input_tokens * 100
    exec_time_ms INTEGER NOT NULL,    -- wall clock time
    project_path TEXT DEFAULT '',     -- canonical cwd
    file_path TEXT DEFAULT '',        -- for read commands
    language TEXT DEFAULT '',         -- detected language
    mode TEXT DEFAULT ''              -- structure/signatures/types/minimal/full
);

CREATE INDEX idx_timestamp ON invocations(timestamp);
CREATE INDEX idx_project ON invocations(project_path, timestamp);
CREATE INDEX idx_command_type ON invocations(command_type);
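As a quick check that the schema supports the intended write path, the Python sketch below creates the table in SQLite and records one invocation. It is illustrative only: the record and savings_pct helpers are hypothetical, IF NOT EXISTS is added so setup can be rerun safely, and returning 0.0 when input_tokens is zero is an assumption, since the zero-input case is not specified.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS invocations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    command_type TEXT NOT NULL,
    original_cmd TEXT NOT NULL,
    skim_cmd TEXT NOT NULL,
    input_tokens INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL,
    savings_pct REAL NOT NULL,
    exec_time_ms INTEGER NOT NULL,
    project_path TEXT DEFAULT '',
    file_path TEXT DEFAULT '',
    language TEXT DEFAULT '',
    mode TEXT DEFAULT ''
);
CREATE INDEX IF NOT EXISTS idx_timestamp ON invocations(timestamp);
CREATE INDEX IF NOT EXISTS idx_project ON invocations(project_path, timestamp);
CREATE INDEX IF NOT EXISTS idx_command_type ON invocations(command_type);
"""


def savings_pct(input_tokens, output_tokens):
    """Percentage of tokens saved; 0.0 for empty input (an assumption)."""
    if input_tokens <= 0:
        return 0.0
    return (input_tokens - output_tokens) / input_tokens * 100


def record(conn, command_type, original_cmd, skim_cmd,
           input_tokens, output_tokens, exec_time_ms,
           project_path="", file_path="", language="", mode=""):
    """Insert one invocation row and return its id."""
    cur = conn.execute(
        """INSERT INTO invocations
           (timestamp, command_type, original_cmd, skim_cmd, input_tokens,
            output_tokens, savings_pct, exec_time_ms, project_path,
            file_path, language, mode)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (datetime.now(timezone.utc).isoformat(), command_type, original_cmd,
         skim_cmd, input_tokens, output_tokens,
         savings_pct(input_tokens, output_tokens), exec_time_ms,
         project_path, file_path, language, mode),
    )
    conn.commit()
    return cur.lastrowid
```

Usage: open a connection with sqlite3.connect(":memory:"), run conn.executescript(SCHEMA), then call record(conn, "read", "cat file.rs", "skim file.rs --mode=structure", 1000, 250, 3), which stores a savings_pct of 75.0.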

Acceptance Criteria

  • Token tracking adds <2ms overhead per invocation
  • skim stats renders a useful dashboard in terminal
  • skim discover correctly identifies missed savings from session history
  • All token counts use tiktoken (cl100k_base), not heuristics
  • Tracking is opt-out, not opt-in (default on, --no-track to disable)
  • Database file size stays under 10 MB for 90 days of heavy usage

Open Questions

  • Should skim stats support a TUI mode (interactive) or just static output?
  • Should we track per-file stats to show "your most-read files" insights?
  • Should the discover command support other agents besides Claude Code?

Metadata

Labels: enhancement (New feature or request), horizon (Project Horizon initiative)
