Task Plan: WhisperWoof — Voice-First Personal Automation

Goal

Fork OpenWhispr and build WhisperWoof: a voice-first personal automation tool that transcribes, polishes (local LLM), routes (hotkey-driven), and stores (unified capture layer) voice and clipboard input.

Current Phase

v1.9.0 shipped + unreleased eng-review cleanup (758 tests, 13 of 35 test-truthfulness files refactored [Bucket B complete], 5 surgical upstream cherry-picks, STT config error fixed, API key export leak fixed). Next: Bucket C test refactor OR switch to Phase 11 (code signing + notarization) once Apple Developer cert arrives.

Phases

Phase 0: Fork + Audit + Harden

  • Fork OpenWhispr, build locally, verify app boots
  • Security audit: 241 IPC methods catalogued, CSP added, webSecurity re-enabled, URL/path validation
  • Proxy cloud API calls — CSP connect-src allowlist configured
  • Set up Vitest + write tests (70 tests passing across 4 files)
  • Rebrand: package.json, electron-builder.json, main.js, windowConfig.js
  • WhisperWoof core modules built: StorageProvider interface, OllamaService, HotkeyRouter, ClipboardMonitor, Pipeline (runtime SQL lives in bridge/app-init.js; SqliteProvider class was deleted 2026-04-11 as dead code)
  • Wire WhisperWoof init into main.js (startApp + will-quit)
  • Validate Fn key — works, timing improved (75ms hold, 100ms cooldown, crash recovery)
  • Status: complete
  • Depends on: Nothing — this is the starting point
  • Done: OpenWhispr merged, rebranded, security hardened, 70 tests pass, app boots
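
The Fn key timing above (75ms hold, 100ms cooldown) can be read as a small gate in front of the hotkey handler. The sketch below is one possible interpretation, not the project's listener code: the class name, method names, and the exact meaning of "hold" and "cooldown" are assumptions made for illustration.

```typescript
// Hypothetical sketch. Only the two timing values come from the plan;
// FnKeyGate and its semantics are assumptions.
const HOLD_MS = 75; // Fn must stay down at least this long to count as a press
const COOLDOWN_MS = 100; // presses this soon after a release are ignored

class FnKeyGate {
  private downAt: number | null = null;
  private lastReleaseAt = -Infinity;

  /** Record Fn going down; presses inside the cooldown window are dropped. */
  keyDown(nowMs: number): void {
    if (nowMs - this.lastReleaseAt < COOLDOWN_MS) return;
    if (this.downAt === null) this.downAt = nowMs;
  }

  /** Record Fn going up; returns true only if the hold was long enough. */
  keyUp(nowMs: number): boolean {
    if (this.downAt === null) return false;
    const heldMs = nowMs - this.downAt;
    this.downAt = null;
    this.lastReleaseAt = nowMs;
    return heldMs >= HOLD_MS;
  }
}
```

Timestamps are passed in rather than read from a clock so the gate can be unit-tested without timers.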

Phase 1a: Core Pipeline (sequential — each depends on previous)

  • StorageProvider interface + wrap existing Kysely/database.js
  • Add entries table, projects table, FTS5 index, audit_log table
  • Rewrite ReasoningService → OllamaService (Ollama HTTP API at localhost:11434)
  • Save voice transcriptions to bf_entries (dual-write with OpenWhispr)
  • History query/search/delete API via IPC
  • Learning mode toast (before/after polish, first 20 captures)
  • Hotkey routing — via Command Bar (Cmd+K → /todo, /note, /project)
  • Fn+letter combo routing — globe-listener detects keyDown while Fn held, routes Fn+T→clipboard, Fn+N→markdown, Fn+P→project
  • GATE: Use WhisperWoof daily for 3 days. Fix issues before proceeding. (Daily-driven through v1.9.0)
  • Status: complete
  • Depends on: Phase 0 complete ✓
  • Gate criteria: Daily-usable. If Ollama latency bad → fix or cut. If Fn key broken → switch default.
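
As a rough illustration of the polish step, the sketch below posts a transcript to Ollama's /api/generate endpoint at localhost:11434 with streaming disabled. The function name, the prompt wording, the default model tag, and the injected fetch-like parameter are assumptions for this example; the actual OllamaService may be structured differently.

```typescript
// Minimal fetch-like shape so the sketch can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const OLLAMA_URL = "http://localhost:11434/api/generate";

/** Ask a local Ollama model to polish a raw transcript (non-streaming). */
async function polishTranscript(
  fetchFn: FetchLike,
  transcript: string,
  model = "llama3.2:3b", // assumed default model tag
): Promise<string> {
  const res = await fetchFn(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      // Hypothetical prompt wording.
      prompt: `Clean up this dictated text without changing its meaning:\n\n${transcript}`,
      stream: false, // one JSON object instead of a stream of chunks
    }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { response?: unknown };
  // Fall back to the raw transcript if the reply is missing or empty.
  return typeof data.response === "string" && data.response.trim() !== ""
    ? data.response.trim()
    : transcript;
}
```

Falling back to the unpolished transcript keeps a capture usable even when the model returns nothing, which fits the "fix or cut" gate on Ollama latency.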

Phase 1b: New Features (parallel — start only after 1a gate passes)

  • ClipboardMonitor (polling every 500ms, dedup, saves to bf_entries)
  • Floating indicator reskin (dog ear SVG, amber brand, centered, 48px)
  • Voice-to-Markdown route (Fn+N → .md file to ~/Documents/WhisperWoof Notes/)
  • History UI (search + filters + detail pane + sidebar nav)
  • Projects system (create/delete projects, view entries, FolderOpen sidebar)
  • File import pipeline (validate + read + STT + polish + save to bf_entries)
  • Settings panel (Ollama status, clipboard toggle, notes dir, Sparkles sidebar)
  • Onboarding adaptation (removed dead auth code, WhisperWoof-themed text, local-first flow)
  • Meeting recording (meeting-bridge.js — session tracking, transcript assembly, bf_entries)
  • Status: complete
  • Depends on: Phase 1a gate passed
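
The ClipboardMonitor bullet (poll, dedup, save) can be sketched as a single tick function. This is a minimal illustration under assumptions: the class name, the entry shape, and the save hook are hypothetical, and dedup here only compares against the previous capture.

```typescript
interface ClipboardEntry {
  text: string;
  capturedAt: number;
}

class ClipboardPoller {
  private lastText = "";
  private readonly readText: () => string;
  private readonly save: (entry: ClipboardEntry) => void;

  constructor(
    readText: () => string, // e.g. Electron's clipboard.readText
    save: (entry: ClipboardEntry) => void, // hypothetical persistence hook
  ) {
    this.readText = readText;
    this.save = save;
  }

  /** One polling tick: save only non-empty text that differs from the last capture. */
  tick(nowMs: number): boolean {
    const text = this.readText();
    if (text.trim() === "" || text === this.lastText) return false;
    this.lastText = text;
    this.save({ text, capturedAt: nowMs });
    return true;
  }
}
```

In the app, tick would be driven by a 500ms interval timer; keeping the timer outside the class makes the dedup logic testable on its own.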

Phase 2: MCP Plugin System

  • Implement WhisperWoof as MCP client (@modelcontextprotocol/sdk v1.28.0)
  • Build 3 first-party MCP server plugins (Todoist, Notion, Slack)
  • MCP server discovery + management UI (WhisperWoofPlugins.tsx)
  • MCP plugin permission model (network allowlist, data type filtering, minimal defaults)
  • Projects → dispatch to MCP integrations
  • Status: complete
  • Depends on: Phase 1b complete

Phase 3: Polish & Ship

  • Command bar (Cmd+K) — text alternative for voice (shipped in Phase 2)
  • Performance optimization (virtual scrolling for 10K+ entries)
  • Documentation (CONTRIBUTING.md)
  • UI cleanup (removed Integrations, Support, simplified profile)
  • Smart model advisor (RAM-based recommendations)
  • Polish presets (5 personalities with eval framework)
  • DESIGN.md created with Mando palette
  • First public release (v1.0.0)
  • Status: complete
  • Depends on: Phase 2 complete ✓

Phase 4: Competitive Feature Parity (post-v1.0)

  • Context-aware per-app polish (auto-detect frontmost app → select preset)
  • Voice editing commands (10 commands, including rewrite, translate, summarize, fix, shorten, expand, format, simplify)
  • BYOM for LLM polish (Ollama, OpenAI, Anthropic, Groq — provider abstraction)
  • Adaptive learning (few-shot style examples from user edits, injected into polish prompt)
  • Voice snippets (trigger phrases → expand to saved text blocks, exact/prefix/fuzzy matching)
  • Mobile companion (Telegram bot — voice capture on mobile, inbox sync to desktop)
  • Status: complete
  • Depends on: Phase 3 complete ✓
  • Competitors: Wispr Flow, SuperWhisper, Aqua Voice, DictaFlow, VoiceInk, Willow Voice
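
The voice-snippet matching order (exact, then prefix, then fuzzy) might look like the sketch below. The normalization rules, the use of Levenshtein distance for the fuzzy tier, and the distance threshold are all assumptions; the plan only names the three match kinds.

```typescript
type MatchKind = "exact" | "prefix" | "fuzzy";
interface Snippet {
  trigger: string;
  body: string;
}

// Lowercase, drop punctuation, collapse whitespace (assumed normalization).
const normalize = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9\s]/g, "").replace(/\s+/g, " ").trim();

/** Classic dynamic-programming Levenshtein edit distance. */
function editDistance(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

function matchSnippet(
  spoken: string,
  snippets: Snippet[],
  maxDistance = 2, // assumed fuzzy threshold
): { snippet: Snippet; kind: MatchKind } | null {
  const said = normalize(spoken);
  if (said === "") return null;
  for (const s of snippets) {
    if (normalize(s.trigger) === said) return { snippet: s, kind: "exact" };
  }
  for (const s of snippets) {
    if (said.startsWith(normalize(s.trigger) + " ")) return { snippet: s, kind: "prefix" };
  }
  let best: Snippet | null = null;
  let bestDist = maxDistance + 1;
  for (const s of snippets) {
    const d = editDistance(said, normalize(s.trigger));
    if (d < bestDist) {
      best = s;
      bestDist = d;
    }
  }
  return best ? { snippet: best, kind: "fuzzy" } : null;
}
```

Trying the cheaper, stricter tiers first means a fuzzy near-miss can never shadow an exact trigger.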

Phase 5: Power User Features (post-v1.0)

  • Backtrack correction (detect "no wait", "I mean", "scratch that" → resolve self-corrections)
  • Custom vocabulary (categories, alternatives, STT hints, bulk import/export, usage tracking)
  • Voice Activity Detection (RMS energy analysis, auto-stop on silence, audio trimming, speech ratio)
  • Export/import settings (bundle all config into single JSON, merge/replace import, API key stripping)
  • Usage analytics dashboard (entries/day, source breakdown, polish stats, top commands/snippets, streaks, busiest hours)
  • Status: complete
  • Depends on: Phase 4 complete ✓
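
The Voice Activity Detection bullet (RMS energy, trimming, speech ratio) can be illustrated with a frame-based pass over PCM samples. The frame size, the energy threshold, and the result shape below are assumed values for the sketch, not the shipped configuration.

```typescript
/** Root-mean-square energy of one frame of PCM samples in [-1, 1]. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

interface VadResult {
  speechRatio: number; // fraction of frames classified as speech
  trimmed: Float32Array; // audio with leading and trailing silence removed
  trailingSilenceFrames: number; // input for an auto-stop decision
}

function analyzeAudio(
  samples: Float32Array,
  frameSize = 480, // assumed: 30ms at 16kHz
  threshold = 0.01, // assumed energy threshold
): VadResult {
  const isSpeech: boolean[] = [];
  for (let start = 0; start + frameSize <= samples.length; start += frameSize) {
    isSpeech.push(rms(samples.subarray(start, start + frameSize)) >= threshold);
  }
  const first = isSpeech.indexOf(true);
  const last = isSpeech.lastIndexOf(true);
  const speechFrames = isSpeech.filter(Boolean).length;
  return {
    speechRatio: isSpeech.length === 0 ? 0 : speechFrames / isSpeech.length,
    trimmed:
      first === -1
        ? new Float32Array(0)
        : samples.subarray(first * frameSize, (last + 1) * frameSize),
    trailingSilenceFrames: first === -1 ? isSpeech.length : isSpeech.length - 1 - last,
  };
}
```

Auto-stop on silence would then compare trailingSilenceFrames multiplied by the frame duration against a configured timeout.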

Phase 6: Internationalization & Advanced (post-v1.0)

  • Multi-language auto-detection (script + word-frequency heuristic, 22 languages, auto-adapt polish prompt)
  • Voice-to-code / vibe coding mode (code intent detection, IDE/terminal auto-switch, code + shell prompts)
  • Intent-based capture (rambling detection with 6 signal categories, 5 output modes: auto/action/decision/question/summary)
  • Real-time streaming partial results (session lifecycle, word diffing, display formatting, WPM tracking)
  • Status: complete
  • Depends on: Phase 5 complete ✓

Phase 7: Unique Differentiators (post-v1.1)

  • Focus mode / voice sprints (timed sessions, entry tracking, completion stats, 5 presets)
  • Entry tagging / labels (SQLite many-to-many, CRUD, filter by tag, bulk operations, color, stats)
  • Privacy lock mode (block all cloud URLs, Ollama-only, disable STT/Telegram/analytics, override system)
  • Keyboard shortcut customization (rebind 12 actions, conflict detection, export/import, 5 categories, reset)
  • Status: complete
  • Depends on: Phase 6 complete ✓

Phase 8: AI Intelligence Layer (post-v1.2)

  • Daily/weekly AI digest (entry aggregation, source breakdown, LLM-generated summary with action items/decisions/topics)
  • Webhook integration (CRUD, source/tag/project filters, HMAC signing, retry with backoff, delivery log, test fire)
  • Smart auto-tagging (10 keyword categories + LLM fallback, existing tag matching, scored suggestions)
  • Entry search by semantic similarity (TF-IDF vectors, cosine similarity, find-similar, zero dependencies)
  • Status: complete
  • Depends on: Phase 7 complete ✓
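
The similarity search above (TF-IDF vectors, cosine similarity, zero dependencies) can be sketched in a few dozen lines. The tokenizer and the smoothed IDF formula here are assumptions chosen for a compact example, and the function names are hypothetical.

```typescript
type Vector = Map<string, number>;

const tokenize = (text: string): string[] => text.toLowerCase().match(/[a-z0-9]+/g) ?? [];

/** Build one TF-IDF vector per document. */
function tfidfVectors(docs: string[]): Vector[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>(); // document frequency per term
  for (const tokens of tokenized) {
    new Set(tokens).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  }
  return tokenized.map((tokens) => {
    const vec: Vector = new Map();
    for (const term of tokens) vec.set(term, (vec.get(term) ?? 0) + 1); // raw counts
    vec.forEach((count, term) => {
      // Smoothed IDF (assumed variant) so no term gets a zero or infinite weight.
      const idf = Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;
      vec.set(term, (count / tokens.length) * idf);
    });
    return vec;
  });
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** Indices of the entries most similar to docs[queryIndex], best first. */
function findSimilar(docs: string[], queryIndex: number, topK = 3): number[] {
  const vecs = tfidfVectors(docs);
  return vecs
    .map((v, i) => ({ i, score: i === queryIndex ? -1 : cosine(vecs[queryIndex], v) }))
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.i);
}
```

This is lexical rather than embedding-based similarity: entries only match when they share vocabulary, which is the trade-off for having no model or native dependency.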

Phase 9: Structured Capture & Workflows (post-v1.3)

  • Entry templates (5 built-in: standup/meeting/bug/email/update + custom, section-by-section voice fill, Markdown rendering)
  • Smart reply drafting (4 modes: email/slack/comment/general, app-aware mode selection, reply intent detection)
  • Recurring capture (cron-style scheduler, 4 presets, weekday/time config, template+tag linking, dedup)
  • Entry chaining (SQLite parent-child links, tree traversal, cycle detection, branching, chain stats)
  • Status: complete
  • Depends on: Phase 8 complete ✓
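
Entry chaining with cycle detection can be sketched over an in-memory child-to-parent map. The plan stores links in SQLite; the map, the function names, and the single-parent assumption below are stand-ins used only to show the check.

```typescript
/** childId -> parentId; root entries have no key. Assumes one parent per entry. */
type ParentMap = Map<number, number>;

/** True if linking `child` under `parent` would make `child` its own ancestor. */
function wouldCreateCycle(parents: ParentMap, child: number, parent: number): boolean {
  const seen = new Set<number>();
  let cur: number | undefined = parent;
  while (cur !== undefined && !seen.has(cur)) {
    if (cur === child) return true;
    seen.add(cur);
    cur = parents.get(cur);
  }
  return false;
}

/** Add a parent-child link, rejecting links that would form a cycle. */
function linkEntry(parents: ParentMap, child: number, parent: number): void {
  if (wouldCreateCycle(parents, child, parent)) {
    throw new Error(`Linking ${child} under ${parent} would create a cycle`);
  }
  parents.set(child, parent);
}

/** Ids from `id` up to its root, starting with `id` itself. */
function chainToRoot(parents: ParentMap, id: number): number[] {
  const chain: number[] = [];
  const seen = new Set<number>();
  let cur: number | undefined = id;
  while (cur !== undefined && !seen.has(cur)) {
    chain.push(cur);
    seen.add(cur);
    cur = parents.get(cur);
  }
  return chain;
}
```

Checking before the write means the stored graph stays a forest, so tree traversal never needs to defend against loops in valid data; the seen set is only a guard against pre-existing corruption.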

Phase 10: Intelligent Voice Interface (post-v1.4)

  • Screen context awareness (read selected text via Accessibility API, 6 commands: summarize/explain/reply/translate/simplify/bullets)
  • Agentic actions (5 action types: calendar/slack/todoist/notion/email, LLM param extraction, MCP routing)
  • Conversation memory (7 query patterns, topic extraction, LLM-powered answers from entry history)
  • Voice-driven app automation (11 commands, including open/switch/close/minimize/fullscreen/mute/volume/dark mode/new tab/new window, via AppleScript)
  • Status: complete
  • Depends on: Phase 9 complete ✓

Phase 12: Meeting Safety + Agent Fixes (v1.9.0)

  • MeetingAudioBuffer — local WAV file buffer with 5-min rotating segments
  • MeetingTranscriptCheckpoint — periodic transcript save to SQLite every 60s
  • MeetingSessionManager — WebSocket reconnection with backoff + session rotation at 25min
  • Wire audio buffer into sendMeetingAudio (parallel write to disk + OpenAI)
  • Wire checkpoint into segment handlers for crash safety
  • Auto-start recording option (meetingAutoStart setting)
  • Unified meeting bridge (checkpoint-backed, not in-memory-only)
  • Persistent notifications (remove 30s auto-dismiss)
  • Calendar pre-notification (~90s before scheduled meetings)
  • Unified notification path (calendar → custom overlay, not native OS notification)
  • Confidence-based thresholds (2s with meeting app, 8s mic-only)
  • Calendar overrides dismiss cooldown
  • Process detection feeds audio detector
  • Fix agent conversation creation race condition (mutex)
  • Add LLM streaming cancellation (AbortController)
  • Fix agent auto-scroll (only when near bottom)
  • Fix stale messagesRef in agent LLM context
  • Fix agentic-actions tests (import from source, not duplicated)
  • Fix stale "Now" indicator in UpcomingMeetings
  • Deduplicate AgentState type
  • Improve empty streaming state UX (loading dots)
  • 78 new meeting tests + 15 new agent tests (744 total, all passing)
  • Status: complete
  • Depends on: Phase 10 complete ✓
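
The reconnection behavior described for MeetingSessionManager (backoff plus rotation at 25 minutes) could be structured roughly as follows. Only the 25-minute figure comes from the plan; the base delay, the cap, the attempt limit, and all function names are assumptions for this sketch.

```typescript
const ROTATE_AFTER_MS = 25 * 60 * 1000; // rotate the session at 25 minutes

/** Exponential backoff: base * 2^attempt, capped. Base and cap are assumed values. */
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, Math.max(0, attempt)));
}

function shouldRotate(sessionStartedAtMs: number, nowMs: number): boolean {
  return nowMs - sessionStartedAtMs >= ROTATE_AFTER_MS;
}

/** Retry `connect` until it succeeds or attempts run out, sleeping between tries. */
async function reconnect(
  connect: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 5, // assumed limit
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (await connect()) return true;
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  return false;
}
```

Injecting connect and sleep keeps the retry policy independent of the WebSocket client and lets tests run without real delays.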

Phase 11: Distribution & Code Signing

  • Apple Developer account acquired
  • Enable code signing in electron-builder (removed identity: null)
  • Create Developer ID Application certificate (via developer.apple.com)
  • Set up notarization credentials (Apple ID + app-specific password)
  • Configure CI env vars: CSC_NAME, APPLE_ID, APPLE_APP_SPECIFIC_PASSWORD, APPLE_TEAM_ID
  • Build signed + notarized .dmg locally and verify Gatekeeper passes
  • Update CI/CD to sign + notarize on release builds
  • Auto-update (Sparkle / electron-updater with GitHub Releases)
  • Status: in progress
  • Depends on: Apple Developer account ✓

Phase 13: Engineering Review Cleanup (current)

  • EntryRow memoization + stable onSelect callback (virtual scroll perf)
  • Dev-gate SmartClipboard demo data fallback (import.meta.env.DEV)
  • Batch project-integration IPC to fix N+1 in WhisperWoofProjects
  • Delete unused 636-line SqliteProvider class + correct architecture docs
  • STT config error at boot — "not signed into cloud" was treated as an error, and debugLogger.error("msg:", error) rendered Error objects as {}. Fixed both.
  • Security: fixed an API key leak in stripApiKeys. The filter was the case-sensitive check String.includes("apiKey") (lowercase "a"), but the app stores keys under names like openaiApiKey (capital "A"), so the check never matched and every settings export leaked every plaintext key. Caught by the test-truthfulness refactor.
  • Test-truthfulness refactor — 31 of ~35 files done, all buckets complete. smart-clipboard documented as not-feasible (inline SQL needs better-sqlite3). See docs/test-truthfulness-refactor.md.
  • Surgical upstream cherry-picks: brace-expansion + xmldom security bumps, JSON.parse validation in prompts, Gemma 4 local models (E2B/E4B + 31B/26B MoE)
  • Full upstream merge (deferred — 177 commits remaining, heavy overlap on ipcHandlers.js / agent / meeting files; needs its own session)
  • Sweep the 15 remaining debugLogger.error("msg:", error) sites in ipcHandlers.js — all converted to template literals with error.message
  • TypeScript strict-mode errors: 313 → 0 across all whisperwoof files
  • Status: complete (except full upstream merge, deferred)
  • Depends on: Phase 12 complete ✓
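
Two of the bugs above are easy to reproduce in isolation. The sketch below reconstructs the case-sensitive key filter and a case-insensitive replacement, plus a helper for logging errors by message. The function bodies are illustrative reconstructions from the descriptions above, not the repository's code, and the regex and helper name are assumptions.

```typescript
type Settings = Record<string, unknown>;

/** Buggy shape: "openaiApiKey" does not contain lowercase "apiKey", so it is kept. */
function stripApiKeysBuggy(settings: Settings): Settings {
  const out: Settings = {};
  for (const key of Object.keys(settings)) {
    if (!key.includes("apiKey")) out[key] = settings[key];
  }
  return out;
}

/** Fixed shape (assumed): case-insensitive match on the key name. */
function stripApiKeys(settings: Settings): Settings {
  const out: Settings = {};
  for (const key of Object.keys(settings)) {
    if (!/api[_-]?key/i.test(key)) out[key] = settings[key];
  }
  return out;
}

/**
 * Error objects serialize to "{}" because message and stack are not
 * enumerable own properties, so log the message explicitly instead.
 */
function describeError(err: unknown): string {
  return err instanceof Error ? err.message : String(err);
}
```

A regression test that exports settings containing a placeholder key under each real setting name would have caught the leak directly, which is what the test-truthfulness refactor ended up doing.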

Phase 14: Hotkey Fix + Plugin Setup + Obsidian Integration (v1.11.0)

  • CGEventTap rewrite of macos-globe-listener.swift — key consumption for Fn+letter combos
  • Fn+N actually saves markdown (was silently falling through to paste-at-cursor)
  • Fn+P tags entry for project routing (was silently falling through)
  • Fn+letter combos force push-to-talk in toggle mode
  • TickTick plugin added to defaults
  • Guided plugin setup flow (inline setup card with API key input, test, save)
  • Markdown notes with YAML frontmatter (Obsidian compatible)
  • Notes directory configurable via UI (Settings > Notes > Change Folder)
  • Mando head icon in route toasts
  • Toast component icon prop
  • Status: complete
  • Depends on: Phase 13 complete ✓
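
For the Obsidian-compatible notes, a Markdown file with YAML frontmatter can be rendered with a small helper. The frontmatter fields (title, created, source, tags) and the function names are assumptions for illustration; the plan only states that notes carry YAML frontmatter.

```typescript
interface NoteMeta {
  title: string;
  created: Date;
  source: string;
  tags: string[];
}

/** Double-quote a YAML scalar, escaping backslashes and quotes, flattening newlines. */
const yamlString = (s: string): string =>
  `"${s.replace(/\r?\n/g, " ").replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;

/** Render a note as YAML frontmatter followed by the Markdown body. */
function renderNote(meta: NoteMeta, body: string): string {
  return [
    "---",
    `title: ${yamlString(meta.title)}`,
    `created: ${meta.created.toISOString()}`,
    `source: ${yamlString(meta.source)}`,
    `tags: [${meta.tags.map(yamlString).join(", ")}]`,
    "---",
    "",
    body.trim(),
    "",
  ].join("\n");
}
```

Quoting every string value avoids surprises when a dictated title contains a colon or a leading special character that YAML would otherwise interpret.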

Key Questions

  1. Ollama latency: Can Llama 3.2 3B polish <1s on M1? (Benchmark in Phase 1a)
  2. Fn key reliability: Does Globe key work on target macOS version? (Validate in Phase 0)
  3. OpenWhispr upstream: Do we maintain merge compatibility or own the fork? De facto own the fork. Surgical cherry-picks for security + additive features (local model registry entries). Last catch-up: 2026-04-11 (5 commits from a 182-commit backlog). Full merge deferred — heavy overlap on ipcHandlers.js and agent/meeting files.
  4. NSPasteboard battery impact: Is 0.5s polling acceptable in Electron? (Profile in Phase 1b)

Decisions Made

| Decision | Rationale |
|---|---|
| Fork OpenWhispr (not Tauri rewrite) | Fastest path to daily-use. Inherits STT, hotkeys, UI, audio. |
| Keep Kysely ORM | Less migration work, existing OpenWhispr code stays compatible |
| src/whisperwoof/ isolation | Minimizes merge conflicts with upstream. Bridge pattern for OpenWhispr hooks. |
| Strict TypeScript for WhisperWoof only | Type safety for new code without fixing all OpenWhispr type errors |
| Vitest for testing | Integrates with existing Vite config, native ESM/TS support |
| IPC hardening over sandbox | sandbox:true impossible with native modules. Focus on CSP + preload audit. |
| FileVault for DB, safeStorage for audio | FTS5 incompatible with field-level encryption. Audio is biometric → stronger protection. |
| Cloud STT: keep but proxy through main | Enables webSecurity:true while preserving flexibility |
| Real-time streaming for meetings | Chunk-by-chunk STT, discard audio in transcript-only mode |
| Background processing for file imports | Progress bar in history list, user keeps using app |
| Learning → Expert adaptive feedback | First 20 captures show before/after toast, then auto-switch to minimal |
| Phase 1a/1b gate | Don't build features on a broken foundation |

Errors Encountered

| Error | Attempt | Resolution |
|---|---|---|
| (none yet) | | |

Notes

  • Design doc: docs/design/design-doc.md
  • CEO plan: docs/design/ceo-plan.md
  • Review summary: docs/reviews/2026-03-23-initial-reviews.md
  • OpenWhispr has ZERO tests — testing infrastructure built from scratch
  • OpenWhispr repo: https://github.com/OpenWhispr/openwhispr