Task Plan: WhisperWoof — Voice-First Personal Automation

Goal

Fork OpenWhispr and build WhisperWoof: a voice-first personal automation tool that transcribes, polishes (local LLM), routes (hotkey-driven), and stores (unified capture layer) voice and clipboard input.

Current Phase

v1.9.0 shipped + unreleased eng-review cleanup (758 tests, 13 of 35 test-truthfulness files refactored [Bucket B complete], 5 surgical upstream cherry-picks, STT config error fixed, API key export leak fixed). Next: Bucket C test refactor OR switch to Phase 11 (code signing + notarization) once Apple Developer cert arrives.

Phases

Phase 0: Fork + Audit + Harden

Fork OpenWhispr, build locally, verify app boots
Security audit: 241 IPC methods catalogued, CSP added, webSecurity re-enabled, URL/path validation
Proxy cloud API calls — CSP connect-src allowlist configured
Set up Vitest + write tests (70 tests passing across 4 files)
Rebrand: package.json, electron-builder.json, main.js, windowConfig.js
WhisperWoof core modules built: StorageProvider interface, OllamaService, HotkeyRouter, ClipboardMonitor, Pipeline (runtime SQL lives in bridge/app-init.js; SqliteProvider class was deleted 2026-04-11 as dead code)
Wire WhisperWoof init into main.js (startApp + will-quit)
Validate Fn key — works, timing improved (75ms hold, 100ms cooldown, crash recovery)
Status: complete
Depends on: Nothing — this is the starting point
Done: OpenWhispr merged, rebranded, security hardened, 70 tests pass, app boots
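The Fn timing tuned in Phase 0 (75ms hold, 100ms cooldown) can be pictured as a small gate in front of the recorder. This is a minimal sketch that makes two assumptions. First, "hold" means Fn must stay down at least 75ms before recording starts. Second, "cooldown" means a new press is ignored for 100ms after the previous release. The FnGate class and its method names are hypothetical and are not the globe-listener's actual code. Crash recovery is not modeled.

```typescript
// Hypothetical Fn-key gate. All timestamps are in milliseconds.
const HOLD_MS = 75;      // assumption: minimum hold before recording starts
const COOLDOWN_MS = 100; // assumption: presses this soon after a release are ignored

class FnGate {
  private pressedAt: number | null = null;
  private lastReleaseAt = -Infinity;
  private recording = false;

  // Fn keyDown. Returns false if the press falls inside the cooldown window.
  press(now: number): boolean {
    if (now - this.lastReleaseAt < COOLDOWN_MS) return false;
    this.pressedAt = now;
    return true;
  }

  // Called from a timer while Fn is down. Returns true exactly once,
  // at the moment the hold threshold is first crossed (start recording).
  tick(now: number): boolean {
    if (this.pressedAt === null || this.recording) return false;
    if (now - this.pressedAt >= HOLD_MS) {
      this.recording = true;
      return true;
    }
    return false;
  }

  // Fn keyUp. Returns true if a recording was in progress (stop it).
  release(now: number): boolean {
    if (this.pressedAt === null) return false; // release of an ignored press
    const wasRecording = this.recording;
    this.pressedAt = null;
    this.recording = false;
    this.lastReleaseAt = now;
    return wasRecording;
  }
}
```

A quick tap that is released before the hold threshold never reaches `tick() === true`, so it cannot start a recording by accident.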

Phase 1a: Core Pipeline (sequential — each depends on previous)

StorageProvider interface + wrap existing Kysely/database.js
Add entries table, projects table, FTS5 index, audit_log table
Rewrite ReasoningService → OllamaService (Ollama HTTP API at localhost:11434)
Save voice transcriptions to bf_entries (dual-write with OpenWhispr)
History query/search/delete API via IPC
Learning mode toast (before/after polish, first 20 captures)
Hotkey routing — via Command Bar (Cmd+K → /todo, /note, /project)
Fn+letter combo routing — globe-listener detects keyDown while Fn held, routes Fn+T→clipboard, Fn+N→markdown, Fn+P→project
GATE: Use WhisperWoof daily for 3 days. Fix issues before proceeding. (Daily-driven through v1.9.0)
Status: complete
Depends on: Phase 0 complete ✓
Gate criteria: Daily-usable. If Ollama latency bad → fix or cut. If Fn key broken → switch default.
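The Fn+letter combos in Phase 1a amount to a lookup from the letter seen on keyDown while Fn is held to a capture route. This is a minimal sketch with hypothetical route names. Two behaviors are assumptions, not documented behavior. A bare Fn press is treated as plain dictation. Unknown letters also fall back to plain dictation.

```typescript
// Hypothetical route identifiers for the Fn+letter combos listed above.
type Route = "dictate" | "clipboard" | "markdown" | "project";

const FN_COMBOS: Readonly<Record<string, Route>> = {
  t: "clipboard", // Fn+T
  n: "markdown",  // Fn+N: write a .md file to the notes folder
  p: "project",   // Fn+P
};

// letter: the key detected on keyDown while Fn was held, or null if none.
function resolveRoute(letter: string | null): Route {
  if (letter === null) return "dictate"; // assumption: bare Fn = normal dictation
  return FN_COMBOS[letter.toLowerCase()] ?? "dictate"; // assumption: unknown letters fall back
}
```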

Phase 1b: New Features (parallel — start only after 1a gate passes)

ClipboardMonitor (polling every 500ms, dedup, saves to bf_entries)
Floating indicator reskin (dog ear SVG, amber brand, centered, 48px)
Voice-to-Markdown route (Fn+N → .md file to ~/Documents/WhisperWoof Notes/)
History UI (search + filters + detail pane + sidebar nav)
Projects system (create/delete projects, view entries, FolderOpen sidebar)
File import pipeline (validate + read + STT + polish + save to bf_entries)
Settings panel (Ollama status, clipboard toggle, notes dir, Sparkles sidebar)
Onboarding adaptation (removed dead auth code, WhisperWoof-themed text, local-first flow)
Meeting recording (meeting-bridge.js — session tracking, transcript assembly, bf_entries)
Status: complete
Depends on: Phase 1a gate passed
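ClipboardMonitor's polling-with-dedup loop can be sketched as follows. The clipboard reader and the save callback are injected so the logic stays testable. In Electron, the reader would plausibly be `clipboard.readText()` and the callback would write to bf_entries. Some behavior here is assumed. Only consecutive duplicates are suppressed, and empty or whitespace-only text is skipped.

```typescript
type ReadText = () => string;
type OnCapture = (text: string) => void;

class ClipboardMonitor {
  private readonly read: ReadText;
  private readonly onCapture: OnCapture;
  private last: string | null = null;
  private timer: ReturnType<typeof setInterval> | null = null;

  constructor(read: ReadText, onCapture: OnCapture) {
    this.read = read;
    this.onCapture = onCapture;
  }

  // One polling step: capture only if the clipboard changed since the last poll.
  pollOnce(): boolean {
    const text = this.read();
    if (text === this.last) return false; // dedup: content unchanged
    this.last = text;
    if (text.trim() === "") return false; // assumption: ignore empty copies
    this.onCapture(text);
    return true;
  }

  start(intervalMs = 500): void {
    if (this.timer !== null) return; // already polling
    this.timer = setInterval(() => this.pollOnce(), intervalMs);
  }

  stop(): void {
    if (this.timer !== null) clearInterval(this.timer);
    this.timer = null;
  }
}
```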

Phase 2: MCP Plugin System

Implement WhisperWoof as MCP client (@modelcontextprotocol/sdk v1.28.0)
Build 3 first-party MCP server plugins (Todoist, Notion, Slack)
MCP server discovery + management UI (WhisperWoofPlugins.tsx)
MCP plugin permission model (network allowlist, data type filtering, minimal defaults)
Projects → dispatch to MCP integrations
Status: complete
Depends on: Phase 1b complete
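The plugin permission model (network allowlist, data type filtering, minimal defaults) reduces to a check that runs before any entry is dispatched to an MCP server. This is a minimal sketch. The PluginPermissions shape, the data type names, and the exact-hostname matching rule are all hypothetical, not the shipped schema.

```typescript
// Hypothetical data categories a plugin may receive.
type DataType = "transcript" | "clipboard" | "audio" | "project";

interface PluginPermissions {
  allowedHosts: string[];        // exact hostnames the plugin may contact
  allowedDataTypes: DataType[];  // entry categories the plugin may receive
}

// Minimal default: no network and no data until the user grants them.
const DEFAULT_PERMISSIONS: PluginPermissions = { allowedHosts: [], allowedDataTypes: [] };

function canDispatch(perms: PluginPermissions, url: string, dataType: DataType): boolean {
  let host: string;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // unparseable URL: deny
  }
  return perms.allowedHosts.includes(host) && perms.allowedDataTypes.includes(dataType);
}
```

Both conditions must hold, so granting a host does not implicitly grant access to every data type.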

Phase 3: Polish & Ship

Command bar (Cmd+K) — text alternative for voice (shipped in Phase 2)
Performance optimization (virtual scrolling for 10K+ entries)
Documentation (CONTRIBUTING.md)
UI cleanup (removed Integrations, Support, simplified profile)
Smart model advisor (RAM-based recommendations)
Polish presets (5 personalities with eval framework)
DESIGN.md created with Mando palette
First public release (v1.0.0)
Status: complete
Depends on: Phase 2 complete ✓
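Virtual scrolling for 10K+ entries means rendering only the rows that intersect the viewport. This is a minimal sketch of the window computation. It assumes fixed-height rows and a small overscan buffer. The function name and its parameters are illustrative, not the History list's actual implementation.

```typescript
interface VisibleRange {
  start: number;   // first row index to render (inclusive)
  end: number;     // last row index to render (exclusive)
  offsetY: number; // vertical offset of the first rendered row, in px
}

// Assumes every row has the same height (rowHeight, in px).
function visibleRange(
  scrollTop: number,
  viewportHeight: number,
  rowHeight: number,
  totalRows: number,
  overscan = 5, // extra rows above and below to avoid blank flashes while scrolling
): VisibleRange {
  const first = Math.floor(scrollTop / rowHeight);
  const visibleCount = Math.ceil(viewportHeight / rowHeight);
  const start = Math.max(0, first - overscan);
  const end = Math.min(totalRows, first + visibleCount + overscan);
  return { start, end, offsetY: start * rowHeight };
}
```

The scroll container keeps a spacer of `totalRows * rowHeight` px so the scrollbar reflects the full list, while only about twenty rows exist in the DOM at any time.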

Phase 4: Competitive Feature Parity (post-v1.0)

Context-aware per-app polish (auto-detect frontmost app → select preset)
Voice editing commands (10 commands, including rewrite, translate, summarize, fix, shorten, expand, format, simplify)
BYOM for LLM polish (Ollama, OpenAI, Anthropic, Groq — provider abstraction)
Adaptive learning (few-shot style examples from user edits, injected into polish prompt)
Voice snippets (trigger phrases → expand to saved text blocks, exact/prefix/fuzzy matching)
Mobile companion (Telegram bot — voice capture on mobile, inbox sync to desktop)
Status: complete
Depends on: Phase 3 complete ✓
Competitors: Wispr Flow, SuperWhisper, Aqua Voice, DictaFlow, VoiceInk, Willow Voice
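Voice snippet lookup (exact, then prefix, then fuzzy) can be sketched as a tiered matcher. The sketch rests on several assumptions:
- Phrases are normalized to lowercase with punctuation stripped.
- "Prefix" means the spoken text begins with the whole trigger phrase.
- "Fuzzy" means a Levenshtein distance of at most 2.

The real thresholds and tie-breaking are not documented here.

```typescript
interface Snippet { trigger: string; text: string; }

// Assumption: case and punctuation never matter for triggers.
function normalize(s: string): string {
  return s.toLowerCase().replace(/[^\p{L}\p{N}\s]/gu, "").replace(/\s+/g, " ").trim();
}

// Dynamic-programming edit distance (insert/delete/substitute each cost 1).
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

function matchSnippet(spoken: string, snippets: Snippet[], maxDistance = 2): Snippet | null {
  const s = normalize(spoken);
  // Tier 1: exact match after normalization.
  const exact = snippets.find((sn) => normalize(sn.trigger) === s);
  if (exact) return exact;
  // Tier 2: spoken text starts with the whole trigger (word boundary required).
  const prefix = snippets.find((sn) => s.startsWith(normalize(sn.trigger) + " "));
  if (prefix) return prefix;
  // Tier 3: closest trigger within maxDistance edits.
  let best: Snippet | null = null;
  let bestDist = maxDistance + 1;
  for (const sn of snippets) {
    const d = levenshtein(normalize(sn.trigger), s);
    if (d < bestDist) {
      best = sn;
      bestDist = d;
    }
  }
  return best;
}
```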

Phase 5: Power User Features (post-v1.0)

Backtrack correction (detect "no wait", "I mean", "scratch that" → resolve self-corrections)
Custom vocabulary (categories, alternatives, STT hints, bulk import/export, usage tracking)
Voice Activity Detection (RMS energy analysis, auto-stop on silence, audio trimming, speech ratio)
Export/import settings (bundle all config into single JSON, merge/replace import, API key stripping)
Usage analytics dashboard (entries/day, source breakdown, polish stats, top commands/snippets, streaks, busiest hours)
Status: complete
Depends on: Phase 4 complete ✓
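The RMS-based Voice Activity Detection can be sketched as frame-by-frame energy measurement over PCM samples. All numbers here are assumptions, not the shipped values:
- Samples are mono floats in [-1, 1].
- Frames are 20 ms at 16 kHz (320 samples).
- Speech is any frame with an RMS of at least 0.01.
- Auto-stop triggers after 1.5 s of continuous silence that follows some speech.

The shipped thresholds may differ or adapt to background noise.

```typescript
interface VadResult {
  speechRatio: number;     // fraction of frames above the threshold
  trimStart: number;       // sample index where the first speech frame begins
  trimEnd: number;         // sample index just past the last speech frame
  shouldAutoStop: boolean; // trailing silence exceeded the limit after speech
}

// Root-mean-square energy of samples[from, to).
function rms(samples: Float32Array, from: number, to: number): number {
  let sum = 0;
  for (let i = from; i < to; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / Math.max(1, to - from));
}

function analyze(
  samples: Float32Array,
  frameSize = 320,         // assumption: 20 ms at 16 kHz
  threshold = 0.01,        // assumption: RMS level treated as speech
  silenceLimitFrames = 75, // assumption: 75 x 20 ms = 1.5 s
): VadResult {
  let speechFrames = 0, frames = 0, trailingSilence = 0;
  let firstSpeech = -1, lastSpeechEnd = 0;
  for (let start = 0; start < samples.length; start += frameSize) {
    const end = Math.min(start + frameSize, samples.length);
    frames++;
    if (rms(samples, start, end) >= threshold) {
      speechFrames++;
      trailingSilence = 0;
      if (firstSpeech < 0) firstSpeech = start;
      lastSpeechEnd = end;
    } else {
      trailingSilence++;
    }
  }
  return {
    speechRatio: frames === 0 ? 0 : speechFrames / frames,
    trimStart: firstSpeech < 0 ? 0 : firstSpeech,
    trimEnd: lastSpeechEnd,
    // Only stop once the user has actually said something.
    shouldAutoStop: firstSpeech >= 0 && trailingSilence >= silenceLimitFrames,
  };
}
```

Trimming is then `samples.subarray(trimStart, trimEnd)`, which drops leading and trailing silence before the audio goes to STT.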

Phase 6: Internationalization & Advanced (post-v1.0)

Multi-language auto-detection (script + word-frequency heuristic, 22 languages, auto-adapt polish prompt)
Voice-to-code / vibe coding mode (code intent detection, IDE/terminal auto-switch, code + shell prompts)
Intent-based capture (rambling detection with 6 signal categories, 5 output modes: auto/action/decision/question/summary)
Real-time streaming partial results (session lifecycle, word diffing, display formatting, WPM tracking)
Status: complete
Depends on: Phase 5 complete ✓
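For real-time partial results, word diffing decides which words are stable enough to render as settled and which are still changing. This is a minimal sketch built on two assumptions. The stable part is taken to be the longest common word prefix of two consecutive partials. WPM is measured over elapsed wall-clock time. The function names are illustrative.

```typescript
interface PartialDiff {
  stable: string[];  // words unchanged since the previous partial
  pending: string[]; // words that are new or were revised by the recognizer
}

function words(text: string): string[] {
  return text.trim().split(/\s+/).filter(Boolean);
}

// Longest common word prefix of consecutive partials (assumption: stability rule).
function diffPartials(previous: string, current: string): PartialDiff {
  const a = words(previous);
  const b = words(current);
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return { stable: b.slice(0, i), pending: b.slice(i) };
}

// Words per minute over the elapsed session time in milliseconds.
function wordsPerMinute(text: string, elapsedMs: number): number {
  if (elapsedMs <= 0) return 0;
  return (words(text).length * 60000) / elapsedMs;
}
```

Rendering `pending` in a lighter style makes recognizer revisions visible without the whole line flickering.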

Phase 7: Unique Differentiators (post-v1.1)

Focus mode / voice sprints (timed sessions, entry tracking, completion stats, 5 presets)
Entry tagging / labels (SQLite many-to-many, CRUD, filter by tag, bulk operations, color, stats)
Privacy lock mode (block all cloud URLs, Ollama-only, disable STT/Telegram/analytics, override system)
Keyboard shortcut customization (rebind 12 actions, conflict detection, export/import, 5 categories, reset)
Status: complete
Depends on: Phase 6 complete ✓
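Conflict detection for rebindable shortcuts comes down to two steps. First, normalize each binding so that equivalent spellings compare equal. Then group actions by the normalized key. This is a minimal sketch that uses Electron-style accelerator strings such as `CommandOrControl+Shift+K`. The alias table assumes macOS, where CommandOrControl means Command. The action names are hypothetical. A fuller check would also consider OS-reserved shortcuts.

```typescript
// Assumption (macOS): all Command spellings collapse to "Cmd".
const MODIFIER_ALIASES: Record<string, string> = {
  cmd: "Cmd", command: "Cmd", commandorcontrol: "Cmd", cmdorctrl: "Cmd",
  ctrl: "Ctrl", control: "Ctrl",
  alt: "Alt", option: "Alt",
  shift: "Shift",
};
const MODIFIER_ORDER = ["Cmd", "Ctrl", "Alt", "Shift"];

// Canonical form: modifiers in a fixed order, then the key in uppercase.
function normalizeAccelerator(accel: string): string {
  const mods = new Set<string>();
  let key = "";
  for (const part of accel.split("+").map((p) => p.trim()).filter(Boolean)) {
    const mod = MODIFIER_ALIASES[part.toLowerCase()];
    if (mod) mods.add(mod);
    else key = part.toUpperCase();
  }
  return [...MODIFIER_ORDER.filter((m) => mods.has(m)), key].join("+");
}

// action -> accelerator in; normalized accelerator -> clashing actions out.
function findConflicts(bindings: Record<string, string>): Map<string, string[]> {
  const byAccel = new Map<string, string[]>();
  for (const [action, accel] of Object.entries(bindings)) {
    const norm = normalizeAccelerator(accel);
    const list = byAccel.get(norm) ?? [];
    list.push(action);
    byAccel.set(norm, list);
  }
  const conflicts = new Map<string, string[]>();
  for (const [accel, actions] of byAccel) {
    if (actions.length > 1) conflicts.set(accel, actions);
  }
  return conflicts;
}
```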

Phase 8: AI Intelligence Layer (post-v1.2)

Daily/weekly AI digest (entry aggregation, source breakdown, LLM-generated summary with action items/decisions/topics)
Webhook integration (CRUD, source/tag/project filters, HMAC signing, retry with backoff, delivery log, test fire)
Smart auto-tagging (10 keyword categories + LLM fallback, existing tag matching, scored suggestions)
Entry search by semantic similarity (TF-IDF vectors, cosine similarity, find-similar, zero dependencies)
Status: complete
Depends on: Phase 7 complete ✓
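The zero-dependency similarity search can be sketched with plain TF-IDF vectors and cosine similarity. The sketch assumes a lowercase word tokenizer and raw term counts. It also assumes the smoothed IDF log((1 + N) / (1 + df)) + 1. The shipped weighting, tokenizer, and stop-word handling may differ.

```typescript
type Vector = Map<string, number>;

// Assumption: lowercase letter/digit runs are the tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
}

// One TF-IDF vector per document.
function tfidfVectors(docs: string[]): Vector[] {
  const tokenized = docs.map(tokenize);
  const df = new Map<string, number>(); // number of documents containing each term
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const n = docs.length;
  return tokenized.map((tokens) => {
    const tf = new Map<string, number>();
    for (const t of tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    const vec: Vector = new Map();
    for (const [term, count] of tf) {
      const idf = Math.log((1 + n) / (1 + (df.get(term) ?? 0))) + 1; // smoothed IDF
      vec.set(term, count * idf);
    }
    return vec;
  });
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0, na = 0, nb = 0;
  for (const [term, w] of a) {
    na += w * w;
    const wb = b.get(term);
    if (wb !== undefined) dot += w * wb;
  }
  for (const w of b.values()) nb += w * w;
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Indices of the k entries most similar to docs[target], best first.
function findSimilar(docs: string[], target: number, k = 3): number[] {
  const vecs = tfidfVectors(docs);
  return vecs
    .map((v, i) => ({ i, score: cosine(vecs[target], v) }))
    .filter((x) => x.i !== target && x.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.i);
}
```

Because the vectors depend only on shared vocabulary, entries that paraphrase each other with different words will score low. That is the trade-off for needing no embedding model.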

Phase 9: Structured Capture & Workflows (post-v1.3)

Entry templates (5 built-in: standup/meeting/bug/email/update + custom, section-by-section voice fill, Markdown rendering)
Smart reply drafting (4 modes: email/slack/comment/general, app-aware mode selection, reply intent detection)
Recurring capture (cron-style scheduler, 4 presets, weekday/time config, template+tag linking, dedup)
Entry chaining (SQLite parent-child links, tree traversal, cycle detection, branching, chain stats)
Status: complete
Depends on: Phase 8 complete ✓
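Cycle detection for entry chains can be sketched as a check that runs before a parent-child link is inserted. Linking `child` under `parent` creates a cycle exactly when `child` is already `parent` or one of its ancestors. This is a minimal in-memory sketch. In the app, the links live in SQLite, and the parent lookup here stands in for a query. Allowing only one parent per entry is an assumption; an entry may still have many children, which is how branching works.

```typescript
// Hypothetical in-memory link store: child id -> parent id (one parent per entry).
class EntryChains {
  private readonly parentOf = new Map<string, string>();

  // Returns false (and changes nothing) if the link would create a cycle.
  link(parent: string, child: string): boolean {
    // Walk up from the proposed parent; meeting the child means a cycle.
    for (let cur: string | undefined = parent; cur !== undefined; cur = this.parentOf.get(cur)) {
      if (cur === child) return false;
    }
    this.parentOf.set(child, parent);
    return true;
  }

  // Path from the root of the chain down to `id`.
  ancestry(id: string): string[] {
    const path: string[] = [];
    for (let cur: string | undefined = id; cur !== undefined; cur = this.parentOf.get(cur)) {
      path.unshift(cur);
    }
    return path;
  }
}
```

Because every accepted link preserves the no-cycle invariant, the upward walks in both methods always terminate.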

Phase 10: Intelligent Voice Interface (post-v1.4)

Screen context awareness (read selected text via Accessibility API, 6 commands: summarize/explain/reply/translate/simplify/bullets)
Agentic actions (5 action types: calendar/slack/todoist/notion/email, LLM param extraction, MCP routing)
Conversation memory (7 query patterns, topic extraction, LLM-powered answers from entry history)
Voice-driven app automation (11 commands, including open/switch/close/minimize/fullscreen/mute/volume/dark mode/new tab/window; AppleScript)
Status: complete
Depends on: Phase 9 complete ✓
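Voice-driven app automation maps a recognized command to a short AppleScript snippet. This is a minimal sketch that covers three of the commands: open, close, and mute. The command grammar is an assumption. App names are escaped so that a transcript cannot break out of the quoted string literal. The resulting script could then be run with Node's `execFile("osascript", ["-e", script])`, which avoids shell interpolation.

```typescript
type AppCommand =
  | { kind: "open"; app: string }
  | { kind: "close"; app: string }
  | { kind: "mute" };

// Escape backslashes and double quotes for an AppleScript string literal.
function asString(s: string): string {
  return '"' + s.replace(/\\/g, "\\\\").replace(/"/g, '\\"') + '"';
}

// Hypothetical grammar: "open <app>", "close <app>", "mute".
function parseCommand(transcript: string): AppCommand | null {
  const t = transcript.trim().replace(/[.!?]+$/, ""); // drop trailing punctuation from STT
  const m = /^(open|close)\s+(.+)$/i.exec(t);
  if (m) return { kind: m[1].toLowerCase() as "open" | "close", app: m[2] };
  if (/^mute$/i.test(t)) return { kind: "mute" };
  return null;
}

function toAppleScript(cmd: AppCommand): string {
  switch (cmd.kind) {
    case "open":
      return `tell application ${asString(cmd.app)} to activate`;
    case "close":
      return `tell application ${asString(cmd.app)} to quit`;
    case "mute":
      return "set volume output muted true";
  }
}
```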

Phase 12: Meeting Safety + Agent Fixes (v1.9.0)

Phase 11: Distribution & Code Signing

Apple Developer account acquired
Enable code signing in electron-builder (removed identity: null)
Create Developer ID Application certificate (via developer.apple.com)
Set up notarization credentials (Apple ID + app-specific password)
Configure CI env vars: CSC_NAME, APPLE_ID, APPLE_APP_SPECIFIC_PASSWORD, APPLE_TEAM_ID
Build signed + notarized .dmg locally and verify Gatekeeper passes
Update CI/CD to sign + notarize on release builds
Auto-update (Sparkle / electron-updater with GitHub Releases)
Status: in progress
Depends on: Apple Developer account ✓
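Before a signed release build, a preflight step can fail fast when any CI variable listed above is missing. That way the gap surfaces before an unsigned or un-notarized artifact is produced. This is a minimal sketch. The environment object is passed in so the check is testable. How the check is wired into CI is an assumption.

```typescript
// The four variables named in the Phase 11 checklist.
const REQUIRED_SIGNING_VARS = [
  "CSC_NAME",
  "APPLE_ID",
  "APPLE_APP_SPECIFIC_PASSWORD",
  "APPLE_TEAM_ID",
] as const;

// Returns the names of required variables that are unset or blank.
function missingSigningVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_SIGNING_VARS.filter((name) => !env[name]?.trim());
}
```

In a CI step, call `missingSigningVars(process.env)` and exit non-zero if the result is non-empty. Print only the variable names, never their values.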

Phase 13: Engineering Review Cleanup (current)

Phase 14: Hotkey Fix + Plugin Setup + Obsidian Integration (v1.11.0)

Key Questions

Ollama latency: Can Llama 3.2 3B polish <1s on M1? (Benchmark in Phase 1a)
Fn key reliability: Does Globe key work on target macOS version? (Validate in Phase 0)
~~OpenWhispr upstream: Do we maintain merge compatibility or own the fork?~~ De facto own the fork. Surgical cherry-picks for security + additive features (local model registry entries). Last catch-up: 2026-04-11 (5 commits from a 182-commit backlog). Full merge deferred — heavy overlap on ipcHandlers.js and agent/meeting files.
NSPasteboard battery impact: Is 0.5s polling acceptable in Electron? (Profile in Phase 1b)

Decisions Made

Decision	Rationale
Fork OpenWhispr (not Tauri rewrite)	Fastest path to daily-use. Inherits STT, hotkeys, UI, audio.
Keep Kysely ORM	Less migration work, existing OpenWhispr code stays compatible
src/whisperwoof/ isolation	Minimizes merge conflicts with upstream. Bridge pattern for OpenWhispr hooks.
Strict TypeScript for WhisperWoof only	Type safety for new code without fixing all OpenWhispr type errors
Vitest for testing	Integrates with existing Vite config, native ESM/TS support
IPC hardening over sandbox	sandbox:true impossible with native modules. Focus on CSP + preload audit.
FileVault for DB, safeStorage for audio	FTS5 incompatible with field-level encryption. Audio is biometric → stronger protection.
Cloud STT: keep but proxy through main	Enables webSecurity:true while preserving flexibility
Real-time streaming for meetings	Chunk-by-chunk STT, discard audio in transcript-only mode
Background processing for file imports	Progress bar in history list, user keeps using app
Learning → Expert adaptive feedback	First 20 captures show before/after toast, then auto-switch to minimal
Phase 1a/1b gate	Don't build features on a broken foundation

Errors Encountered

Error	Attempt	Resolution
(none yet)

Notes

Design doc: docs/design/design-doc.md
CEO plan: docs/design/ceo-plan.md
Review summary: docs/reviews/2026-03-23-initial-reviews.md
OpenWhispr has ZERO tests — testing infrastructure built from scratch
OpenWhispr repo: https://github.com/OpenWhispr/openwhispr
