Fork OpenWhispr and build WhisperWoof: a voice-first personal automation tool that transcribes, polishes (local LLM), routes (hotkey-driven), and stores (unified capture layer) voice and clipboard input.
v1.9.0 shipped + unreleased eng-review cleanup (758 tests, 13 of 35 test-truthfulness files refactored [Bucket B complete], 5 surgical upstream cherry-picks, STT config error fixed, API key export leak fixed). Next: Bucket C test refactor OR switch to Phase 11 (code signing + notarization) once Apple Developer cert arrives.
- Fork OpenWhispr, build locally, verify app boots
- Security audit: 241 IPC methods catalogued, CSP added, webSecurity re-enabled, URL/path validation
- Proxy cloud API calls — CSP connect-src allowlist configured
- Set up Vitest + write tests (70 tests passing across 4 files)
- Rebrand: package.json, electron-builder.json, main.js, windowConfig.js
- WhisperWoof core modules built: StorageProvider interface, OllamaService, HotkeyRouter, ClipboardMonitor, Pipeline (runtime SQL lives in bridge/app-init.js;
SqliteProviderclass was deleted 2026-04-11 as dead code) - Wire WhisperWoof init into main.js (startApp + will-quit)
- Validate Fn key — works, timing improved (75ms hold, 100ms cooldown, crash recovery)
- Status: complete
- Depends on: Nothing — this is the starting point
- Done: OpenWhispr merged, rebranded, security hardened, 70 tests pass, app boots
- StorageProvider interface + wrap existing Kysely/database.js
- Add
entriestable,projectstable, FTS5 index,audit_logtable - Rewrite ReasoningService → OllamaService (Ollama HTTP API at localhost:11434)
- Save voice transcriptions to bf_entries (dual-write with OpenWhispr)
- History query/search/delete API via IPC
- Learning mode toast (before/after polish, first 20 captures)
- Hotkey routing — via Command Bar (Cmd+K → /todo, /note, /project)
- Fn+letter combo routing — globe-listener detects keyDown while Fn held, routes Fn+T→clipboard, Fn+N→markdown, Fn+P→project
- GATE: Use WhisperWoof daily for 3 days. Fix issues before proceeding. (Daily-driven through v1.9.0)
- Status: complete
- Depends on: Phase 0 complete ✓
- Gate criteria: Daily-usable. If Ollama latency bad → fix or cut. If Fn key broken → switch default.
- ClipboardMonitor (polling every 500ms, dedup, saves to bf_entries)
- Floating indicator reskin (dog ear SVG, amber brand, centered, 48px)
- Voice-to-Markdown route (Fn+N → .md file to ~/Documents/WhisperWoof Notes/)
- History UI (search + filters + detail pane + sidebar nav)
- Projects system (create/delete projects, view entries, FolderOpen sidebar)
- File import pipeline (validate + read + STT + polish + save to bf_entries)
- Settings panel (Ollama status, clipboard toggle, notes dir, Sparkles sidebar)
- Onboarding adaptation (removed dead auth code, WhisperWoof-themed text, local-first flow)
- Meeting recording (meeting-bridge.js — session tracking, transcript assembly, bf_entries)
- Status: complete
- Depends on: Phase 1a gate passed
- Implement WhisperWoof as MCP client (@modelcontextprotocol/sdk v1.28.0)
- Build 3 first-party MCP server plugins (Todoist, Notion, Slack)
- MCP server discovery + management UI (WhisperWoofPlugins.tsx)
- MCP plugin permission model (network allowlist, data type filtering, minimal defaults)
- Projects → dispatch to MCP integrations
- Status: complete
- Depends on: Phase 1b complete
- Command bar (Cmd+K) — text alternative for voice (shipped in Phase 2)
- Performance optimization (virtual scrolling for 10K+ entries)
- Documentation (CONTRIBUTING.md)
- UI cleanup (removed Integrations, Support, simplified profile)
- Smart model advisor (RAM-based recommendations)
- Polish presets (5 personalities with eval framework)
- DESIGN.md created with Mando palette
- First public release (v1.0.0)
- Status: complete
- Depends on: Phase 2 complete ✓
- Context-aware per-app polish (auto-detect frontmost app → select preset)
- Voice editing commands (10 commands: rewrite, translate, summarize, fix, shorten, expand, format, simplify)
- BYOM for LLM polish (Ollama, OpenAI, Anthropic, Groq — provider abstraction)
- Adaptive learning (few-shot style examples from user edits, injected into polish prompt)
- Voice snippets (trigger phrases → expand to saved text blocks, exact/prefix/fuzzy matching)
- Mobile companion (Telegram bot — voice capture on mobile, inbox sync to desktop)
- Status: complete
- Depends on: Phase 3 complete ✓
- Competitors: Wispr Flow, SuperWhisper, Aqua Voice, DictaFlow, VoiceInk, Willow Voice
- Backtrack correction (detect "no wait", "I mean", "scratch that" → resolve self-corrections)
- Custom vocabulary (categories, alternatives, STT hints, bulk import/export, usage tracking)
- Voice Activity Detection (RMS energy analysis, auto-stop on silence, audio trimming, speech ratio)
- Export/import settings (bundle all config into single JSON, merge/replace import, API key stripping)
- Usage analytics dashboard (entries/day, source breakdown, polish stats, top commands/snippets, streaks, busiest hours)
- Status: complete
- Depends on: Phase 4 complete ✓
- Multi-language auto-detection (script + word-frequency heuristic, 22 languages, auto-adapt polish prompt)
- Voice-to-code / vibe coding mode (code intent detection, IDE/terminal auto-switch, code + shell prompts)
- Intent-based capture (rambling detection with 6 signal categories, 5 output modes: auto/action/decision/question/summary)
- Real-time streaming partial results (session lifecycle, word diffing, display formatting, WPM tracking)
- Status: complete
- Depends on: Phase 5 complete ✓
- Focus mode / voice sprints (timed sessions, entry tracking, completion stats, 5 presets)
- Entry tagging / labels (SQLite many-to-many, CRUD, filter by tag, bulk operations, color, stats)
- Privacy lock mode (block all cloud URLs, Ollama-only, disable STT/Telegram/analytics, override system)
- Keyboard shortcut customization (rebind 12 actions, conflict detection, export/import, 5 categories, reset)
- Status: complete
- Depends on: Phase 6 complete ✓
- Daily/weekly AI digest (entry aggregation, source breakdown, LLM-generated summary with action items/decisions/topics)
- Webhook integration (CRUD, source/tag/project filters, HMAC signing, retry with backoff, delivery log, test fire)
- Smart auto-tagging (10 keyword categories + LLM fallback, existing tag matching, scored suggestions)
- Entry search by semantic similarity (TF-IDF vectors, cosine similarity, find-similar, zero dependencies)
- Status: complete
- Depends on: Phase 7 complete ✓
- Entry templates (5 built-in: standup/meeting/bug/email/update + custom, section-by-section voice fill, Markdown rendering)
- Smart reply drafting (4 modes: email/slack/comment/general, app-aware mode selection, reply intent detection)
- Recurring capture (cron-style scheduler, 4 presets, weekday/time config, template+tag linking, dedup)
- Entry chaining (SQLite parent-child links, tree traversal, cycle detection, branching, chain stats)
- Status: complete
- Depends on: Phase 8 complete ✓
- Screen context awareness (read selected text via Accessibility API, 6 commands: summarize/explain/reply/translate/simplify/bullets)
- Agentic actions (5 action types: calendar/slack/todoist/notion/email, LLM param extraction, MCP routing)
- Conversation memory (7 query patterns, topic extraction, LLM-powered answers from entry history)
- Voice-driven app automation (11 commands: open/switch/close/minimize/fullscreen/mute/volume/dark mode/new tab/window, AppleScript)
- Status: complete
- Depends on: Phase 9 complete ✓
- MeetingAudioBuffer — local WAV file buffer with 5-min rotating segments
- MeetingTranscriptCheckpoint — periodic transcript save to SQLite every 60s
- MeetingSessionManager — WebSocket reconnection with backoff + session rotation at 25min
- Wire audio buffer into sendMeetingAudio (parallel write to disk + OpenAI)
- Wire checkpoint into segment handlers for crash safety
- Auto-start recording option (meetingAutoStart setting)
- Unified meeting bridge (checkpoint-backed, not in-memory-only)
- Persistent notifications (remove 30s auto-dismiss)
- Calendar pre-notification (~90s before scheduled meetings)
- Unified notification path (calendar → custom overlay, not native OS notification)
- Confidence-based thresholds (2s with meeting app, 8s mic-only)
- Calendar overrides dismiss cooldown
- Process detection feeds audio detector
- Fix agent conversation creation race condition (mutex)
- Add LLM streaming cancellation (AbortController)
- Fix agent auto-scroll (only when near bottom)
- Fix stale messagesRef in agent LLM context
- Fix agentic-actions tests (import from source, not duplicated)
- Fix stale "Now" indicator in UpcomingMeetings
- Deduplicate AgentState type
- Improve empty streaming state UX (loading dots)
- 78 new meeting tests + 15 new agent tests (744 total, all passing)
- Status: complete
- Depends on: Phase 10 complete ✓
- Apple Developer account acquired
- Enable code signing in electron-builder (removed
identity: null) - Create Developer ID Application certificate (via developer.apple.com)
- Set up notarization credentials (Apple ID + app-specific password)
- Configure CI env vars:
CSC_NAME,APPLE_ID,APPLE_APP_SPECIFIC_PASSWORD,APPLE_TEAM_ID - Build signed + notarized .dmg locally and verify Gatekeeper passes
- Update CI/CD to sign + notarize on release builds
- Auto-update (Sparkle / electron-updater with GitHub Releases)
- Status: in progress
- Depends on: Apple Developer account ✓
- EntryRow memoization + stable onSelect callback (virtual scroll perf)
- Dev-gate SmartClipboard demo data fallback (
import.meta.env.DEV) - Batch project-integration IPC to fix N+1 in WhisperWoofProjects
- Delete unused 636-line
SqliteProviderclass + correct architecture docs - STT config error at boot — "not signed into cloud" was treated as an error, and
debugLogger.error("msg:", error)rendered Error objects as{}. Fixed both. - Security: fix API key leak in
stripApiKeys— filter wasString.includes("apiKey")(lowercase) but app stores keys asopenaiApiKey(capitalA), so every settings export leaked every plaintext key. Caught by the test-truthfulness refactor. - Test-truthfulness refactor — 31 of ~35 files done, all buckets complete.
smart-clipboarddocumented as not-feasible (inline SQL needsbetter-sqlite3). Seedocs/test-truthfulness-refactor.md. - Surgical upstream cherry-picks: brace-expansion + xmldom security bumps, JSON.parse validation in prompts, Gemma 4 local models (E2B/E4B + 31B/26B MoE)
- Full upstream merge (deferred — 177 commits remaining, heavy overlap on ipcHandlers.js / agent / meeting files; needs its own session)
- Sweep the 15 remaining
debugLogger.error("msg:", error)sites inipcHandlers.js— all converted to template literals witherror.message - TypeScript strict-mode errors: 313 → 0 across all whisperwoof files
- Status: complete (except full upstream merge, deferred)
- Depends on: Phase 12 complete ✓
- CGEventTap rewrite of
macos-globe-listener.swift— key consumption for Fn+letter combos - Fn+N actually saves markdown (was silently falling through to paste-at-cursor)
- Fn+P tags entry for project routing (was silently falling through)
- Fn+letter combos force push-to-talk in toggle mode
- TickTick plugin added to defaults
- Guided plugin setup flow (inline setup card with API key input, test, save)
- Markdown notes with YAML frontmatter (Obsidian compatible)
- Notes directory configurable via UI (Settings > Notes > Change Folder)
- Mando head icon in route toasts
- Toast component
iconprop - Status: complete
- Depends on: Phase 13 complete ✓
- Ollama latency: Can Llama 3.2 3B polish <1s on M1? (Benchmark in Phase 1a)
- Fn key reliability: Does Globe key work on target macOS version? (Validate in Phase 0)
OpenWhispr upstream: Do we maintain merge compatibility or own the fork?De facto own the fork. Surgical cherry-picks for security + additive features (local model registry entries). Last catch-up: 2026-04-11 (5 commits from a 182-commit backlog). Full merge deferred — heavy overlap on ipcHandlers.js and agent/meeting files.- NSPasteboard battery impact: Is 0.5s polling acceptable in Electron? (Profile in Phase 1b)
| Decision | Rationale |
|---|---|
| Fork OpenWhispr (not Tauri rewrite) | Fastest path to daily-use. Inherits STT, hotkeys, UI, audio. |
| Keep Kysely ORM | Less migration work, existing OpenWhispr code stays compatible |
| src/whisperwoof/ isolation | Minimizes merge conflicts with upstream. Bridge pattern for OpenWhispr hooks. |
| Strict TypeScript for WhisperWoof only | Type safety for new code without fixing all OpenWhispr type errors |
| Vitest for testing | Integrates with existing Vite config, native ESM/TS support |
| IPC hardening over sandbox | sandbox:true impossible with native modules. Focus on CSP + preload audit. |
| FileVault for DB, safeStorage for audio | FTS5 incompatible with field-level encryption. Audio is biometric → stronger protection. |
| Cloud STT: keep but proxy through main | Enables webSecurity:true while preserving flexibility |
| Real-time streaming for meetings | Chunk-by-chunk STT, discard audio in transcript-only mode |
| Background processing for file imports | Progress bar in history list, user keeps using app |
| Learning → Expert adaptive feedback | First 20 captures show before/after toast, then auto-switch to minimal |
| Phase 1a/1b gate | Don't build features on a broken foundation |
| Error | Attempt | Resolution |
|---|---|---|
| (none yet) |
- Design doc:
docs/design/design-doc.md - CEO plan:
docs/design/ceo-plan.md - Review summary:
docs/reviews/2026-03-23-initial-reviews.md - OpenWhispr has ZERO tests — testing infrastructure built from scratch
- OpenWhispr repo: https://github.com/OpenWhispr/openwhispr