VocaMac

Your voice, your Mac, your privacy. Open-source dictation powered by AI.

Speak. It types. 100% offline, open-source voice-to-text for macOS - powered by WhisperKit. No cloud, no subscriptions, no data leaves your device. Just hold a hotkey, speak, and your words appear wherever your cursor is.


✨ Features

  • 🔒 100% Local - All audio processing happens on your machine. No internet required — the Tiny model ships bundled and works out of the box offline.
  • ⌨️ System-Wide Text Injection - Transcribed text is typed wherever your cursor is: browsers, Slack, VS Code, spreadsheets, terminals - everywhere.
  • 🎯 Push-to-Talk - Hold a hotkey (default: Right Option) to record. Release to transcribe.
  • 👆 Double-Tap Toggle - Double-tap the hotkey to start/stop recording.
  • 🧠 Smart Model Selection - Auto-detects your hardware (Apple Silicon/Intel, RAM) and recommends the best Whisper model via WhisperKit.
  • ⚡ Native Apple Acceleration - CoreML + Metal + Neural Engine acceleration on Apple Silicon. No manual setup.
  • 📊 Visual Feedback - Menu bar icon changes color during recording and processing. Audio level indicator shows input.
  • 🔄 Auto-Updates - Built-in update checker queries GitHub Releases on launch and lets you download and install the latest version in one click from within the app.
  • ⚙️ Configurable - Choose hotkeys, models, languages, silence detection thresholds, and more.
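The version check behind an update feature like Auto-Updates can be sketched in a few lines. This is an illustration only, not VocaMac's actual updater: the function name `isUpdateAvailable` and the assumption that release tags look like `v0.6.0` are hypothetical.

```swift
import Foundation

/// Returns true when `latestTag` names a newer version than `current`.
/// Assumes plain dotted numeric versions with the same number of components.
func isUpdateAvailable(current: String, latestTag: String) -> Bool {
    // Drop a leading "v" (e.g. "v0.6.0" -> "0.6.0"); this tag style is an assumption.
    let latest = latestTag.hasPrefix("v") ? String(latestTag.dropFirst()) : latestTag
    // .numeric compares digit runs by value, so "0.10.0" sorts after "0.9.0".
    return current.compare(latest, options: .numeric) == .orderedAscending
}
```

The `.numeric` option avoids the classic string-comparison mistake of ranking `0.9.0` above `0.10.0`. Pre-release suffixes such as `-nightly` would need extra handling that this sketch leaves out.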

📸 Screenshots

[Screenshot: menu bar popover with status and controls]

[Screenshots: menu bar icon, idle (left) and recording (right)]

[Screenshots: Settings, General tab (left) and Models tab with resource monitoring (right)]

[Screenshots: Settings, Audio tab (left) and About tab (right)]

[Screenshot: floating mic indicator near text cursor during recording]


🏛️ Why WhisperKit?

VocaMac uses WhisperKit instead of raw whisper.cpp because:

| | WhisperKit | whisper.cpp |
|---|---|---|
| Language | Pure Swift (native) | C++ (requires bridging) |
| Apple Silicon | CoreML + Neural Engine | Metal only |
| SPM Integration | One-line dependency | Complex vendoring |
| Model Format | CoreML (optimized per device) | GGML (generic) |
| Streaming | First-class async/await | Manual threading |
| Quality | Same OpenAI Whisper models | Same OpenAI Whisper models |
| Maintenance | Argmax Inc. (commercial) | Community |

Same accuracy, dramatically better Apple platform integration.


📋 Requirements

  • macOS 13 (Ventura) or later
  • Apple Silicon (M1/M2/M3/M4)
  • Xcode 15+ or Swift 5.9+ (only for building from source)

Permissions

VocaMac requires three macOS permissions:

| Permission | Why |
|---|---|
| Microphone | Capture your voice for transcription |
| Accessibility | Global hotkeys and text injection into apps |
| Input Monitoring | Detect hotkey presses system-wide |

Note: After granting Input Monitoring, restart VocaMac for the permission to take effect.
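As a rough sketch, an app could check the three permissions above at launch with public macOS APIs. The `PermissionStatus` type and `currentPermissionStatus()` function are illustrative names, not VocaMac's actual code.

```swift
import AVFoundation
import ApplicationServices
import CoreGraphics

/// Snapshot of the three permissions listed in the table above.
struct PermissionStatus {
    let microphone: Bool
    let accessibility: Bool
    let inputMonitoring: Bool

    var allGranted: Bool { microphone && accessibility && inputMonitoring }
}

/// Reads the current state without showing any system prompt.
func currentPermissionStatus() -> PermissionStatus {
    PermissionStatus(
        // Microphone: authorized only after the user approves the system prompt.
        microphone: AVCaptureDevice.authorizationStatus(for: .audio) == .authorized,
        // Accessibility: true when the process is trusted in Privacy & Security.
        accessibility: AXIsProcessTrusted(),
        // Input Monitoring: preflight check, available on macOS 10.15 and later.
        inputMonitoring: CGPreflightListenEventAccess()
    )
}
```

A caller might use `allGranted` to decide whether to show an onboarding step before enabling the hotkey.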


🚀 Quick Start

Option 1: Download DMG (Recommended)

  1. Download the latest VocaMac-x.x.x-arm64.dmg from the Releases page
  2. Open the DMG and drag VocaMac to Applications
  3. Open VocaMac from Applications
  4. Grant permissions: Microphone, Accessibility, and Input Monitoring when prompted

VocaMac is Developer ID signed and notarized by Apple — macOS will open it without any security warnings.

Option 2: Build from Source

git clone https://github.com/jatinkrmalik/vocamac.git
cd vocamac
make install

This builds VocaMac, installs it to /Applications, and launches it. Permissions are granted directly to VocaMac, just like the DMG method.

Option 3: CLI Commands (For Developers)

git clone https://github.com/jatinkrmalik/vocamac.git
cd vocamac
make install-cli

This installs two commands to ~/.local/bin:

  • vocamac &: Launch VocaMac in background
  • vocamac-build: Rebuild from source after pulling updates

Permissions note: In CLI mode, macOS assigns permissions to your terminal app (Terminal, iTerm2, etc.) rather than VocaMac itself. Grant Microphone, Accessibility, and Input Monitoring to your terminal app instead.

First Launch

  1. VocaMac appears in your menu bar (microphone icon, no Dock icon)
  2. Grant permissions: Microphone, Accessibility, and Input Monitoring (see Permissions above)
  3. First model download: WhisperKit automatically downloads the recommended model for your device (~40–500 MB depending on hardware)
  4. Start dictating: Hold the Right Option key, speak, and release. Your words appear at the cursor!

🌙 Nightly Builds

Nightly builds are automated builds from the latest main branch, published every day at midnight UTC when there are new commits. They let you try the latest features, fixes, and improvements before they land in a stable release.

Why use a nightly build?

  • Early access — Test new features days or weeks before the next stable release
  • Help improve VocaMac — Your feedback on nightly builds catches bugs before they reach everyone
  • Fully signed & notarized — Nightly builds are Developer ID signed and notarized by Apple, just like stable releases. No Gatekeeper warnings, no right-click workarounds

How to install:

  1. Download the latest VocaMac-nightly-*.dmg from the Nightly Release
  2. Open the DMG and drag VocaMac to Applications
  3. Grant permissions when prompted (same as a stable release)

How to identify your build:

Nightly builds embed the date and commit SHA in the version string. Open Settings → About to see something like:

Version 0.5.0-nightly.20260414+abc1234 (Nightly)

This helps us pinpoint the exact code you're running if you report an issue.
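For anyone scripting around bug reports, the version string can be split into its parts. The sketch below assumes the exact shape shown above (`<base>-nightly.<YYYYMMDD>+<sha>`); the `NightlyVersion` type and `parseNightlyVersion` function are hypothetical helpers, not part of VocaMac.

```swift
import Foundation

struct NightlyVersion: Equatable {
    let base: String    // e.g. "0.5.0"
    let date: String    // build date as YYYYMMDD
    let commit: String  // short commit SHA
}

/// Parses "<base>-nightly.<date>+<sha>"; returns nil for any other shape.
func parseNightlyVersion(_ version: String) -> NightlyVersion? {
    // Split off the commit SHA that follows "+".
    let parts = version.split(separator: "+", omittingEmptySubsequences: false)
    guard parts.count == 2, !parts[1].isEmpty else { return nil }

    // Split the remainder into base version and date around "-nightly.".
    let head = parts[0]
    guard let marker = head.range(of: "-nightly.") else { return nil }
    let base = head[..<marker.lowerBound]
    let date = head[marker.upperBound...]
    guard !base.isEmpty, date.count == 8, date.allSatisfy({ $0.isNumber }) else { return nil }

    return NightlyVersion(base: String(base), date: String(date), commit: String(parts[1]))
}
```

Stable version strings such as `0.5.0` return `nil`, which gives a simple way to tell the two build types apart.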

Cadence & stability:

| | Stable Release | Nightly Build |
|---|---|---|
| Frequency | When ready (manual tag) | Daily at midnight UTC |
| Source | Tagged commit | Latest main branch |
| Signed & notarized | ✅ Yes | ✅ Yes |
| Stability | Production-ready | May contain incomplete features or bugs |
| Best for | Daily use | Testing & early feedback |

⚠️ Nightly builds may be unstable. If you encounter issues, please open a bug report — your feedback helps us ship better stable releases!


🎮 Usage

Push-to-Talk (Default)

| Action | What Happens |
|---|---|
| Hold Right Option | Recording starts (menu bar icon turns red) |
| Speak | Audio is captured locally |
| Release Right Option | Recording stops → transcription → text injected at cursor |

Double-Tap Toggle

| Action | What Happens |
|---|---|
| Double-tap Right Option | Recording starts |
| Speak | Audio is captured |
| Double-tap Right Option again | Recording stops → transcription → text injection |

Switch between modes in Settings → General → Activation.
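The two activation modes can be modeled as a small state machine driven by hotkey down/up events. This is a sketch under stated assumptions: the type names and the 0.4-second double-tap window are invented for illustration and may not match VocaMac's HotKeyManager.

```swift
import Foundation

enum ActivationMode { case pushToTalk, doubleTapToggle }
enum HotkeyEvent { case down, up }
enum RecordingAction: Equatable { case start, stop, noChange }

struct ActivationController {
    let mode: ActivationMode
    /// Maximum gap between two presses that counts as a double-tap (assumed value).
    var doubleTapWindow: TimeInterval = 0.4
    private(set) var isRecording = false
    private var lastPress: TimeInterval?

    init(mode: ActivationMode) { self.mode = mode }

    /// Feeds one hotkey event with its timestamp and returns what the recorder should do.
    mutating func handle(_ event: HotkeyEvent, at time: TimeInterval) -> RecordingAction {
        switch (mode, event) {
        case (.pushToTalk, .down):
            guard !isRecording else { return .noChange }  // ignore key repeat
            isRecording = true
            return .start
        case (.pushToTalk, .up):
            guard isRecording else { return .noChange }
            isRecording = false
            return .stop
        case (.doubleTapToggle, .down):
            if let last = lastPress, time - last <= doubleTapWindow {
                lastPress = nil  // consume the pair so a third tap starts fresh
                isRecording.toggle()
                return isRecording ? .start : .stop
            }
            lastPress = time
            return .noChange
        case (.doubleTapToggle, .up):
            return .noChange
        }
    }
}
```

Keeping the timing logic free of any event-tap code makes it easy to unit test with synthetic timestamps.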


🧠 Whisper Models

VocaMac uses OpenAI Whisper models via WhisperKit's CoreML format. The app auto-detects your hardware and recommends the best model:

| Model | Parameters | Size | Speed | Quality | Best For |
|---|---|---|---|---|---|
| Tiny | 39M | ~0.4 GB | ⚡⚡⚡⚡⚡ | Good | Quick notes, older Macs |
| Base | 74M | ~0.8 GB | ⚡⚡⚡⚡ | Better | Daily use on 8GB Macs |
| Small | 244M | ~1.5 GB | ⚡⚡⚡ | Great | 16GB+ Apple Silicon |
| Medium | 769M | ~2.5 GB | ⚡⚡ | Excellent | 24GB+ for high accuracy |
| Large v3 | 1550M | ~4.8 GB | ⚡ | Best | Maximum accuracy |

Models are downloaded automatically from HuggingFace on first use and cached locally. Download additional models from Settings → Models.
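To make the hardware-based recommendation concrete, here is a rough sketch that maps installed RAM to a model tier using the "Best For" column above. The thresholds are read off that table, and `ModelTier` and `recommendedModel(forRAMInGB:)` are hypothetical names; the real recommendation comes from WhisperKit and may weigh other factors.

```swift
import Foundation

enum ModelTier: String { case tiny, base, small, medium }

/// Maps installed RAM (in GB) to a model tier, following the "Best For" column.
func recommendedModel(forRAMInGB ram: Double) -> ModelTier {
    switch ram {
    case 24...: return .medium  // "24GB+ for high accuracy"
    case 16...: return .small   // "16GB+ Apple Silicon"
    case 8...:  return .base    // "Daily use on 8GB Macs"
    default:    return .tiny    // quick notes, older Macs
    }
}

// Physical memory of the current machine, converted from bytes to GB.
let installedRAM = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
print("Suggested model:", recommendedModel(forRAMInGB: installedRAM).rawValue)
```

Large v3 is left out of the automatic choice here because the table gives no RAM threshold for it; in this sketch it would remain a manual selection.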


⚙️ Configuration

Open Settings from the menu bar popover or press ⌘, (Command-comma).

General

  • Activation mode - Push-to-Talk or Double-Tap Toggle
  • Hotkey - Choose from Right Option, Right Command, Fn, function keys, etc.
  • Language - Auto-detect or specify (English, Spanish, French, German, Chinese, Japanese, and more)
  • Launch at login

Audio

  • Max recording duration - 30s, 60s, 120s, or 300s
  • Silence detection - Auto-stop recording after configurable silence
  • Sound effects - Toggle audio feedback for recording start/stop
  • Input device - Select which microphone to use
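Silence detection of the kind described above is commonly built on a running RMS level. The following is a minimal sketch, assuming 16 kHz mono Float32 buffers; the threshold of 0.01, the 2-second default, and the `SilenceDetector` type are assumptions for illustration rather than VocaMac's settings or code.

```swift
import Foundation

/// Root-mean-square level of a buffer of Float32 samples in [-1, 1].
func rms(_ samples: [Float]) -> Float {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(Float(0)) { $0 + $1 * $1 }
    return (sumOfSquares / Float(samples.count)).squareRoot()
}

struct SilenceDetector {
    let threshold: Float               // RMS below this counts as silence (assumed)
    let requiredSilence: TimeInterval  // seconds of continuous silence before stopping
    private var silentFor: TimeInterval = 0

    init(threshold: Float = 0.01, requiredSilence: TimeInterval = 2.0) {
        self.threshold = threshold
        self.requiredSilence = requiredSilence
    }

    /// Feeds one buffer; returns true once silence has lasted long enough.
    mutating func process(_ samples: [Float], sampleRate: Double = 16_000) -> Bool {
        let duration = Double(samples.count) / sampleRate
        if rms(samples) < threshold {
            silentFor += duration
        } else {
            silentFor = 0  // any speech resets the timer
        }
        return silentFor >= requiredSilence
    }
}
```

A recorder would call `process` on each captured buffer and stop when it returns `true`.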

Models

  • View system info and WhisperKit's hardware recommendation
  • Download, load, and switch between models
  • See which models are supported on your device

🏗️ Architecture

VocaMac is built with a clean, modular architecture using native Swift and SwiftUI:

VocaMacApp (SwiftUI MenuBarExtra)
├── AppState          - Central observable state
├── HotKeyManager     - CGEventTap global hotkey listener
├── AudioEngine       - AVAudioEngine mic capture (16kHz, mono, Float32)
├── WhisperService    - WhisperKit async transcription wrapper
│   └── ModelManager  - Model download, storage, device recommendations
│       └── SystemInfo - Hardware detection & model recommendation
├── SoundManager      - Audio feedback (start/stop recording cues)
├── TextInjector      - Clipboard + Cmd+V text injection
├── MenuBarView       - Status popover UI
└── SettingsView      - Configuration tabs (General, Models, Audio, Debug, About)
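The "Clipboard + Cmd+V" approach noted for TextInjector can be sketched with AppKit and Core Graphics. This is an assumption-laden illustration, not the project's implementation: the function names are hypothetical, and posting the key events only works once the Accessibility permission has been granted.

```swift
import AppKit
import CoreGraphics

/// Places `text` on the general pasteboard, replacing its current contents.
func copyToPasteboard(_ text: String) -> Bool {
    let pasteboard = NSPasteboard.general
    pasteboard.clearContents()
    return pasteboard.setString(text, forType: .string)
}

/// Synthesizes Cmd+V so the frontmost app pastes at its cursor.
func postPasteShortcut() {
    let vKey: CGKeyCode = 0x09  // virtual key code for "V" (kVK_ANSI_V)
    let source = CGEventSource(stateID: .combinedSessionState)
    for keyDown in [true, false] {
        let event = CGEvent(keyboardEventSource: source, virtualKey: vKey, keyDown: keyDown)
        event?.flags = .maskCommand  // hold Command for both key down and key up
        event?.post(tap: .cghidEventTap)
    }
}

/// Copies the transcription, then asks the frontmost app to paste it.
func injectText(_ text: String) {
    guard copyToPasteboard(text) else { return }
    postPasteShortcut()
}
```

Pasting works in almost any text field, at the cost of overwriting the clipboard. An implementation could save and restore the previous contents; whether VocaMac does so is not stated here.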

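For detailed documentation, see docs/ARCHITECTURE.md (technical architecture) and docs/DATA_MODEL.md (data model and entity relationships).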


🔧 Development

Prerequisites

  • Xcode 15+ or Swift 5.9+ toolchain
  • macOS 13+

Project Structure

VocaMac/
├── Package.swift                   # SPM config (WhisperKit dependency)
├── Sources/
│   └── VocaMac/
│       ├── App/
│       │   └── VocaMacApp.swift    # Entry point, MenuBarExtra
│       ├── Views/
│       │   ├── MenuBarView.swift   # Menu bar popover
│       │   └── SettingsView.swift  # Settings window (5 tabs)
│       ├── Services/
│       │   ├── AudioEngine.swift   # AVAudioEngine mic capture
│       │   ├── HotKeyManager.swift # CGEventTap global hotkeys
│       │   ├── WhisperService.swift # WhisperKit transcription wrapper
│       │   ├── ModelManager.swift  # Model download & management
│       │   ├── SoundManager.swift  # Audio feedback for recording
│       │   ├── TextInjector.swift  # Clipboard-based text injection
│       │   └── SystemInfo.swift    # Hardware detection
│       ├── Models/
│       │   ├── AppState.swift      # Central observable state
│       │   ├── TranscriptionResult.swift  # VocaTranscription type
│       │   └── WhisperModel.swift  # ModelSize enum, WhisperModelInfo
│       └── Resources/
├── Tests/
│   └── VocaMacTests/
├── Makefile                        # make build, install, test, clean
├── scripts/
│   ├── build.sh                    # Build .app bundle (dev)
│   ├── install.sh                  # Install to /Applications or CLI
│   └── uninstall.sh                # Full uninstall & cleanup
├── web/                            # Marketing website (vocamac.com)
├── docs/
│   ├── ARCHITECTURE.md             # Technical Architecture
│   └── DATA_MODEL.md               # Data Model & Entity Relationships
├── LICENSE                         # AGPL-3.0 License
└── .gitignore

Build Commands

make install        # Build + install to /Applications (recommended)
make install-cli    # Install CLI commands to ~/.local/bin
make build          # Build .app bundle in repo root (dev iteration)
make test           # Run tests
make run            # Launch the locally built .app
make clean          # Remove build artifacts
make help           # Show all commands

Uninstall

To completely remove VocaMac and all its data (downloaded models, preferences, caches):

./scripts/uninstall.sh

Use --keep-build to preserve build artifacts:

./scripts/uninstall.sh --keep-build

Troubleshooting

Reset onboarding: To re-trigger the first-launch onboarding wizard (e.g., after an upgrade or for testing), reset the onboarding flag:

defaults delete com.vocamac.app vocamac.hasCompletedOnboarding

Then relaunch VocaMac. This only clears the onboarding state; all other preferences (hotkey, language, model) are preserved.

Reset all preferences: To start completely fresh:

defaults delete com.vocamac.app

Reset permissions (troubleshooting): If permissions appear stuck or aren't being recognized after an update, you can reset them from Settings → Debug → Reset All Permissions, or manually via Terminal:

tccutil reset All com.vocamac.app

This clears all permission entries (Microphone, Accessibility, Input Monitoring) for VocaMac. On next launch, macOS will prompt you to re-grant them. With Developer ID signing, permissions normally persist across updates — this reset is only needed for troubleshooting.


🌐 Cross-Platform

VocaMac is the macOS member of the Voca family:

| Platform | Project | Status |
|---|---|---|
| Linux | VocaLinux | ✅ Available |
| macOS | VocaMac | 🚀 Beta |
| Windows | VocaWin | 📋 Planned |

Each platform uses native technologies for the best possible integration, while sharing the same UX patterns and Whisper model family.


⚠️ Known Limitations

  • Larger models require a one-time download: VocaMac ships with the Whisper Tiny model bundled — you can dictate immediately with no internet connection. Switching to a larger model (Small, Medium, Large) requires a one-time download; all subsequent launches work fully offline.
  • macOS only: Requires macOS 13 (Ventura) or later.
  • Permissions reset on rebuild (build-from-source only): When building from source without a Developer ID certificate, macOS resets Accessibility and Input Monitoring permissions on every rebuild due to ad-hoc signing. Release builds are Developer ID signed so permissions persist across updates.

Permissions and Code Signing

Release builds of VocaMac are Developer ID signed and notarized by Apple. Accessibility and Input Monitoring permissions persist across updates — no manual re-granting required.

For developers building from source: If you don't have a Developer ID certificate, build.sh falls back to ad-hoc signing. With ad-hoc signing, macOS resets Accessibility and Input Monitoring permissions on every rebuild because the CDHash changes. This is standard macOS security behavior: any app that relies on Accessibility (Rectangle, Maccy, AltTab, etc.) has the same limitation when ad-hoc signed.

Workarounds for ad-hoc builds:

| Approach | How | Permissions Persist |
|---|---|---|
| Run from Terminal | Grant permissions to Terminal.app once, then run make run | ✅ Always |
| Re-grant manually | System Settings → Privacy & Security after each rebuild | Per rebuild |

💡 Developer tip: Add your Terminal app (Terminal.app or iTerm2) to both Accessibility and Input Monitoring in System Settings. Then run VocaMac directly from Terminal. Permissions are inherited and never reset.


📄 License

AGPL-3.0 License - see LICENSE for details.


Made with ❤️ for the macOS community!
