OpenWhispr

An open source desktop dictation application that converts speech to text using OpenAI Whisper. Features both local and cloud processing options for maximum flexibility and privacy.


License

This project is licensed under the MIT License - see the LICENSE file for details. This means you can freely use, modify, and distribute this software for personal or commercial purposes.

Features

  • 🎤 Global Hotkey: Customizable hotkey to start/stop dictation from anywhere (default: backtick `)
  • 🎯 Agent Naming: Personalize your AI assistant with a custom name for natural interactions
  • 🧠 Multi-Provider AI Processing: Choose between cloud and local providers:
    • OpenAI: GPT-5, GPT-4.1, o-series reasoning models
    • Anthropic: Claude Opus 4.5, Claude Sonnet 4.5
    • Google: Gemini 2.5 Pro/Flash/Flash-Lite
    • Groq: Ultra-fast inference with Llama and Mixtral models
    • Local: Qwen, LLaMA, Mistral models via llama.cpp
  • 🔒 Privacy-First: Local processing keeps your voice data completely private
  • 🎨 Modern UI: Built with React 19, TypeScript, and Tailwind CSS v4
  • 🚀 Fast: Optimized with Vite and modern tooling
  • 📱 Control Panel: Manage settings, view history, and configure API keys
  • 🗄️ Transcription History: SQLite database stores all your transcriptions locally
  • 🔧 Model Management: Download and manage local Whisper models (tiny, base, small, medium, large, turbo)
  • 🧹 Model Cleanup: One-click removal of cached Whisper models with uninstall hooks to keep disks tidy
  • 🌐 Cross-Platform: Works on macOS, Windows, and Linux
  • 📋 Automatic Pasting: Transcribed text automatically pastes at your cursor location
  • 🖱️ Draggable Interface: Move the dictation panel anywhere on your screen
  • 🔄 OpenAI Responses API: Uses the latest Responses API for improved performance
  • 🌐 Globe Key Toggle (macOS): Optional Fn/Globe key listener for a hardware-level dictation trigger
  • ⌨️ Compound Hotkeys: Support for multi-key combinations like Cmd+Shift+K
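
Compound hotkeys map directly onto Electron's accelerator syntax. A rough sketch of how registration works in the main process (the handler name is hypothetical, not OpenWhispr's actual code):

// Main-process sketch: registering a compound hotkey with Electron's globalShortcut API.
import { app, globalShortcut } from "electron";

function toggleDictation(): void {
  // Hypothetical handler; the real app toggles recording via IPC to the renderer.
}

app.whenReady().then(() => {
  // "CommandOrControl" resolves to Cmd on macOS and Ctrl on Windows/Linux.
  const ok = globalShortcut.register("CommandOrControl+Shift+K", toggleDictation);
  if (!ok) console.warn("Hotkey is already registered by another application");
});

// Always release global shortcuts on exit.
app.on("will-quit", () => globalShortcut.unregisterAll());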

Prerequisites

  • Node.js 18+ and npm (Download from nodejs.org)
  • macOS 10.15+, Windows 10+, or Linux
  • On macOS, Globe key support requires the Xcode Command Line Tools (xcode-select --install) so the bundled Swift helper can run

Quick Start

For Personal Use (Recommended)

  1. Clone the repository:

    git clone https://github.com/HeroTools/open-whispr.git
    cd open-whispr
  2. Install dependencies:

    npm install
  3. Optional: Set up API keys (only needed for cloud processing):

    Method A - Environment file:

    cp env.example .env
    # Edit .env and add your API keys:
    # OPENAI_API_KEY=your_openai_key
    # ANTHROPIC_API_KEY=your_anthropic_key  
    # GEMINI_API_KEY=your_gemini_key

    Method B - In-app configuration:

    • Run the app and configure API keys through the Control Panel
    • Keys are automatically saved and persist across app restarts
  4. Run the application:

    npm run dev  # Development mode with hot reload
    # OR
    npm start    # Production mode
  5. Optional: Download the whisper.cpp binary (only needed for local processing when running from source):

    npm run download:whisper-cpp

    This downloads the whisper.cpp binary for your current platform into resources/bin/.

Building for Personal Use (Optional)

If you want to build a standalone app for personal use:

# Build without code signing (no certificates required)
npm run pack

# The unsigned app will be in: dist/mac-arm64/OpenWhispr.app (macOS)
# or dist/win-unpacked/OpenWhispr.exe (Windows)
# or dist/linux-unpacked/open-whispr (Linux)

Note: On macOS, you may see a security warning when first opening the unsigned app. Right-click and select "Open" to bypass this.

Linux (Multiple Package Formats)

OpenWhispr supports multiple Linux package formats for maximum compatibility:

Available Formats:

  • .deb - Debian, Ubuntu, Linux Mint, Pop!_OS
  • .rpm - Fedora, Red Hat, CentOS, openSUSE
  • .tar.gz - Universal archive (works on any distro)
  • .flatpak - Sandboxed cross-distro package
  • AppImage - Portable single-file executable

Building Linux Packages:

# Build default Linux package formats (AppImage, deb, rpm, tar.gz)
npm run build:linux

# Find packages in dist/:
# - OpenWhispr-x.x.x-linux-x64.AppImage
# - OpenWhispr-x.x.x-linux-x64.deb
# - OpenWhispr-x.x.x-linux-x64.rpm
# - OpenWhispr-x.x.x-linux-x64.tar.gz

Optional: Building Flatpak (requires additional setup):

# Install Flatpak build tools
sudo apt install flatpak flatpak-builder  # Debian/Ubuntu
# OR
sudo dnf install flatpak flatpak-builder  # Fedora/RHEL

# Add Flathub repository and install runtime
flatpak remote-add --user --if-not-exists flathub https://flathub.org/repo/flathub.flatpakrepo
flatpak install --user -y flathub org.freedesktop.Platform//24.08 org.freedesktop.Sdk//24.08

# Add "flatpak" to linux.target in electron-builder.json, then build
npm run build:linux

Installation Examples:

# Debian/Ubuntu
sudo apt install ./dist/OpenWhispr-*-linux-x64.deb

# Fedora/RHEL
sudo dnf install ./dist/OpenWhispr-*-linux-x64.rpm

# Universal tar.gz (no root required)
tar -xzf dist/OpenWhispr-*-linux-x64.tar.gz
cd OpenWhispr-*/
./openwhispr

# Flatpak
flatpak install --user ./dist/OpenWhispr-*-linux-x64.flatpak

# AppImage (existing method)
chmod +x dist/OpenWhispr-*.AppImage
./dist/OpenWhispr-*.AppImage

Optional Dependencies for Automatic Paste:

The clipboard paste feature requires platform-specific tools:

X11 (Traditional Linux Desktop):

# Debian/Ubuntu
sudo apt install xdotool

# Fedora/RHEL
sudo dnf install xdotool

# Arch
sudo pacman -S xdotool

Wayland (Modern Linux Desktop):

Choose one of the following paste tools:

Option 1: wtype (requires virtual keyboard protocol support)

# Debian/Ubuntu
sudo apt install wtype

# Fedora/RHEL
sudo dnf install wtype

# Arch
sudo pacman -S wtype

Option 2: ydotool (works on more compositors, requires daemon)

# Debian/Ubuntu
sudo apt install ydotool
sudo systemctl enable --now ydotoold

# Fedora/RHEL
sudo dnf install ydotool
sudo systemctl enable --now ydotoold

# Arch
sudo pacman -S ydotool
sudo systemctl enable --now ydotoold

Terminal Detection (Optional - for KDE Wayland users):

# On KDE Wayland, kdotool enables automatic terminal detection
# to paste with Ctrl+Shift+V instead of Ctrl+V
sudo apt install kdotool  # Debian/Ubuntu
sudo dnf install kdotool  # Fedora/RHEL
sudo pacman -S kdotool    # Arch

ℹ️ Note: OpenWhispr automatically tries paste tools in this order: wtype → ydotool → xdotool (for XWayland apps). If no paste tool is installed, text will still be copied to the clipboard - you'll just need to paste manually with Ctrl+V.
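
That fallback chain is simple to picture; a sketch in TypeScript (the exact arguments OpenWhispr passes may differ):

import { spawnSync } from "node:child_process";

// Try each paste tool in the documented order; return false if none worked.
function typeText(text: string): boolean {
  const tools: Array<[string, string[]]> = [
    ["wtype", [text]],           // Wayland, needs virtual-keyboard protocol support
    ["ydotool", ["type", text]], // Wayland, needs the ydotoold daemon
    ["xdotool", ["type", text]], // X11 and XWayland apps
  ];
  for (const [cmd, args] of tools) {
    if (spawnSync(cmd, args).status === 0) return true;
  }
  return false; // text remains on the clipboard for manual Ctrl+V
}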

⚠️ ydotool Requirements: The ydotoold daemon must be running for ydotool to work. Start it manually with sudo ydotoold & or enable the systemd service as shown above.

🔒 Flatpak Security: The Flatpak package includes sandboxing with explicit permissions for microphone, clipboard, and file access. See electron-builder.json for the complete permission list.

Building for Distribution

For maintainers who need to distribute signed builds:

# Requires code signing certificates and notarization setup
npm run build:mac    # macOS (requires Apple Developer account)
npm run build:win    # Windows (requires code signing cert)
npm run build:linux  # Linux

First Time Setup

  1. Choose Processing Method:

    • Local Processing: Download Whisper models for completely private transcription
    • Cloud Processing: Use OpenAI's API for faster transcription (requires API key)
  2. Grant Permissions:

    • Microphone Access: Required for voice recording
    • Accessibility Permissions: Required for automatic text pasting (macOS)
  3. Name Your Agent: Give your AI assistant a personal name (e.g., "Assistant", "Jarvis", "Alex")

    • Makes interactions feel more natural and conversational
    • Helps distinguish between giving commands and regular dictation
    • Can be changed anytime in settings
  4. Configure Global Hotkey: Default is backtick (`) but can be customized

Usage

Basic Dictation

  1. Start the app - A small draggable panel appears on your screen
  2. Press your hotkey (default: backtick `) - Start dictating (panel shows recording animation)
  3. Press your hotkey again - Stop dictation and begin transcription (panel shows processing animation)
  4. Text appears - Transcribed text is automatically pasted at your cursor location
  5. Drag the panel - Click and drag to move the dictation panel anywhere on your screen

Control Panel

  • Access: Right-click the tray icon (macOS) or open it through the system menu
  • Configure: Choose between local and cloud processing
  • History: View, copy, and delete past transcriptions
  • Models: Download and manage local Whisper models
  • Storage Cleanup: Remove downloaded Whisper models from cache to reclaim space
  • Settings: Configure API keys, customize hotkeys, and manage permissions

Uninstall & Cache Cleanup

  • In-App: Use Settings → General → Local Model Storage → Remove Downloaded Models to clear ~/.cache/openwhispr/whisper-models (or %USERPROFILE%\.cache\openwhispr\whisper-models on Windows).
  • Windows Uninstall: The NSIS uninstaller automatically deletes the same cache directory.
  • Linux Packages: deb/rpm post-uninstall scripts also remove cached models.
  • macOS: If you uninstall manually, remove the OpenWhispr cache under ~/Library/Caches or ~/.cache/openwhispr/whisper-models if desired.
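
All of these paths point at the same model cache, so manual cleanup is a one-liner; a sketch of what the in-app button and uninstall hooks boil down to:

import fs from "node:fs";
import os from "node:os";
import path from "node:path";

// ~/.cache/openwhispr/whisper-models (same layout on Windows, per the note above)
const cacheDir = path.join(os.homedir(), ".cache", "openwhispr", "whisper-models");
fs.rmSync(cacheDir, { recursive: true, force: true });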

Agent Naming & AI Processing

Once you've named your agent during setup, you can interact with it using multiple AI providers:

🎯 Agent Commands (for AI assistance):

  • "Hey [AgentName], make this more professional"
  • "Hey [AgentName], format this as a list"
  • "Hey [AgentName], write a thank you email"
  • "Hey [AgentName], convert this to bullet points"

🤖 AI Provider Options:

  • OpenAI: GPT-5, GPT-4.1, o-series reasoning models
  • Anthropic: Claude Opus 4.5, Sonnet 4.5, Haiku 4.5
  • Google: Gemini 2.5 Pro/Flash/Flash-Lite
  • Groq: Ultra-fast Llama and Mixtral inference
  • Local: Qwen, LLaMA, Mistral via llama.cpp

📝 Regular Dictation (for normal text):

  • "This is just normal text I want transcribed"
  • "Meeting notes: John mentioned the quarterly report"
  • "Dear Sarah, thank you for your help"

The AI automatically detects when you're giving it commands versus dictating regular text, and removes agent name references from the final output.
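
As an illustration of the idea (a hypothetical sketch, not the actual logic in src/utils/agentName.ts):

// Decide whether an utterance is an agent command or plain dictation,
// and strip the agent-name reference from commands.
function classifyUtterance(raw: string, agentName: string) {
  // Matches "Hey <AgentName>," at the start, case-insensitively.
  // (A real implementation should regex-escape agentName.)
  const trigger = new RegExp(`^\\s*hey[,\\s]+${agentName}[,\\s]+`, "i");
  if (trigger.test(raw)) {
    return { mode: "command" as const, text: raw.replace(trigger, "").trim() };
  }
  return { mode: "dictation" as const, text: raw };
}

// classifyUtterance("Hey Jarvis, make this more professional", "Jarvis")
// → { mode: "command", text: "make this more professional" }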

Processing Options

  • Local Processing:
    • Install Whisper automatically through the Control Panel
    • Download models: tiny (fastest), base (recommended), small, medium, large (best quality)
    • Complete privacy - audio never leaves your device
  • Cloud Processing:
    • Requires OpenAI API key
    • Faster processing
    • Uses OpenAI's Whisper API

Project Structure

open-whispr/
├── main.js              # Electron main process & IPC handlers
├── preload.js           # Electron preload script & API bridge
├── setup.js             # First-time setup script
├── package.json         # Dependencies and scripts
├── env.example          # Environment variables template
├── CHANGELOG.md         # Project changelog
├── src/
│   ├── App.jsx          # Main dictation interface
│   ├── main.jsx         # React entry point
│   ├── index.html       # Vite HTML template
│   ├── index.css        # Tailwind CSS v4 configuration
│   ├── vite.config.js   # Vite configuration
│   ├── components/
│   │   ├── ControlPanel.tsx     # Settings and history UI
│   │   ├── OnboardingFlow.tsx   # First-time setup wizard
│   │   ├── SettingsPage.tsx     # Settings interface
│   │   ├── ui/                  # shadcn/ui components
│   │   │   ├── button.tsx
│   │   │   ├── card.tsx
│   │   │   ├── input.tsx
│   │   │   ├── LoadingDots.tsx
│   │   │   ├── Toast.tsx
│   │   │   ├── toggle.tsx
│   │   │   └── tooltip.tsx
│   │   └── lib/
│   │       └── utils.ts         # Utility functions
│   ├── services/
│   │   └── ReasoningService.ts  # Multi-provider AI processing (OpenAI/Anthropic/Gemini)
│   ├── utils/
│   │   └── agentName.ts         # Agent name management utility
│   └── components.json          # shadcn/ui configuration
└── assets/                      # App icons and resources

Technology Stack

  • Frontend: React 19, TypeScript, Tailwind CSS v4
  • Build Tool: Vite with optimized Tailwind plugin
  • Desktop: Electron 36 with context isolation
  • UI Components: shadcn/ui with Radix primitives
  • Database: better-sqlite3 for local transcription storage
  • Speech-to-Text: OpenAI Whisper (powered by whisper.cpp for local, OpenAI API for cloud)
  • Icons: Lucide React for consistent iconography
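
For a sense of how little glue the local history needs, a minimal better-sqlite3 sketch (the table name and columns are assumptions; see main.js for the real schema):

import path from "node:path";
import { app } from "electron";
import Database from "better-sqlite3";

// One local database file in the Electron user-data directory.
const db = new Database(path.join(app.getPath("userData"), "transcriptions.db"));
db.exec(
  "CREATE TABLE IF NOT EXISTS transcriptions (" +
  "id INTEGER PRIMARY KEY AUTOINCREMENT, " +
  "text TEXT NOT NULL, " +
  "created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
);

export function saveTranscription(text: string): void {
  db.prepare("INSERT INTO transcriptions (text) VALUES (?)").run(text);
}

export function listTranscriptions() {
  return db.prepare("SELECT * FROM transcriptions ORDER BY id DESC").all();
}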

Development

Scripts

  • npm run dev - Start development with hot reload
  • npm run start - Start production build
  • npm run setup - First-time setup (creates .env file)
  • npm run build:renderer - Build the React app only
  • npm run download:whisper-cpp - Download whisper.cpp for the current platform
  • npm run download:whisper-cpp:all - Download whisper.cpp for all platforms
  • npm run build - Full build with signing (requires certificates)
  • npm run build:mac - macOS build with signing
  • npm run build:win - Windows build with signing
  • npm run build:linux - Linux build
  • npm run pack - Build without signing (for personal use)
  • npm run dist - Build and package with signing
  • npm run lint - Run ESLint
  • npm run preview - Preview production build

Architecture

The app consists of two main windows:

  1. Main Window: Minimal overlay for dictation controls
  2. Control Panel: Full settings and history interface

Both use the same React codebase but render different components based on URL parameters.
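
A hypothetical sketch of that pattern (the parameter name is an assumption; the actual entry point is src/main.jsx):

import React from "react";
import App from "./App";
import ControlPanel from "./components/ControlPanel";

// Both Electron windows load the same bundle; a query parameter picks the UI.
const params = new URLSearchParams(window.location.search);

export default function Root() {
  return params.get("view") === "control-panel" ? <ControlPanel /> : <App />;
}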

Key Components

  • main.js: Electron main process, IPC handlers, database operations
  • preload.js: Secure bridge between main and renderer processes
  • App.jsx: Main dictation interface with recording controls
  • ControlPanel.tsx: Settings, history, and model management
  • src/helpers/whisper.js: whisper.cpp integration for local processing
  • better-sqlite3: Local database for transcription history
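
The preload script follows Electron's standard contextBridge pattern; a minimal sketch with hypothetical channel names (see preload.js for the real API surface):

import { contextBridge, ipcRenderer } from "electron";

// Expose a narrow, explicit API to the renderer instead of raw ipcRenderer.
contextBridge.exposeInMainWorld("openwhispr", {
  // Persist a finished transcription via the main process.
  saveTranscription: (text: string) => ipcRenderer.invoke("transcription:save", text),
  // Fetch history for the Control Panel view.
  getHistory: () => ipcRenderer.invoke("transcription:history"),
});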

Tailwind CSS v4 Setup

This project uses the latest Tailwind CSS v4 with:

  • CSS-first configuration using @theme directive
  • Vite plugin for optimal performance
  • Custom design tokens for consistent theming
  • Dark mode support with @variant

Building

The build process creates a single executable for your platform:

# Development build
npm run pack

# Production builds
npm run dist           # Current platform
npm run build:mac      # macOS DMG + ZIP
npm run build:win      # Windows NSIS + Portable
npm run build:linux    # AppImage + DEB

Note: build/pack/dist scripts download whisper.cpp for the current platform automatically. For multi-platform packaging from one host, run npm run download:whisper-cpp:all first.

Configuration

Environment Variables

Create a .env file in the root directory (or use npm run setup):

# OpenAI API Configuration (optional - only needed for cloud processing)
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Customize the Whisper model
WHISPER_MODEL=whisper-1

# Optional: Set language for better transcription accuracy
LANGUAGE=

# Optional: Anthropic API Configuration
ANTHROPIC_API_KEY=your_anthropic_api_key_here

# Optional: Google Gemini API Configuration  
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Debug mode
DEBUG=false

Local Whisper Setup

For local processing, OpenWhispr uses OpenAI's Whisper model via whisper.cpp - a high-performance C++ implementation:

  1. Bundled Binary: whisper.cpp is bundled with the app for all platforms
  2. GGML Models: Downloads optimized GGML models on first use to ~/.cache/openwhispr/whisper-models/
  3. No Dependencies: No Python or other runtime required
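
In practice, a local transcription is one subprocess call against a GGML model. A rough sketch using standard whisper.cpp CLI flags (the binary name and paths are illustrative, not the app's actual code path in src/helpers/whisper.js):

import { execFileSync } from "node:child_process";
import os from "node:os";
import path from "node:path";

// Bundled binary (name varies by platform/build) and a downloaded GGML model.
const binary = path.join("resources", "bin", "whisper-cli");
const model = path.join(os.homedir(), ".cache", "openwhispr", "whisper-models", "ggml-base.bin");

// -m selects the model, -f the input audio; --no-timestamps yields plain text.
const text = execFileSync(binary, ["-m", model, "-f", "recording.wav", "--no-timestamps"], {
  encoding: "utf8",
});
console.log(text.trim());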

System Fallback: If the bundled binary fails, install whisper.cpp through your system package manager (e.g., brew install whisper-cpp on macOS).

From Source: When running locally (not a packaged build), download the binary with npm run download:whisper-cpp so resources/bin/ has your platform executable.

Requirements:

  • Sufficient disk space for models (75MB - 3GB depending on model)

Upgrading from Python-based version: If you previously used the Python-based Whisper, you'll need to re-download models in GGML format. You can safely delete the old Python environment (~/.openwhispr/python/) and PyTorch models (~/.cache/whisper/) to reclaim disk space.

Customization

  • Hotkey: Change in the Control Panel (default: backtick `) - fully customizable
  • Panel Position: Drag the dictation panel to any location on your screen
  • Processing Method: Choose local or cloud in Control Panel
  • Whisper Model: Select quality vs speed in Control Panel
  • UI Theme: Edit CSS variables in src/index.css
  • Window Size: Adjust dimensions in main.js
  • Database: Transcriptions stored in user data directory

Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Guidelines

  • Run npm run lint before committing
  • Follow the existing code style
  • Update documentation as needed
  • Test on your target platform before submitting

Security

OpenWhispr is designed with privacy and security in mind:

  • Local Processing Option: Keep your voice data completely private
  • No Analytics: We don't collect any usage data or telemetry
  • Open Source: All code is available for review
  • Secure Storage: API keys are stored securely in your system's keychain/credential manager
  • Minimal Permissions: Only requests necessary permissions (microphone, accessibility)

Troubleshooting

Common Issues

  1. Microphone permissions: Grant permissions in System Preferences/Settings
  2. Accessibility permissions (macOS): Required for automatic text pasting
    • Go to System Settings → Privacy & Security → Accessibility
    • Add OpenWhispr and enable the checkbox
    • Use "Fix Permission Issues" in Control Panel if needed
  3. API key errors (cloud processing only): Ensure your OpenAI API key is valid and has credits
    • Set key through Control Panel or .env file
    • Check logs for "OpenAI API Key present: Yes/No"
  4. Local Whisper issues:
    • whisper.cpp is bundled with the app
    • If bundled binary fails, install via brew install whisper-cpp (macOS)
    • Check available disk space for models
  5. Global hotkey conflicts: Change the hotkey in the Control Panel - any key can be used
  6. Text not pasting:
    • macOS: Check accessibility permissions (System Settings → Privacy & Security → Accessibility)
    • Linux X11: Install xdotool
    • Linux Wayland: Install wtype or ydotool (ensure ydotoold daemon is running)
    • All platforms: Text is always copied to clipboard - use Ctrl+V (Cmd+V on macOS) to paste manually
  7. Panel position: If the panel appears off-screen, restart the app to reset position

Getting Help

  • Check the Issues page
  • Review the console logs for debugging information
  • For local processing: Ensure whisper.cpp is accessible and models are downloaded
  • For cloud processing: Verify your OpenAI API key and billing status
  • Check the Control Panel for system status and diagnostics

Performance Tips

  • Local Processing: Use "base" model for best balance of speed and accuracy
  • Cloud Processing: Generally faster but requires internet connection
  • Model Selection: tiny (fastest) → base (recommended) → small → medium → large (best quality)
  • Permissions: Ensure all required permissions are granted for smooth operation

FAQ

Q: Is OpenWhispr really free? A: Yes! OpenWhispr is open source and free to use. You only pay for OpenAI API usage if you choose cloud processing.

Q: Which processing method should I use? A: Use local processing for privacy and offline use. Use cloud processing for speed and convenience.

Q: Can I use this commercially? A: Yes! The MIT license allows commercial use.

Q: How do I change the hotkey? A: Open the Control Panel (right-click tray icon) and go to Settings. You can set any key as your hotkey.

Q: Is my data secure? A: With local processing, your audio never leaves your device. With cloud processing, audio is sent to OpenAI's servers (see their privacy policy).

Q: What languages are supported? A: OpenWhispr supports 58 languages including English, Spanish, French, German, Chinese, Japanese, and more. Set your preferred language in the .env file or use auto-detect.

Project Status

OpenWhispr is actively maintained and ready for production use. Current version: 1.2.12

  • ✅ Core functionality complete
  • ✅ Cross-platform support (macOS, Windows, Linux)
  • ✅ Local and cloud processing
  • ✅ Multi-provider AI (OpenAI, Anthropic, Gemini, Groq, Local)
  • ✅ Compound hotkey support

Acknowledgments

  • OpenAI Whisper - The speech recognition model that powers both local and cloud transcription
  • whisper.cpp - High-performance C++ implementation of Whisper for local processing
  • Electron - Cross-platform desktop application framework
  • React - UI component library
  • shadcn/ui - Beautiful UI components built on Radix primitives
  • llama.cpp - Local LLM inference for AI-powered text processing