Native Windows desktop application for real-time speech transcription and dictation using OpenAI Whisper.
Everyone and their dog has built a Whisper wrapper these days. There are countless Python GUIs, Electron apps, web interfaces, etc. Somehow, none of them quite fit the workflow I needed, so I built my own. Whisper Studio is a lightweight native C++ application that runs locally on your Windows machine, transcribes audio in real time, identifies speakers, can auto-paste transcriptions, and a few other things. It's not the prettiest app (I'm no designer), but it gets the job done.
- Native C++ - Built on whisper.cpp by ggerganov, whose hard work made this project possible.
- Real-Time Dictation - Live transcription mode which works via automatic silence detection and segmentation.
- Speaker Diarization - Identifies and labels different speakers using neural network–based models powered by the hard work of the Sherpa-ONNX team.
- Auto-Paste - Automatically types transcriptions into your active window
- Global Hotkeys - Start and stop recording from any application (default: F6)
- Built-in Editor - All transcriptions are saved locally with inline editing support
- Export Transcriptions - Export to TXT, JSON, or SRT (subtitles)
- Multiple Input Sources - Choose any connected capture device. This should be obvious, but many low-quality Python wrappers don't support it.
- Model Management - Download and switch models from within the app. Again, should be obvious, but included for the same reason.
- System Tray - Run in background with tray icon for quick access to common actions
- Audio Format Support - Handles WAV natively; converts other formats via FFmpeg
The installer bundles all dependencies and is the easiest way to get started; there's no need to manually install CUDA, FFmpeg, etc.
- Download the latest installer from the Releases page
- Choose between:
- Standard (CPU) - Works on any Windows 10/11 machine
- NVIDIA GPU - Accelerated transcription if you have a CUDA-capable GPU
- Run the installer and follow the prompts
- Launch Whisper Studio from your Start Menu
Requirements:
- Windows 10/11
- Visual Studio 2022 with C++ Desktop Development workload
- CMake 3.20+
- (Optional) CUDA Toolkit 11.8+ for GPU acceleration
- (Optional) FFmpeg in PATH for non-WAV audio file support
Build Steps:
```shell
# Clone the repository
git clone https://github.com/yourusername/whisper-studio.git
cd whisper-studio

# Configure with CMake
# The build system will automatically detect if CUDA is installed
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release

# Build
cmake --build build --config Release

# Run
.\build\Release\WhisperStudio.exe
```

For GPU support, ensure the CUDA Toolkit is installed before running CMake.
- Select your microphone from the Audio Device dropdown
- Download a model - Click the Whisper Model dropdown and select a model to download
- Click Record or press your hotkey (F6) to start recording
- Speak naturally - The waveform will show your audio
- Stop recording - Click Stop or press the hotkey again
- Wait for transcription - Results appear in the Transcription panel
- Enable Live Transcription in Settings
- Start recording
- Speak in natural phrases with brief pauses
- The app automatically detects silence, transcribes each segment, and continues recording
- Enable Auto-Paste to have text automatically typed into your active window
- Enable Speaker Diarization in Settings
- Download the required models when prompted (Pyannote segmentation + 3D-Speaker embedding)
- Transcriptions will be labeled with "Speaker 1:", "Speaker 2:", etc.
- Click Open File to import an existing audio file
- Supported formats: WAV (native), MP3, M4A, FLAC (requires FFmpeg)
- Select the file and click Transcribe
Whisper Studio supports all standard whisper.cpp models:
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny.en | 75 MB | Fastest | Basic | Quick notes, testing |
| base.en | 142 MB | Very Fast | Good | General dictation |
| small.en | 466 MB | Fast | Better | Quality transcription |
| medium.en | 1.5 GB | Moderate | Great | Professional work |
| large-v3 | 3.1 GB | Slow | Best | Maximum accuracy |
Models are downloaded on-demand through the built-in model manager. Quantized variants (q5_0, q8_0) are also available for reduced memory usage.
Whisper Studio is built entirely in modern C++17 with:
- GUI: Dear ImGui (docking branch) + SDL2
- ASR Engine: whisper.cpp (local inference, CPU/CUDA)
- Diarization: sherpa-onnx (neural speaker identification)
- Audio: SDL2 (16kHz mono PCM capture)
- Networking: WinHTTP (native Windows, for model downloads)
- Build: CMake with FetchContent dependency management
- Use GPU acceleration if you have an NVIDIA card (5-10x faster than CPU)
- Enable quantization (q5_0, q8_0 variants) to reduce memory usage with minimal accuracy loss
Contributions are welcome! This project is a personal workflow tool that grew larger than expected, and there's plenty of room for improvement.
This project is licensed under the MIT License.
- OpenAI for the Whisper models
- whisper.cpp by Georgi Gerganov
- sherpa-onnx for speaker diarization
- Dear ImGui for the GUI framework
- The entire open-source speech recognition community
Note: This is an independent project and is not affiliated with OpenAI.