Whisper Studio

Native Windows desktop application for real-time speech transcription and dictation using OpenAI Whisper.

About

Everyone and their dog has built a Whisper wrapper these days: countless Python GUIs, Electron apps, web interfaces, and so on. Somehow none of them quite fit the workflow I needed, so I built my own. Whisper Studio is a lightweight native C++ application that runs locally on your Windows machine, transcribes audio in real time, identifies speakers, can auto-paste transcriptions, and does a few other things. It's not the prettiest app (I suck at design), but it gets the job done.

Features

Native C++ - Implemented in C++ on top of whisper.cpp by ggerganov, whose hard work made this project possible.
Real-Time Dictation - Live transcription mode that splits audio into segments using automatic silence detection
Speaker Diarization - Identifies and labels different speakers using neural network–based models, powered by the hard work of the sherpa-onnx team
Auto-Paste - Automatically types transcribed text into the active window
Global Hotkeys - Start and stop recording from anywhere (default: F6)
Built-in Editor - All transcriptions are saved locally with inline editing support
Export Transcriptions - Export to TXT, JSON, or SRT (subtitles)
Multiple Input Sources - Record from any available audio device or import existing audio files. This should be obvious, but I noticed many low-quality Python wrappers didn't support it.
Model Management - Download and switch between models from inside the app. Again, this should be obvious, but I'm including it for the same reason as above.
System Tray - Run in background with tray icon for quick access to common actions
Audio Format Support - Handles WAV natively; converts other formats via FFmpeg
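For reference, SRT export boils down to numbered cues with `HH:MM:SS,mmm --> HH:MM:SS,mmm` time ranges followed by the text and a blank line. The sketch below is illustrative only: the `Segment` struct, `srt_timestamp`, and `to_srt` are hypothetical names rather than Whisper Studio's actual exporter, and segment times are assumed to be in milliseconds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical transcript segment: start/end in milliseconds plus its text.
struct Segment {
    int64_t start_ms;
    int64_t end_ms;
    std::string text;
};

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
std::string srt_timestamp(int64_t ms) {
    assert(ms >= 0);
    const long long h = ms / 3600000;
    const long long m = (ms / 60000) % 60;
    const long long s = (ms / 1000) % 60;
    const long long frac = ms % 1000;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%02lld:%02lld:%02lld,%03lld", h, m, s, frac);
    return buf;
}

// Serialize segments as SRT cues: 1-based index, time range, text, blank line.
std::string to_srt(const std::vector<Segment>& segments) {
    std::string out;
    for (std::size_t i = 0; i < segments.size(); ++i) {
        const Segment& seg = segments[i];
        out += std::to_string(i + 1) + "\n";
        out += srt_timestamp(seg.start_ms) + " --> " + srt_timestamp(seg.end_ms) + "\n";
        out += seg.text + "\n\n";
    }
    return out;
}
```

For example, `srt_timestamp(3723004)` yields `01:02:03,004`.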

Installation

Option 1: Use the Installer (Recommended)

The installer bundles all dependencies and is the easiest way to get started; there's no need to install CUDA, FFmpeg, etc. manually.

Download the latest installer from the Releases page
Choose between:
- Standard (CPU) - Works on any Windows 10/11 machine
- NVIDIA GPU - Accelerated transcription if you have a CUDA-capable GPU
Run the installer and follow the prompts
Launch Whisper Studio from your Start Menu

Option 2: Build from Source

Requirements:

Windows 10/11
Visual Studio 2022 with C++ Desktop Development workload
CMake 3.20+
(Optional) CUDA Toolkit 11.8+ for GPU acceleration
(Optional) FFmpeg in PATH for non-WAV audio file support

Build Steps:

# Clone the repository
git clone https://github.com/JasonVinion/Whisper-Studio.git
cd Whisper-Studio

# Configure with CMake
# The build system will automatically detect if CUDA is installed
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release

# Build
cmake --build build --config Release

# Run
.\build\Release\WhisperStudio.exe

For GPU support, make sure the CUDA Toolkit is installed before running CMake.

Usage

Basic Transcription

Select your microphone from the Audio Device dropdown
Download a model - Click the Whisper Model dropdown and select a model to download
Click Record or press the hotkey (F6 by default) to start recording
Speak naturally - The waveform will show your audio
Stop recording - Click Stop or press the hotkey again
Wait for transcription - Results appear in the Transcription panel
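Under the hood, the Record/Stop cycle above amounts to a small state machine. A minimal sketch, assuming hypothetical `State` and `Recorder` types (not taken from the Whisper Studio source) and assuming hotkey presses are simply ignored while a transcription is still running:

```cpp
#include <cassert>

// Hypothetical states for the record -> stop -> transcribe flow.
enum class State { Idle, Recording, Transcribing };

class Recorder {
public:
    // Called when Record/Stop is clicked or the global hotkey is pressed.
    void on_toggle() {
        switch (state_) {
            case State::Idle:         state_ = State::Recording; break;     // start capturing audio
            case State::Recording:    state_ = State::Transcribing; break;  // stop and hand audio to the model
            case State::Transcribing: break;                                // assumption: ignore presses until done
        }
    }

    // Called once the transcription result is ready to show in the panel.
    void on_transcription_done() {
        assert(state_ == State::Transcribing && "result arrived while not transcribing");
        state_ = State::Idle;
    }

    State state() const { return state_; }

private:
    State state_ = State::Idle;
};
```

Keeping the transitions in one place like this makes it easy to share the same logic between the button and the hotkey handler.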

Live Dictation Mode

Enable Live Transcription in Settings
Start recording
Speak in natural phrases with brief pauses
The app automatically detects silence, transcribes each segment, and continues recording
Enable Auto-Paste to have text automatically typed into your active window
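Silence-based segmentation like this can be approximated with a simple energy threshold. This is a sketch under stated assumptions, not the app's actual detector: it measures RMS energy over 20 ms frames of 16 kHz mono float audio and closes a segment after 500 ms of quiet. `VadParams`, `segment_on_silence`, and every threshold value are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A detected speech segment, as sample offsets [begin, end) into the buffer.
struct SpeechSegment {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical tuning parameters; the app's real thresholds are not documented.
struct VadParams {
    std::size_t frame_size = 320;         // 20 ms at 16 kHz
    float rms_threshold = 0.01f;          // frames below this RMS count as silence
    std::size_t min_silence_frames = 25;  // 25 x 20 ms = 500 ms of silence closes a segment
};

// Root-mean-square energy of s[begin, begin + n).
static float frame_rms(const std::vector<float>& s, std::size_t begin, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = begin; i < begin + n; ++i) sum += double(s[i]) * s[i];
    return static_cast<float>(std::sqrt(sum / n));
}

// Split mono PCM into speech segments separated by sustained silence.
std::vector<SpeechSegment> segment_on_silence(const std::vector<float>& samples,
                                              const VadParams& p = {}) {
    assert(p.frame_size > 0);
    std::vector<SpeechSegment> out;
    bool in_speech = false;
    std::size_t seg_begin = 0, last_voiced_end = 0, silent_run = 0;
    for (std::size_t f = 0; f + p.frame_size <= samples.size(); f += p.frame_size) {
        const bool voiced = frame_rms(samples, f, p.frame_size) >= p.rms_threshold;
        if (voiced) {
            if (!in_speech) { in_speech = true; seg_begin = f; }
            last_voiced_end = f + p.frame_size;
            silent_run = 0;
        } else if (in_speech && ++silent_run >= p.min_silence_frames) {
            out.push_back({seg_begin, last_voiced_end});  // enough silence: close the segment
            in_speech = false;
            silent_run = 0;
        }
    }
    if (in_speech) out.push_back({seg_begin, last_voiced_end});  // flush trailing speech
    return out;
}
```

In a live setting each closed segment would be handed to the transcriber while capture continues; a fixed RMS threshold is the simplest choice, and a neural VAD would be more robust to background noise.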

Speaker Identification

Enable Speaker Diarization in Settings
Download the required models when prompted (Pyannote segmentation + 3D-Speaker embedding)
Transcriptions will be labeled with "Speaker 1:", "Speaker 2:", etc.
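One common way to turn diarization output into "Speaker 1:" / "Speaker 2:" labels is to give each transcript segment to the speaker whose turns overlap it the most. The sketch below assumes that approach; `SpeakerTurn`, `TextSegment`, and `label_segments` are hypothetical, and the real app may combine sherpa-onnx and whisper.cpp output differently.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical types; all times are in seconds.
struct SpeakerTurn { double start, end; int speaker; };       // diarizer output, 0-based speaker id
struct TextSegment { double start, end; std::string text; };  // ASR output

// Length of the overlap between [a0, a1) and [b0, b1); 0 if they are disjoint.
static double overlap(double a0, double a1, double b0, double b1) {
    return std::max(0.0, std::min(a1, b1) - std::max(a0, b0));
}

// Prefix each segment with "Speaker N: " for the speaker whose turns cover the
// most of its time span; segments with no overlapping turn stay unlabeled.
std::vector<std::string> label_segments(const std::vector<TextSegment>& segs,
                                        const std::vector<SpeakerTurn>& turns) {
    std::vector<std::string> lines;
    for (const TextSegment& seg : segs) {
        assert(seg.end >= seg.start);
        std::map<int, double> covered;  // speaker id -> total overlap in seconds
        for (const SpeakerTurn& t : turns) {
            const double ov = overlap(seg.start, seg.end, t.start, t.end);
            if (ov > 0.0) covered[t.speaker] += ov;
        }
        int best = -1;
        double best_ov = 0.0;
        for (const auto& kv : covered) {
            if (kv.second > best_ov) { best_ov = kv.second; best = kv.first; }
        }
        const std::string label = best >= 0 ? "Speaker " + std::to_string(best + 1) + ": " : "";
        lines.push_back(label + seg.text);
    }
    return lines;
}
```

Summing overlap per speaker (rather than taking the single longest turn) avoids mislabeling a segment when one speaker's speech is split into several short turns.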

File Transcription

Click Open File to import an existing audio file
Supported formats: WAV (native); MP3, M4A, and FLAC (these require FFmpeg)
Select the file and click Transcribe
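Non-WAV files need converting before whisper.cpp can use them, since it works on 16 kHz mono audio. A sketch of how that conversion might be driven, assuming the app shells out to the `ffmpeg` CLI: the `-ar 16000 -ac 1 -c:a pcm_s16le` flags are standard FFmpeg options, while `needs_ffmpeg` and `ffmpeg_to_wav_command` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True when the file's extension is anything other than .wav (case-insensitive),
// meaning it would have to go through FFmpeg first.
bool needs_ffmpeg(const std::string& path) {
    const std::string::size_type sep = path.find_last_of("/\\");
    const std::string::size_type dot = path.find_last_of('.');
    if (dot == std::string::npos || (sep != std::string::npos && dot < sep)) {
        return true;  // no extension on the file name itself; let FFmpeg probe it
    }
    std::string ext = path.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext != "wav";
}

// FFmpeg command line that decodes any input and writes 16 kHz, mono, 16-bit PCM WAV.
// Paths are wrapped in double quotes so spaces survive.
std::string ffmpeg_to_wav_command(const std::string& input, const std::string& output) {
    assert(!input.empty() && !output.empty());
    return "ffmpeg -y -i \"" + input + "\" -ar 16000 -ac 1 -c:a pcm_s16le \"" + output + "\"";
}
```

On Windows the resulting command would typically be launched with `CreateProcess` (or `std::system` for a quick test), after which the temporary WAV is loaded like any other.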

Models

Whisper Studio supports all standard whisper.cpp models:

Model	Size	Speed	Accuracy	Best For
tiny.en	75 MB	Fastest	Basic	Quick notes, testing
base.en	142 MB	Very Fast	Good	General dictation
small.en	466 MB	Fast	Better	Quality transcription
medium.en	1.5 GB	Moderate	Great	Professional work
large-v3	3.1 GB	Slow	Best	Maximum accuracy

Models are downloaded on-demand through the built-in model manager. Quantized variants (q5_0, q8_0) are also available for reduced memory usage.
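If you're unsure which model to grab, a rough rule is to pick the largest one that fits your memory budget, using file size as a proxy for memory use. A sketch of that rule using the sizes from the table (with 1.5 GB and 3.1 GB treated as 1500 MB and 3100 MB); `ModelInfo`, `kModels`, and `largest_model_within` are hypothetical names, and the built-in model manager does not necessarily work this way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model descriptor: name and approximate file size in MB.
struct ModelInfo {
    std::string name;
    double size_mb;
};

// Sizes from the Models table, ordered smallest to largest.
const std::vector<ModelInfo> kModels = {
    {"tiny.en", 75}, {"base.en", 142}, {"small.en", 466},
    {"medium.en", 1500}, {"large-v3", 3100},
};

// Return the largest listed model whose file fits in budget_mb, or "" if none does.
std::string largest_model_within(double budget_mb) {
    assert(budget_mb >= 0.0);
    std::string best;
    for (const ModelInfo& m : kModels) {
        if (m.size_mb <= budget_mb) best = m.name;  // list is sorted, so the last fit wins
    }
    return best;
}
```

For example, a 500 MB budget selects `small.en`.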

Technical Architecture

Whisper Studio is built entirely in modern C++17 with:

GUI: Dear ImGui (docking branch) + SDL2
ASR Engine: whisper.cpp (local inference, CPU/CUDA)
Diarization: sherpa-onnx (neural speaker identification)
Audio: SDL2 (16kHz mono PCM capture)
Networking: WinHTTP (native Windows, for model downloads)
Build: CMake with FetchContent dependency management
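whisper.cpp's `whisper_full()` consumes 32-bit float samples at 16 kHz mono, so 16-bit PCM has to be scaled into the [-1, 1) range first. A minimal sketch, assuming capture delivers signed 16-bit samples (SDL2 can also be asked for `AUDIO_F32` directly, in which case no conversion is needed); `pcm16_to_float` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM samples to floats in [-1.0, 1.0).
// Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just under 1.0.
std::vector<float> pcm16_to_float(const std::vector<int16_t>& pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (int16_t s : pcm) out.push_back(static_cast<float>(s) / 32768.0f);
    return out;
}
```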

Performance Recommendations

Use GPU acceleration if you have an NVIDIA card (typically 5-10x faster than CPU, depending on your hardware and model size)
Enable quantization (q5_0, q8_0 variants) to reduce memory usage with minimal accuracy loss
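To put numbers on the quantization savings: ggml's q5_0 and q8_0 formats store blocks of 32 weights with one fp16 scale each, which works out to 5.5 and 8.5 bits per weight versus 16 for f16. A back-of-the-envelope sketch, assuming the sizes in the Models table are f16 files and that every weight gets quantized (in practice some tensors stay at higher precision, so real files come out somewhat larger); the constant and function names are hypothetical:

```cpp
#include <cassert>

// Bits per weight for ggml block formats (32 weights per block, one fp16 scale):
//   q5_0: 2-byte scale + 4 bytes of high bits + 16 bytes of low nibbles = 22 bytes
//   q8_0: 2-byte scale + 32 int8 values = 34 bytes
constexpr double kBitsF16 = 16.0;
constexpr double kBitsQ5_0 = 22.0 * 8 / 32;  // 5.5
constexpr double kBitsQ8_0 = 34.0 * 8 / 32;  // 8.5

// Rough quantized size, assuming every weight in the f16 model is quantized.
double estimate_quantized_mb(double f16_size_mb, double bits_per_weight) {
    assert(f16_size_mb >= 0.0 && bits_per_weight > 0.0);
    return f16_size_mb * bits_per_weight / kBitsF16;
}
```

By this estimate, medium.en at 1.5 GB shrinks to roughly 516 MB as q5_0 and roughly 797 MB as q8_0.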

Contributing

Contributions are welcome! This project is a personal workflow tool that grew larger than expected, and there's plenty of room for improvement.

License

This project is licensed under the MIT License.

Acknowledgments

OpenAI for the Whisper models
whisper.cpp by Georgi Gerganov
sherpa-onnx for speaker diarization
Dear ImGui for the GUI framework
The entire open-source speech recognition community

Note: This is an independent project and is not affiliated with OpenAI.
