Audio transcription tool powered by Faster Whisper. Record from microphone, upload audio files, or paste YouTube URLs. Optional AI-powered summarization via the OpenAI API.
- Transcribe audio with Faster Whisper (8 model sizes available)
- Supports multiple audio formats (WAV, MP3, FLAC, OGG, M4A, AAC, WMA, Opus, WebM)
- Record from microphone or system audio via WASAPI loopback (CLI) or browser (Web UI)
- Download and transcribe YouTube videos via yt-dlp
- Split long audio into chunks for reliable processing
- Multiple summary modes via selectable prompt templates (OpenAI)
- OpenAI API key input from UI (no .env required)
- Web interface (Gradio) and CLI modes
- Docker support (CPU and GPU)
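The chunking step above can be illustrated with a minimal sketch of how fixed-length chunk boundaries might be computed (hypothetical helper; the tool's actual splitting logic may differ):

```python
def chunk_bounds(total_seconds: float, chunk_seconds: int = 30) -> list[tuple[float, float]]:
    """Return (start, end) pairs covering the whole recording."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds

# A 75-second file with 30-second chunks yields three pieces,
# the last one shorter than the rest:
print(chunk_bounds(75, 30))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Each chunk is transcribed independently, so a failure mid-file only costs one chunk rather than the whole recording.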
git clone https://github.com/Migue8gl/WhisperTranscribe.git
cd WhisperTranscribe
# (Optional) Set OpenAI key for summarization
echo "OPENAI_API_KEY=your_key_here" > .env
# Build and start
docker compose up --build
# Open http://localhost:7860

Prerequisites: Python 3.10+, FFmpeg
pip install -r requirements.txt
# For NVIDIA GPU acceleration
pip install -r requirements-gpu.txt
# Launch web UI
python src/main.py --ui
# Or use CLI directly
python src/main.py -l "https://youtu.be/VIDEO_ID" -m l -s prompt_schema_md

Launch with python src/main.py --ui or via Docker. Opens at http://localhost:7860.
- Audio source selector - switch between Upload/Microphone and YouTube URL
- Transcription settings (collapsible) - Whisper model, language selection, chunk duration
- AI Summary (collapsible) - summary mode, OpenAI model selection, API key input
- Download transcription (.txt) and summary (.md) directly from the UI
- Copy results with one click
# List available recording devices
python src/main.py --list-devices
# Record from microphone with medium model
python src/main.py -d 2 -m m
# Transcribe a YouTube video with large model
python src/main.py -l "https://youtu.be/VIDEO_ID" -m l
# Transcribe local file with markdown summary
python src/main.py -l recording.wav -s prompt_schema_md
# Transcribe with plain text summary
python src/main.py -l recording.wav -s prompt_schema
# Custom chunk duration and output name
python src/main.py -l lecture.wav -c 60 -n lecture_transcript.txt
# Record system audio (what you hear in headphones, Windows only)
python src/main.py --loopback -m m
# Loopback from a specific output device
python src/main.py --loopback -d 6 -m m
# Force Spanish language (skip auto-detection)
python src/main.py -l audio.wav -m m --language es
# Verbose debug output
python src/main.py -l audio.wav -m t -v

| Flag | Long Form | Description | Default |
|---|---|---|---|
| -m | --model | Model size: t/s/b/m/l/lt/d2/d3 | m |
| -d | --device | Device ID for recording | Auto |
| | --loopback | Record system audio (Windows WASAPI) | Off |
| -c | --chunk_duration | Chunk size in seconds | 30 |
| -l | --load | Audio file path or YouTube URL | None |
| -s | --summarize | Summarize with a prompt from prompts/ (by name) | Off |
| -n | --name | Custom output file name | Auto |
| | --openai-model | OpenAI model for summary (see table below) | gpt-5-mini |
| | --language | Language code for transcription (e.g. en, es, fr) | Auto-detect |
| -v | --verbose | Enable debug logging | Off |
| | --ui | Launch Gradio web interface | |
| | --list-devices | Show audio input devices and exit | |
| | --version | Show version and exit | |
docker compose up --build

Requires NVIDIA Container Toolkit.

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build

| Host Path | Container Path | Purpose |
|---|---|---|
| ./models | /app/models | Cached Whisper models (persist) |
| ./output | /app/output | Saved transcriptions |
Models are downloaded on first use and cached. The first run will take extra time depending on the model size.
WhisperTranscribe/
├── src/
│ ├── main.py # Core logic and CLI entry point
│ └── app.py # Gradio web interface
├── prompts/
│ ├── prompt_schema.txt # Plain text summary prompt
│ └── prompt_schema_md.txt # Markdown summary prompt
├── Dockerfile
├── docker-compose.yml
├── docker-compose.gpu.yml
├── requirements.txt # CPU dependencies
├── requirements-gpu.txt # GPU dependencies (CUDA)
├── .env.example # Environment variable template
└── README.md
Runtime directories (gitignored):
- audio/ - Downloaded/recorded audio and chunks
- output/ - Transcriptions and summaries
- models/ - Cached Whisper models
| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | No | OpenAI API key for summarization |
Only required when using -s (CLI) or selecting a summary mode (UI). The program works without it for transcription-only workflows. In the web UI, you can paste the key directly without needing a .env file.
cp .env.example .env
# Edit .env with your API key

| Code | Model | Size | Speed | Accuracy |
|---|---|---|---|---|
| t | tiny | 39 MB | Fastest | Low |
| b | base | 74 MB | Fast | Fair |
| s | small | 244 MB | Medium | Good |
| m | medium | 769 MB | Slow | High |
| l | large-v3 | 1.5 GB | Slowest | Best |
| lt | large-v3-turbo | 809 MB | Fast | High |
| d2 | distil-large-v2 | 756 MB | Fast | High |
| d3 | distil-large-v3 | 756 MB | Fast | High |
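The code-to-name mapping in the table above can be expressed as a simple lookup (a sketch; the tool's internal mapping may differ):

```python
# Short -m codes from the table mapped to Faster Whisper model names.
MODEL_CODES = {
    "t": "tiny",
    "b": "base",
    "s": "small",
    "m": "medium",
    "l": "large-v3",
    "lt": "large-v3-turbo",
    "d2": "distil-large-v2",
    "d3": "distil-large-v3",
}

def resolve_model(code: str) -> str:
    """Translate a -m code into a full model name, defaulting to medium."""
    return MODEL_CODES.get(code, MODEL_CODES["m"])

print(resolve_model("lt"))  # large-v3-turbo
```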
| Model | Speed | Cost | Best for |
|---|---|---|---|
| gpt-5.2 | Medium | Higher | Most advanced, complex analysis |
| gpt-5.2-pro | Slower | Highest | Deep reasoning tasks |
| gpt-5.2-codex | Medium | Higher | Code-focused tasks |
| gpt-5.1 | Medium | Higher | High-quality text generation |
| gpt-5-mini | Fast | Low | General use (default) |
| gpt-5-nano | Fastest | Lowest | Quick summaries, classification |
| gpt-4.1 | Medium | Medium | Versatile text tasks |
| gpt-4.1-mini | Fast | Low | Good balance |
| gpt-4.1-nano | Fastest | Lowest | Lightweight tasks |
| gpt-4o | Medium | Medium | Multimodal capable |
| gpt-4o-mini | Fast | Low | Legacy general use |
| o4-mini | Medium | Low | Reasoning tasks |
Summary modes are loaded automatically from .txt files in the prompts/ directory. Each file becomes a selectable option in both the CLI (-s) and the web UI dropdown.
Built-in prompts:
- Action Items (action_items.txt) - Extract tasks, deadlines and responsibilities
- Interview (interview.txt) - Q&A structure, key quotes, recurring themes
- Lecture Notes (lecture_notes.txt) - Academic notes with definitions, formulas and examples
- Meeting Notes (meeting_notes.txt) - Minutes with decisions, action items and pending topics
- Podcast Summary (podcast_summary.txt) - Accessible summary of multimedia content
- Prompt Schema (prompt_schema.txt) - General structured analysis
- Prompt Schema Md (prompt_schema_md.txt) - Detailed Markdown-formatted technical notes
To add a custom mode, create a new .txt file in prompts/. Use [transcription here] as the placeholder:
title=Meeting Notes
Analyze this meeting transcript:
1. Key decisions made
2. Action items
3. Follow-up topics
[transcription here]
CLI usage: python src/main.py -l audio.wav -s meeting_notes
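Loading a prompt by name and filling the placeholder can be sketched as follows (build_summary_prompt is a hypothetical helper name; the tool's actual code may differ):

```python
from pathlib import Path

PLACEHOLDER = "[transcription here]"

def build_summary_prompt(mode: str, transcription: str, prompts_dir: str = "prompts") -> str:
    """Read prompts/<mode>.txt and substitute the transcription text."""
    template = Path(prompts_dir, f"{mode}.txt").read_text(encoding="utf-8")
    return template.replace(PLACEHOLDER, transcription)
```

Because modes are discovered by file name, the -s value on the CLI (e.g. meeting_notes) is simply the file's name without the .txt extension.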
- Python 3.10+
- FFmpeg (for YouTube audio extraction)
- NVIDIA GPU + CUDA (optional, for faster transcription)
- 4 GB RAM minimum (more for larger models)