WhisperTranscribe

Audio transcription tool powered by Faster Whisper. Record from microphone, upload audio files, or paste YouTube URLs. Optional AI-powered summarization via the OpenAI API.

Features

  • Transcribe audio with Faster Whisper (8 model sizes available)
  • Supports multiple audio formats (WAV, MP3, FLAC, OGG, M4A, AAC, WMA, Opus, WebM)
  • Record from microphone or system audio via WASAPI loopback (CLI) or browser (Web UI)
  • Download and transcribe YouTube videos via yt-dlp
  • Split long audio into chunks for reliable processing
  • Multiple summary modes via selectable prompt templates (OpenAI models)
  • OpenAI API key input from UI (no .env required)
  • Web interface (Gradio) and CLI modes
  • Docker support (CPU and GPU)
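The chunk-splitting feature above can be sketched as follows. This is a hypothetical helper, not WhisperTranscribe's actual code: it only illustrates the idea of cutting a long recording into fixed-length windows before transcription.

```python
# Illustrative sketch of fixed-length audio chunking
# (hypothetical helper, not WhisperTranscribe's implementation).

def chunk_bounds(total_seconds: float, chunk_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets covering the whole recording."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds

print(chunk_bounds(75, 30))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Each (start, end) pair would then be extracted (e.g. with FFmpeg) and transcribed independently, which keeps memory use bounded on long recordings.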

Quick Start

Docker (recommended)

git clone https://github.com/Migue8gl/WhisperTranscribe.git
cd WhisperTranscribe

# (Optional) Set OpenAI key for summarization
echo "OPENAI_API_KEY=your_key_here" > .env

# Build and start
docker compose up --build

# Open http://localhost:7860

Local Installation

Prerequisites: Python 3.10+, FFmpeg

pip install -r requirements.txt

# For NVIDIA GPU acceleration
pip install -r requirements-gpu.txt

# Launch web UI
python src/main.py --ui

# Or use CLI directly
python src/main.py -l "https://youtu.be/VIDEO_ID" -m l -s prompt_schema_md

Web Interface

Launch with python src/main.py --ui or via Docker. Opens at http://localhost:7860.

  • Audio source selector - switch between Upload/Microphone and YouTube URL
  • Transcription settings (collapsible) - Whisper model, language selection, chunk duration
  • AI Summary (collapsible) - summary mode, OpenAI model selection, API key input
  • Download transcription (.txt) and summary (.md) directly from the UI
  • Copy results with one click

CLI Usage

# List available recording devices
python src/main.py --list-devices

# Record from microphone with medium model
python src/main.py -d 2 -m m

# Transcribe a YouTube video with large model
python src/main.py -l "https://youtu.be/VIDEO_ID" -m l

# Transcribe local file with markdown summary
python src/main.py -l recording.wav -s prompt_schema_md

# Transcribe with plain text summary
python src/main.py -l recording.wav -s prompt_schema

# Custom chunk duration and output name
python src/main.py -l lecture.wav -c 60 -n lecture_transcript.txt

# Record system audio (what you hear in headphones, Windows only)
python src/main.py --loopback -m m

# Loopback from a specific output device
python src/main.py --loopback -d 6 -m m

# Force Spanish language (skip auto-detection)
python src/main.py -l audio.wav -m m --language es

# Verbose debug output
python src/main.py -l audio.wav -m t -v

CLI Flags

Flag  Long Form         Description                                        Default
-m    --model           Model size: t/s/b/m/l/lt/d2/d3                     m
-d    --device          Device ID for recording                            Auto
      --loopback        Record system audio (Windows WASAPI)               Off
-c    --chunk_duration  Chunk size in seconds                              30
-l    --load            Audio file path or YouTube URL                     None
-s    --summarize       Summarize with a prompt from prompts/ (by name)    Off
-n    --name            Custom output file name                            Auto
      --openai-model    OpenAI model for summary (see table below)         gpt-5-mini
      --language        Language code for transcription (e.g. en, es, fr)  Auto-detect
-v    --verbose         Enable debug logging                               Off
      --ui              Launch Gradio web interface
      --list-devices    Show audio input devices and exit
      --version         Show version and exit

Docker

CPU (default)

docker compose up --build

GPU (NVIDIA)

Requires NVIDIA Container Toolkit.

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build

Volumes

Host Path  Container Path  Purpose
./models   /app/models     Cached Whisper models (persist)
./output   /app/output     Saved transcriptions

Models are downloaded on first use and cached. The first run will take extra time depending on the model size.

Project Structure

WhisperTranscribe/
├── src/
│   ├── main.py                # Core logic and CLI entry point
│   └── app.py                 # Gradio web interface
├── prompts/
│   ├── prompt_schema.txt      # Plain text summary prompt
│   └── prompt_schema_md.txt   # Markdown summary prompt
├── Dockerfile
├── docker-compose.yml
├── docker-compose.gpu.yml
├── requirements.txt           # CPU dependencies
├── requirements-gpu.txt       # GPU dependencies (CUDA)
├── .env.example               # Environment variable template
└── README.md

Runtime directories (gitignored):

  • audio/ - Downloaded/recorded audio and chunks
  • output/ - Transcriptions and summaries
  • models/ - Cached Whisper models

Configuration

Environment Variables

Variable        Required  Description
OPENAI_API_KEY  No        OpenAI API key for summarization

Only required when using -s (CLI) or selecting a summary mode (UI). The program works without it for transcription-only workflows. In the web UI, you can paste the key directly without needing a .env file.

cp .env.example .env
# Edit .env with your API key

Whisper Models

Code  Model            Size    Speed    Accuracy
t     tiny             39 MB   Fastest  Low
b     base             74 MB   Fast     Fair
s     small            244 MB  Medium   Good
m     medium           769 MB  Slow     High
l     large-v3         1.5 GB  Slowest  Best
lt    large-v3-turbo   809 MB  Fast     High
d2    distil-large-v2  756 MB  Fast     High
d3    distil-large-v3  756 MB  Fast     High
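The short `-m` codes map onto full faster-whisper model names. A sketch of that lookup, reconstructed from the table above (the tool's real mapping may differ in detail):

```python
# Short CLI codes -> faster-whisper model names,
# reconstructed from the README's model table (assumed, not verified code).
MODEL_CODES = {
    "t": "tiny",
    "b": "base",
    "s": "small",
    "m": "medium",
    "l": "large-v3",
    "lt": "large-v3-turbo",
    "d2": "distil-large-v2",
    "d3": "distil-large-v3",
}

def resolve_model(code: str) -> str:
    """Translate a short code into a model name, rejecting unknown codes."""
    try:
        return MODEL_CODES[code]
    except KeyError:
        raise ValueError(f"Unknown model code {code!r}; choose from {sorted(MODEL_CODES)}")

print(resolve_model("lt"))  # large-v3-turbo
```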

OpenAI Models

Model          Speed    Cost     Best for
gpt-5.2        Medium   Higher   Most advanced, complex analysis
gpt-5.2-pro    Slower   Highest  Deep reasoning tasks
gpt-5.2-codex  Medium   Higher   Code-focused tasks
gpt-5.1        Medium   Higher   High-quality text generation
gpt-5-mini     Fast     Low      General use (default)
gpt-5-nano     Fastest  Lowest   Quick summaries, classification
gpt-4.1        Medium   Medium   Versatile text tasks
gpt-4.1-mini   Fast     Low      Good balance
gpt-4.1-nano   Fastest  Lowest   Lightweight tasks
gpt-4o         Medium   Medium   Multimodal capable
gpt-4o-mini    Fast     Low      Legacy general use
o4-mini        Medium   Low      Reasoning tasks
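A summarization request to one of these models follows the standard chat-completions shape. The sketch below only assembles the payload; `build_summary_request` is a hypothetical helper, and the `[transcription here]` substitution assumes the placeholder convention from the Summary Modes section:

```python
def build_summary_request(prompt_template: str, transcript: str,
                          model: str = "gpt-5-mini") -> dict:
    """Assemble a chat-completions style payload (shape assumed from the
    OpenAI Python SDK; the tool's real request may differ)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt_template.replace("[transcription here]", transcript),
            },
        ],
    }
```

The resulting dict could then be passed to `client.chat.completions.create(**request)` with the OpenAI Python SDK; only the payload construction is shown here to keep the example offline.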

Summary Modes (Prompt Templates)

Summary modes are loaded automatically from .txt files in the prompts/ directory. Each file becomes a selectable option in both the CLI (-s) and the web UI dropdown.

Built-in prompts:

  • Action Items (action_items.txt) - Extract tasks, deadlines and responsibilities
  • Interview (interview.txt) - Q&A structure, key quotes, recurring themes
  • Lecture Notes (lecture_notes.txt) - Academic notes with definitions, formulas and examples
  • Meeting Notes (meeting_notes.txt) - Minutes with decisions, action items and pending topics
  • Podcast Summary (podcast_summary.txt) - Accessible summary of multimedia content
  • Prompt Schema (prompt_schema.txt) - General structured analysis
  • Prompt Schema Md (prompt_schema_md.txt) - Detailed Markdown-formatted technical notes

To add a custom mode, create a new .txt file in prompts/. Use [transcription here] as placeholder:

title=Meeting Notes
Analyze this meeting transcript:
1. Key decisions made
2. Action items
3. Follow-up topics

[transcription here]

CLI usage: python src/main.py -l audio.wav -s meeting_notes
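Loading and filling such a template might be sketched like this. These are hypothetical helpers assuming only the `prompts/<name>.txt` layout and the `[transcription here]` placeholder described above; the title-casing in `display_name` mirrors how "prompt_schema_md.txt" appears as "Prompt Schema Md" in the built-in list:

```python
from pathlib import Path

# Placeholder convention from the README; helpers below are illustrative,
# not WhisperTranscribe's actual loader.
PLACEHOLDER = "[transcription here]"

def load_prompt(prompts_dir: str, name: str) -> str:
    """Read prompts/<name>.txt, matching the -s flag's by-name lookup."""
    return (Path(prompts_dir) / f"{name}.txt").read_text(encoding="utf-8")

def render_prompt(template: str, transcript: str) -> str:
    """Substitute the transcription into the template's placeholder."""
    return template.replace(PLACEHOLDER, transcript)

def display_name(filename: str) -> str:
    """Derive a UI label, e.g. prompt_schema_md.txt -> 'Prompt Schema Md'."""
    return Path(filename).stem.replace("_", " ").title()
```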

Requirements

  • Python 3.10+
  • FFmpeg (for YouTube audio extraction)
  • NVIDIA GPU + CUDA (optional, for faster transcription)
  • 4 GB RAM minimum (more for larger models)
