WhisperTranscribe

Audio transcription tool powered by Faster Whisper. Record from microphone, upload audio files, or paste YouTube URLs. Optional AI-powered summarization via the OpenAI API.

Features

  • Transcribe audio with Faster Whisper (8 model sizes available)
  • Supports multiple audio formats (WAV, MP3, FLAC, OGG, M4A, AAC, WMA, Opus, WebM)
  • Record from microphone or system audio via WASAPI loopback (CLI) or browser (Web UI)
  • Download and transcribe YouTube videos via yt-dlp
  • Split long audio into chunks for reliable processing
  • Multiple summary modes via selectable prompt templates (OpenAI models)
  • OpenAI API key input from UI (no .env required)
  • Web interface (Gradio) and CLI modes
  • Docker support (CPU and GPU)
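The chunk-splitting feature above can be sketched as follows. This is a hypothetical helper, not WhisperTranscribe's actual code: it only illustrates the idea of cutting a long recording into fixed-length windows before transcription.

```python
# Illustrative sketch of fixed-length audio chunking
# (hypothetical helper, not WhisperTranscribe's implementation).

def chunk_bounds(total_seconds: float, chunk_seconds: float = 30.0) -> list[tuple[float, float]]:
    """Return (start, end) offsets covering the whole recording."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        bounds.append((start, end))
        start = end
    return bounds

print(chunk_bounds(75, 30))  # [(0.0, 30.0), (30.0, 60.0), (60.0, 75.0)]
```

Each (start, end) pair would then be extracted (e.g. with FFmpeg) and transcribed independently, which keeps memory use bounded on long recordings.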

Quick Start

Docker (recommended)

git clone https://github.com/Migue8gl/WhisperTranscribe.git
cd WhisperTranscribe

# (Optional) Set OpenAI key for summarization
echo "OPENAI_API_KEY=your_key_here" > .env

# Build and start
docker compose up --build

# Open http://localhost:7860

Local Installation

Prerequisites: Python 3.10+, FFmpeg

pip install -r requirements.txt

# For NVIDIA GPU acceleration
pip install -r requirements-gpu.txt

# Launch web UI
python src/main.py --ui

# Or use CLI directly
python src/main.py -l "https://youtu.be/VIDEO_ID" -m l -s prompt_schema_md

Web Interface

Launch with python src/main.py --ui or via Docker. Opens at http://localhost:7860.

  • Audio source selector - switch between Upload/Microphone and YouTube URL
  • Transcription settings (collapsible) - Whisper model, language selection, chunk duration
  • AI Summary (collapsible) - summary mode, OpenAI model selection, API key input
  • Download transcription (.txt) and summary (.md) directly from the UI
  • Copy results with one click

CLI Usage

# List available recording devices
python src/main.py --list-devices

# Record from microphone with medium model
python src/main.py -d 2 -m m

# Transcribe a YouTube video with large model
python src/main.py -l "https://youtu.be/VIDEO_ID" -m l

# Transcribe local file with markdown summary
python src/main.py -l recording.wav -s prompt_schema_md

# Transcribe with plain text summary
python src/main.py -l recording.wav -s prompt_schema

# Custom chunk duration and output name
python src/main.py -l lecture.wav -c 60 -n lecture_transcript.txt

# Record system audio (what you hear in headphones, Windows only)
python src/main.py --loopback -m m

# Loopback from a specific output device
python src/main.py --loopback -d 6 -m m

# Force Spanish language (skip auto-detection)
python src/main.py -l audio.wav -m m --language es

# Verbose debug output
python src/main.py -l audio.wav -m t -v

CLI Flags

Flag  Long Form         Description                                        Default
-m    --model           Model size: t/s/b/m/l/lt/d2/d3                     m
-d    --device          Device ID for recording                            Auto
      --loopback        Record system audio (Windows WASAPI)               Off
-c    --chunk_duration  Chunk size in seconds                              30
-l    --load            Audio file path or YouTube URL                     None
-s    --summarize       Summarize with a prompt from prompts/ (by name)    Off
-n    --name            Custom output file name                            Auto
      --openai-model    OpenAI model for summary (see table below)         gpt-5-mini
      --language        Language code for transcription (e.g. en, es, fr)  Auto-detect
-v    --verbose         Enable debug logging                               Off
      --ui              Launch Gradio web interface
      --list-devices    Show audio input devices and exit
      --version         Show version and exit

Docker

CPU (default)

docker compose up --build

GPU (NVIDIA)

Requires NVIDIA Container Toolkit.

docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build

Volumes

Host Path  Container Path  Purpose
./models   /app/models     Cached Whisper models (persist)
./output   /app/output     Saved transcriptions

Models are downloaded on first use and cached. The first run will take extra time depending on the model size.

Project Structure

WhisperTranscribe/
├── src/
│   ├── main.py                # Core logic and CLI entry point
│   └── app.py                 # Gradio web interface
├── prompts/
│   ├── prompt_schema.txt      # Plain text summary prompt
│   └── prompt_schema_md.txt   # Markdown summary prompt
├── Dockerfile
├── docker-compose.yml
├── docker-compose.gpu.yml
├── requirements.txt           # CPU dependencies
├── requirements-gpu.txt       # GPU dependencies (CUDA)
├── .env.example               # Environment variable template
└── README.md

Runtime directories (gitignored):

  • audio/ - Downloaded/recorded audio and chunks
  • output/ - Transcriptions and summaries
  • models/ - Cached Whisper models

Configuration

Environment Variables

Variable        Required  Description
OPENAI_API_KEY  No        OpenAI API key for summarization

Only required when using -s (CLI) or selecting a summary mode (UI). The program works without it for transcription-only workflows. In the web UI, you can paste the key directly without needing a .env file.

cp .env.example .env
# Edit .env with your API key

Whisper Models

Code  Model            Size    Speed    Accuracy
t     tiny             39 MB   Fastest  Low
b     base             74 MB   Fast     Fair
s     small            244 MB  Medium   Good
m     medium           769 MB  Slow     High
l     large-v3         1.5 GB  Slowest  Best
lt    large-v3-turbo   809 MB  Fast     High
d2    distil-large-v2  756 MB  Fast     High
d3    distil-large-v3  756 MB  Fast     High
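The short `-m` codes map onto full faster-whisper model names. A sketch of that lookup, reconstructed from the table above (the tool's real mapping may differ in detail):

```python
# Short CLI codes -> faster-whisper model names,
# reconstructed from the README's model table (assumed, not verified code).
MODEL_CODES = {
    "t": "tiny",
    "b": "base",
    "s": "small",
    "m": "medium",
    "l": "large-v3",
    "lt": "large-v3-turbo",
    "d2": "distil-large-v2",
    "d3": "distil-large-v3",
}

def resolve_model(code: str) -> str:
    """Translate a short code into a model name, rejecting unknown codes."""
    try:
        return MODEL_CODES[code]
    except KeyError:
        raise ValueError(f"Unknown model code {code!r}; choose from {sorted(MODEL_CODES)}")

print(resolve_model("lt"))  # large-v3-turbo
```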

OpenAI Models

Model          Speed    Cost     Best for
gpt-5.2        Medium   Higher   Most advanced, complex analysis
gpt-5.2-pro    Slower   Highest  Deep reasoning tasks
gpt-5.2-codex  Medium   Higher   Code-focused tasks
gpt-5.1        Medium   Higher   High-quality text generation
gpt-5-mini     Fast     Low      General use (default)
gpt-5-nano     Fastest  Lowest   Quick summaries, classification
gpt-4.1        Medium   Medium   Versatile text tasks
gpt-4.1-mini   Fast     Low      Good balance
gpt-4.1-nano   Fastest  Lowest   Lightweight tasks
gpt-4o         Medium   Medium   Multimodal capable
gpt-4o-mini    Fast     Low      Legacy general use
o4-mini        Medium   Low      Reasoning tasks
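A summarization request to one of these models follows the standard chat-completions shape. The sketch below only assembles the payload; `build_summary_request` is a hypothetical helper, and the `[transcription here]` substitution assumes the placeholder convention from the Summary Modes section:

```python
def build_summary_request(prompt_template: str, transcript: str,
                          model: str = "gpt-5-mini") -> dict:
    """Assemble a chat-completions style payload (shape assumed from the
    OpenAI Python SDK; the tool's real request may differ)."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt_template.replace("[transcription here]", transcript),
            },
        ],
    }
```

The resulting dict could then be passed to `client.chat.completions.create(**request)` with the OpenAI Python SDK; only the payload construction is shown here to keep the example offline.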

Summary Modes (Prompt Templates)

Summary modes are loaded automatically from .txt files in the prompts/ directory. Each file becomes a selectable option in both the CLI (-s) and the web UI dropdown.

Built-in prompts:

  • Action Items (action_items.txt) - Extract tasks, deadlines and responsibilities
  • Interview (interview.txt) - Q&A structure, key quotes, recurring themes
  • Lecture Notes (lecture_notes.txt) - Academic notes with definitions, formulas and examples
  • Meeting Notes (meeting_notes.txt) - Minutes with decisions, action items and pending topics
  • Podcast Summary (podcast_summary.txt) - Accessible summary of multimedia content
  • Prompt Schema (prompt_schema.txt) - General structured analysis
  • Prompt Schema Md (prompt_schema_md.txt) - Detailed Markdown-formatted technical notes

To add a custom mode, create a new .txt file in prompts/. Use [transcription here] as placeholder:

title=Meeting Notes
Analyze this meeting transcript:
1. Key decisions made
2. Action items
3. Follow-up topics

[transcription here]

CLI usage: python src/main.py -l audio.wav -s meeting_notes
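Loading and filling such a template might be sketched like this. These are hypothetical helpers assuming only the `prompts/<name>.txt` layout and the `[transcription here]` placeholder described above; the title-casing in `display_name` mirrors how "prompt_schema_md.txt" appears as "Prompt Schema Md" in the built-in list:

```python
from pathlib import Path

# Placeholder convention from the README; helpers below are illustrative,
# not WhisperTranscribe's actual loader.
PLACEHOLDER = "[transcription here]"

def load_prompt(prompts_dir: str, name: str) -> str:
    """Read prompts/<name>.txt, matching the -s flag's by-name lookup."""
    return (Path(prompts_dir) / f"{name}.txt").read_text(encoding="utf-8")

def render_prompt(template: str, transcript: str) -> str:
    """Substitute the transcription into the template's placeholder."""
    return template.replace(PLACEHOLDER, transcript)

def display_name(filename: str) -> str:
    """Derive a UI label, e.g. prompt_schema_md.txt -> 'Prompt Schema Md'."""
    return Path(filename).stem.replace("_", " ").title()
```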

Requirements

  • Python 3.10+
  • FFmpeg (for YouTube audio extraction)
  • NVIDIA GPU + CUDA (optional, for faster transcription)
  • 4 GB RAM minimum (more for larger models)
