Extract text and transcribe audio from PowerPoint presentations using OpenAI Whisper.
- 📝 Text Extraction: Extracts all text content from PowerPoint slides
- 🎤 Audio Transcription: Uses OpenAI Whisper to transcribe embedded audio files (WAV, MP3, M4A)
- ⚡ GPU Acceleration: Automatic CUDA detection with CPU fallback
- 🎯 Multiple Models: Supports various Whisper model sizes (tiny, base, small, medium, large)
- 📊 Progress Tracking: Real-time progress bars during processing
- 🔧 Configurable: Easy-to-modify settings for performance and quality tuning
- Python 3.8 or higher
- CUDA-compatible GPU (optional, but recommended for faster processing)
git clone https://github.com/sankeer28/pptx-text-audio-transcriber.git
cd pptx-text-audio-transcriber
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
pip install -r requirements.txt
For GPU acceleration, install CUDA and a compatible PyTorch build:
- Install CUDA Toolkit:
  - Download from NVIDIA CUDA Toolkit
  - Choose version 11.8 or 12.1 (recommended)
  - Follow the installer instructions
- Install cuDNN:
  - Download from NVIDIA cuDNN
  - Extract and copy files to CUDA installation directory
- Install CUDA-enabled PyTorch:
# For CUDA 11.8
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
Test CUDA availability:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
- Create a `presentations` folder in the project directory
- Place your PowerPoint files (.pptx) in the `presentations` folder
python main.py
- Extracted content will be saved in the `output` folder
- Each PowerPoint file generates a corresponding `.txt` file with:
  - All slide text content
  - Transcribed audio content
  - Processing metadata
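The slide-text part of that output can be illustrated with the standard library alone, because a .pptx file is a ZIP archive whose slides are stored as XML under `ppt/slides/`. This is a minimal sketch, not the code in main.py (which may well use a library such as python-pptx); the function name `extract_slide_text` is hypothetical, and slides are ordered by file number, which usually but not always matches presentation order.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; slide text runs are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_slide_text(pptx_path):
    """Return (slide_number, text) pairs, sorted by slide file number."""
    results = []
    with zipfile.ZipFile(pptx_path) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        for number, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            runs = [node.text for node in root.iter(A_NS + "t") if node.text]
            results.append((number, " ".join(runs)))
    return results
```

Calling `extract_slide_text("presentations/deck.pptx")` (a placeholder path) would return one entry per slide, which could then be written to the matching `.txt` file.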
Edit the configuration settings at the top of main.py:
PPTX_FOLDER = "presentations" # Input folder
OUTPUT_FOLDER = "output" # Output folder
WHISPER_MODEL = "base" # Options: "tiny", "base", "small", "medium", "large"
FORCE_LANGUAGE = "en" # Force language ("en", "es", "fr", etc.) or None for auto-detect
FORCE_DEVICE = None # Options: None (auto), "cpu", "cuda"
USE_HALF_PRECISION = False # Enable fp16 for 30-50% speed boost (GPU only)
GPU_BEST_OF = 3 # Higher = more accurate, slower
GPU_BEAM_SIZE = 3 # Beam search size
TEMPERATURE = 0.0 # 0.0 = deterministic, 0.1-1.0 = more creative
ENABLE_WORD_TIMESTAMPS = True # Get word-level timing data

| Model | Parameters | Speed | Quality | VRAM Usage |
|---|---|---|---|---|
| tiny | 39M | Fastest | Good | ~1GB |
| base | 74M | Fast | Better | ~1GB |
| small | 244M | Medium | Good | ~2GB |
| medium | 769M | Slow | Very Good | ~5GB |
| large | 1550M | Slowest | Best | ~10GB |
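As a rough illustration of how the configuration settings above could feed into Whisper, the sketch below resolves the device and collects the options into keyword arguments for `model.transcribe`. The helper names `resolve_device` and `build_transcribe_options` are hypothetical, and main.py may wire things up differently. The silent fallback to CPU when CUDA is forced but unavailable is also an assumption. The option names themselves (`language`, `fp16`, `best_of`, `beam_size`, `temperature`, `word_timestamps`) are ones the openai-whisper package accepts.

```python
def resolve_device(force_device=None, cuda_available=False):
    """Pick "cuda" or "cpu": honor an explicit choice, else auto-detect."""
    if force_device is not None:
        if force_device == "cuda" and not cuda_available:
            return "cpu"  # assumption: fall back instead of raising
        return force_device
    return "cuda" if cuda_available else "cpu"


def build_transcribe_options(device, language="en", half_precision=False,
                             best_of=3, beam_size=3, temperature=0.0,
                             word_timestamps=True):
    """Collect settings into keyword arguments for model.transcribe()."""
    return {
        "language": language,  # None lets Whisper auto-detect
        "fp16": half_precision and device == "cuda",  # fp16 is GPU only
        "best_of": best_of,
        "beam_size": beam_size,
        "temperature": temperature,
        "word_timestamps": word_timestamps,
    }


# Usage (requires the torch and openai-whisper packages):
# import torch, whisper
# device = resolve_device(None, torch.cuda.is_available())
# model = whisper.load_model("base", device=device)
# result = model.transcribe("audio.wav", **build_transcribe_options(device))
# print(result["text"])
```

Keeping the device check separate from the option dictionary makes it easy to force `fp16` off on CPU, where half precision is not supported.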
1. CUDA Out of Memory
- Use a smaller Whisper model (`tiny` or `base`)
- Set `USE_HALF_PRECISION = True`
- Reduce `GPU_BEST_OF` and `GPU_BEAM_SIZE`
2. No Audio Files Found
- Ensure audio is embedded in PowerPoint (not linked)
- Supported formats: WAV, MP3, M4A
3. Installation Issues
- Ensure Python 3.8+ is installed
- Try installing dependencies one by one
- Use a virtual environment to avoid conflicts
4. Poor Transcription Quality
- Use a larger Whisper model (`medium` or `large`)
- Set the correct language with `FORCE_LANGUAGE`
- Increase `GPU_BEST_OF` for better accuracy
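For the "No Audio Files Found" case, one quick check is to look inside the .pptx archive directly: embedded media is normally stored under `ppt/media/`, while linked audio is not in the file at all. The sketch below assumes that layout; the helper name `list_embedded_audio` is hypothetical and is not part of main.py.

```python
import zipfile

# Formats listed as supported above.
AUDIO_EXTENSIONS = (".wav", ".mp3", ".m4a")


def list_embedded_audio(pptx_path):
    """Return archive paths of audio files embedded under ppt/media/."""
    with zipfile.ZipFile(pptx_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.startswith("ppt/media/")
            and name.lower().endswith(AUDIO_EXTENSIONS)
        )
```

An empty list suggests the audio was linked rather than embedded, or is stored in a format other than WAV, MP3, or M4A.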
- GPU Users: Use `base` or `small` models for the best speed/quality balance
- CPU Users: Stick with `tiny` or `base` models
- Large Files: Process in batches to avoid memory issues
- Quality Focus: Use `medium` or `large` models with a higher beam size
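When quality is the priority, the approximate VRAM figures from the model table can be turned into a simple lookup. The following is only a sketch: the function name `largest_model_that_fits` is hypothetical, the thresholds are the rough values from the table, and real memory use also depends on audio length and decoding settings.

```python
# Approximate VRAM needs in GB from the model table, largest model first.
MODEL_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def largest_model_that_fits(vram_gb):
    """Return the largest Whisper model whose approximate VRAM need fits."""
    for name, needed in MODEL_VRAM_GB:
        if vram_gb >= needed:
            return name
    return "tiny"  # under 1 GB: the smallest model is the only candidate
```

On a CUDA machine, total GPU memory in GB can be read with `torch.cuda.get_device_properties(0).total_memory / 1024**3`. Note that this is total rather than free memory, so leaving some headroom is sensible.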