A tool that takes an MP3 podcast file, transcribes it using OpenAI's Whisper (locally), summarizes the content using Google Gemini, and uploads the result to Google Docs.
The tool is available both as a command-line application and as a user-friendly GUI.
Two versions of the tool are available:
- Standard Version (podcast_processor.py and podcast_processor_gui.py)
  - Basic functionality for podcast processing
- Improved Version (podcast_processor_improved.py and podcast_processor_gui_improved.py)
  - Better error handling
  - Progress tracking
  - Support for long podcasts (chunking)
  - Improved Google Docs formatting
  - Thread-safe GUI with cancellation support
  - Enhanced file management
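The improved version handles long podcasts by chunking, but this README does not say what is chunked or how. The sketch below assumes the transcript text is split into overlapping word windows before each piece is summarized; `chunk_transcript`, `max_words`, and `overlap` are hypothetical names, and the defaults are illustrative rather than taken from podcast_processor_improved.py.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 3000, overlap: int = 200) -> List[str]:
    """Split a transcript into overlapping word-based chunks (hypothetical helper)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    if not words:
        return []

    chunks = []
    step = max_words - overlap  # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the transcript
    return chunks


# Example: ten words, windows of four with one word of overlap:
# chunk_transcript(" ".join(f"w{i}" for i in range(10)), max_words=4, overlap=1)
# returns ["w0 w1 w2 w3", "w3 w4 w5 w6", "w6 w7 w8 w9"]
```

Each chunk could then be summarized on its own and the partial summaries merged in a final Gemini request. The overlap keeps a sentence that straddles a boundary visible to both neighbouring chunks.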
Features:
- Extracts metadata from MP3 files (podcast name, episode title, publication date)
- Transcribes audio using Whisper (runs locally on your Mac)
- Incorporates your personal notes about the podcast
- Summarizes content using Google Gemini AI
- Creates a structured document with:
  - Overview of the podcast
  - Detailed breakdown of key topics
  - Special focus on topics mentioned in your notes
- Automatically uploads to Google Docs
- Organizes all files in a consistent directory structure
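The structured document described above could be requested from Gemini with a single prompt that combines the MP3 metadata, your notes, and the transcript. This is only a sketch: the prompt wording, the `build_summary_prompt` name, and the metadata keys (`podcast`, `title`, `date`) are assumptions, not the tool's actual prompt.

```python
from typing import Dict, Optional


def build_summary_prompt(
    transcript: str,
    metadata: Dict[str, str],
    notes: Optional[str] = None,
) -> str:
    """Assemble a summarization prompt (hypothetical helper, not the tool's real prompt)."""
    podcast = metadata.get("podcast", "Unknown podcast")
    episode = metadata.get("title", "Unknown episode")
    date = metadata.get("date", "unknown date")

    parts = [
        f"Podcast: {podcast}",
        f"Episode: {episode} ({date})",
        "",
        "Write a structured summary with these sections:",
        "1. Overview of the podcast",
        "2. Detailed breakdown of key topics",
    ]
    # Only ask for the notes-focused section when the user actually supplied notes.
    if notes and notes.strip():
        parts.append("3. Special focus on the topics in the listener notes below")
        parts += ["", "Listener notes:", notes.strip()]
    parts += ["", "Transcript:", transcript]
    return "\n".join(parts)
```

The returned string would then be sent to Gemini through whichever client library the tool uses.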
Requirements:
- Python 3.8+
- FFmpeg (required for Whisper)
- Google Gemini API key
- Google Cloud project with Docs and Drive APIs enabled
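Before a long transcription run, it can help to confirm the requirements with a small preflight script. The hypothetical one below checks the Python version, looks for `ffmpeg` on `PATH` (Whisper calls the ffmpeg binary to decode audio), and checks for the `GEMINI_API_KEY` variable and the `credentials.json` file mentioned elsewhere in this README. It is not part of the tool, and it cannot verify that the Docs and Drive APIs are enabled.

```python
import os
import shutil
import sys
from typing import List


def check_requirements(credentials_path: str = "credentials.json") -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or newer is required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    # The key can also be passed on the command line, so this is only a hint.
    if not os.environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set (or pass --gemini-api-key)")
    if not os.path.isfile(credentials_path):
        problems.append(f"{credentials_path} not found")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(f"- {issue}" for issue in issues) or "All checks passed.")
```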
Installation:
- Install FFmpeg (required for Whisper):
  brew install ffmpeg
- Install the required Python packages:
  pip install -r requirements.txt
- Set up Google Cloud credentials:
  - Create a project in the Google Cloud Console
  - Enable the Google Docs API and Google Drive API
  - Create OAuth credentials (Desktop application)
  - Download the credentials JSON file and save it as credentials.json in the same directory as the script
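OAuth client files downloaded from the Google Cloud Console keep Desktop-application settings under a top-level `installed` key, while web-application clients use `web`. Because the steps above call for a Desktop application client, a hypothetical check like this one can catch the wrong client type before the first run. The function name is illustrative and not part of the tool.

```python
import json
from pathlib import Path


def describe_client_secrets(path: str = "credentials.json") -> str:
    """Return "desktop" or "web" for an OAuth client file, or raise ValueError."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if "installed" in data:
        client, kind = data["installed"], "desktop"
    elif "web" in data:
        client, kind = data["web"], "web"
    else:
        raise ValueError(f"{path} does not look like an OAuth client file")
    missing = [key for key in ("client_id", "client_secret") if key not in client]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return kind
```

A result of `"web"` means the client was created with the wrong application type and should be recreated as a Desktop application.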
Basic usage:
python podcast_processor.py /path/to/podcast.mp3
With notes:
python podcast_processor.py /path/to/podcast.mp3 --notes /path/to/notes.txt
Full options:
python podcast_processor.py /path/to/podcast.mp3 \
--notes /path/to/notes.txt \
--output-dir /path/to/output \
--gemini-api-key YOUR_GEMINI_API_KEY \
--model-size medium
To use the GUI version:
python podcast_processor_gui.py
The GUI provides a user-friendly interface with:
- File selection dialogs for MP3 and notes files
- Settings configuration
- Progress tracking
- Direct access to the Google Docs result
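The improved GUI is described as thread-safe with cancellation support. In toolkits such as Tkinter, widgets should only be touched from the main thread, so a common design runs the slow steps in a worker thread that reports through a `queue.Queue` and checks a `threading.Event` between steps. The sketch below assumes that design; `ProcessingWorker` and its message tuples are hypothetical, and the actual GUI scripts may be structured differently.

```python
import queue
import threading
from typing import Callable, List, Optional


class ProcessingWorker:
    """Run pipeline steps off the GUI thread and report progress through a queue."""

    def __init__(self, steps: List[Callable[[], None]]):
        self.steps = steps
        self.messages = queue.Queue()  # the GUI thread drains this; no widgets touched here
        self._cancel = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def cancel(self) -> None:
        # Cooperative cancellation: takes effect before the next step starts.
        self._cancel.set()

    def join(self, timeout: Optional[float] = None) -> None:
        self._thread.join(timeout)

    def _run(self) -> None:
        total = len(self.steps)
        for i, step in enumerate(self.steps, start=1):
            if self._cancel.is_set():
                self.messages.put(("cancelled", i - 1, total))
                return
            try:
                step()
            except Exception as exc:  # report the failure to the GUI instead of dying silently
                self.messages.put(("error", str(exc), total))
                return
            self.messages.put(("progress", i, total))
        self.messages.put(("done", total, total))
```

In a Tkinter GUI, a function scheduled with `root.after(100, poll)` could drain `worker.messages` with `get_nowait()` to update a progress bar, and a Cancel button would call `worker.cancel()`. Because cancellation is cooperative, a Whisper call that is already running finishes before the worker stops.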
You can set the Gemini API key as an environment variable:
export GEMINI_API_KEY=your_api_key_here
Available Whisper model sizes for --model-size (smaller is faster, larger is more accurate):
- tiny
- base (default)
- small
- medium
- large
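The API key and model size settings can be resolved in a few lines. This sketch assumes that an explicit `--gemini-api-key` value takes precedence over `GEMINI_API_KEY` (the README does not state the precedence) and that the tool uses the `openai-whisper` package, whose `load_model()` returns a model with a `transcribe()` method that returns a dict holding the full transcript under `"text"`. `resolve_settings` and `transcribe` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# The model sizes listed above, as accepted by openai-whisper's load_model().
WHISPER_MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def resolve_settings(
    model_size: str = "base",
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Validate the model size and fall back to GEMINI_API_KEY when no key is passed."""
    if model_size not in WHISPER_MODEL_SIZES:
        raise ValueError(f"unknown model size {model_size!r}; choose from {WHISPER_MODEL_SIZES}")
    key = api_key or env.get("GEMINI_API_KEY")
    if not key:
        raise ValueError("no Gemini API key: pass --gemini-api-key or set GEMINI_API_KEY")
    return {"model_size": model_size, "api_key": key}


def transcribe(mp3_path: str, model_size: str = "base") -> str:
    """Transcribe locally with openai-whisper (needs the package and FFmpeg installed)."""
    import whisper  # imported lazily so resolve_settings works without Whisper installed

    model = whisper.load_model(model_size)  # downloads the weights on first use
    result = model.transcribe(mp3_path)
    return result["text"]
```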
The tool creates a directory structure with:
- Transcript file
- Summary file (Markdown)
- Processing information (JSON)
- Google Docs link
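One way to produce the outputs listed above is a folder per episode. In this sketch the folder layout, the file names (`transcript.txt`, `summary.md`, `processing_info.json`), and the JSON fields are assumptions for illustration, not the tool's actual naming; the Google Docs link is stored inside the JSON file.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def save_outputs(
    base_dir: str,
    episode_slug: str,
    transcript: str,
    summary_md: str,
    doc_url: Optional[str] = None,
) -> Path:
    """Write transcript, summary, and processing info into one folder per episode."""
    out = Path(base_dir) / episode_slug
    out.mkdir(parents=True, exist_ok=True)
    (out / "transcript.txt").write_text(transcript, encoding="utf-8")
    (out / "summary.md").write_text(summary_md, encoding="utf-8")
    info = {
        "processed_at": datetime.now(timezone.utc).isoformat(),
        "transcript_words": len(transcript.split()),
        "google_doc_url": doc_url,  # None if the upload was skipped or failed
    }
    (out / "processing_info.json").write_text(json.dumps(info, indent=2), encoding="utf-8")
    return out
```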
On first run, the tool opens a browser window for Google authentication. After you sign in, it saves a token file and reuses it on later runs.
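The browser sign-in and token reuse described above match the standard installed-app OAuth flow in Google's Python client libraries. The sketch below assumes the `google-auth`, `google-auth-oauthlib`, and `google-api-python-client` packages; the `token.json` file name, the scopes, and the helper names are assumptions rather than values taken from the tool. It inserts the summary as plain text, so it does not reproduce the improved version's formatting.

```python
import os
from typing import List

# Assumed scopes: Docs for creating and editing, drive.file for files this app creates.
SCOPES = [
    "https://www.googleapis.com/auth/documents",
    "https://www.googleapis.com/auth/drive.file",
]


def doc_url(document_id: str) -> str:
    return f"https://docs.google.com/document/d/{document_id}/edit"


def insert_text_requests(text: str) -> List[dict]:
    # A new document's body starts at index 1, so this fills the empty document.
    return [{"insertText": {"location": {"index": 1}, "text": text}}]


def get_credentials(credentials_path: str = "credentials.json", token_path: str = "token.json"):
    # Lazy imports so the pure helpers above work without the Google libraries.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # reuse the saved refresh token, no browser needed
        else:
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)  # opens the browser on first run
        with open(token_path, "w", encoding="utf-8") as fh:
            fh.write(creds.to_json())
    return creds


def upload_summary(title: str, text: str) -> str:
    from googleapiclient.discovery import build

    docs = build("docs", "v1", credentials=get_credentials())
    doc = docs.documents().create(body={"title": title}).execute()
    docs.documents().batchUpdate(
        documentId=doc["documentId"],
        body={"requests": insert_text_requests(text)},
    ).execute()
    return doc_url(doc["documentId"])
```

Deleting the token file forces the browser sign-in again, which is useful if the scopes change.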
License: MIT