
jjroberts88/podcast_summariser


Podcast Processor

A tool that takes an MP3 podcast file, transcribes it using OpenAI's Whisper (locally), summarizes the content using Google Gemini, and uploads the result to Google Docs.

The tool is available both as a command-line application and as a graphical user interface (GUI).

Versions

Two versions of the tool are available:

  1. Standard Version (podcast_processor.py and podcast_processor_gui.py)

    • Core pipeline only: transcription, summarization, and upload to Google Docs
  2. Improved Version (podcast_processor_improved.py and podcast_processor_gui_improved.py)

    • Better error handling
    • Progress tracking
    • Support for long podcasts (chunking)
    • Improved Google Docs formatting
    • Thread-safe GUI with cancellation support
    • Enhanced file management
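The improved version supports long podcasts by chunking. The repository's exact strategy isn't described here, so this is a minimal sketch under an assumption: a common approach splits the transcript into overlapping word windows so each piece fits within the model's input limit. The `chunk_transcript` helper and its default sizes are hypothetical.

```python
from typing import List


def chunk_transcript(text: str, max_words: int = 3000, overlap: int = 200) -> List[str]:
    """Split a transcript into overlapping chunks of at most max_words words.

    Hypothetical helper: the word-based limits are illustrative assumptions,
    not values taken from podcast_processor_improved.py.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap  # how far each window advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window reaches the end of the transcript,
        # so the tail is not repeated as a tiny trailing chunk.
        if start + max_words >= len(words):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks. One plausible design summarizes each chunk separately and then asks the model to merge the partial summaries.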

Features

  • Extracts metadata from MP3 files (podcast name, episode title, publication date)
  • Transcribes audio using Whisper (runs locally on your Mac)
  • Incorporates your personal notes about the podcast
  • Summarizes content using Google Gemini AI
  • Creates a structured document with:
    • Overview of the podcast
    • Detailed breakdown of key topics
    • Special focus on topics mentioned in your notes
  • Automatically uploads to Google Docs
  • Organizes all files in a consistent directory structure
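The features above describe how the transcript, the episode metadata, and your notes feed into a single summary with an overview, a topic breakdown, and a focus on your notes. The sketch below shows one way such a prompt could be assembled before it is sent to Gemini. The `build_summary_prompt` function and its wording are hypothetical and are not the prompt the tool actually uses. The Gemini API call itself is deliberately left out.

```python
from typing import Optional


def build_summary_prompt(
    transcript: str,
    podcast_name: str,
    episode_title: str,
    notes: Optional[str] = None,
) -> str:
    """Assemble a summarization prompt mirroring the document structure above.

    Hypothetical helper: the wording and section names are illustrative,
    not the prompt used by podcast_processor.py.
    """
    sections = [
        f"Summarize the podcast episode '{episode_title}' from '{podcast_name}'.",
        "Structure the summary as:",
        "1. Overview: a short description of the whole episode.",
        "2. Key topics: a detailed breakdown of each major topic discussed.",
    ]
    if notes and notes.strip():
        # Only request the extra section when the user supplied notes.
        sections.append(
            "3. Notes focus: expand on any topics mentioned in the listener's notes below."
        )
        sections.append(f"Listener notes:\n{notes.strip()}")
    sections.append(f"Transcript:\n{transcript}")
    return "\n\n".join(sections)
```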

Requirements

  • Python 3.8+
  • FFmpeg (required for Whisper)
  • Google Gemini API key
  • Google Cloud project with Docs and Drive APIs enabled
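Before a first run, you may want to confirm that the local requirements are in place. The following is a hypothetical pre-flight check that is not part of the repository. It covers the Python version, FFmpeg on the `PATH`, the `GEMINI_API_KEY` variable, and the `credentials.json` file. Whether the Docs and Drive APIs are enabled in your Google Cloud project cannot be verified locally, so the check skips it.

```python
import os
import shutil
import sys
from typing import List, Mapping, Optional


def check_requirements(
    env: Optional[Mapping[str, str]] = None,
    credentials_path: str = "credentials.json",
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical pre-flight check, not part of the repository.
    """
    env = os.environ if env is None else env
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    if shutil.which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH (Whisper needs it)")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set (or pass --gemini-api-key)")
    if not os.path.isfile(credentials_path):
        problems.append(f"Google OAuth client file missing: {credentials_path}")
    return problems
```

Calling `check_requirements()` with no arguments inspects the real environment and the current directory. Print each returned string, and treat a non-empty list as "not ready yet".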

Installation

  1. Install FFmpeg (required for Whisper). On macOS with Homebrew:

    brew install ffmpeg
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Set up Google Cloud credentials:

    • Create a project in the Google Cloud Console
    • Enable the Google Docs API and Google Drive API
    • Create an OAuth client ID with the application type "Desktop app"
    • Download the credentials JSON file and save it as credentials.json in the same directory as the script

Usage

Command Line Interface

Basic usage:

python podcast_processor.py /path/to/podcast.mp3

With notes:

python podcast_processor.py /path/to/podcast.mp3 --notes /path/to/notes.txt

Full options:

python podcast_processor.py /path/to/podcast.mp3 \
  --notes /path/to/notes.txt \
  --output-dir /path/to/output \
  --gemini-api-key YOUR_GEMINI_API_KEY \
  --model-size medium
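The documented options map naturally onto `argparse`. This is a sketch of such a parser, not the repository's actual code. Two details are assumptions. The `base` default comes from the model-size list in this README. The fallback to the `GEMINI_API_KEY` environment variable, described under Environment Variables, is assumed to apply when `--gemini-api-key` is omitted.

```python
import argparse
import os
from typing import List, Optional

WHISPER_MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the options documented in this README (a sketch, not the real parser)."""
    parser = argparse.ArgumentParser(
        description="Transcribe, summarize, and upload a podcast episode."
    )
    parser.add_argument("mp3_path", help="Path to the podcast MP3 file")
    parser.add_argument("--notes", help="Optional text file with your notes")
    parser.add_argument("--output-dir", help="Where to write output files")
    parser.add_argument(
        "--gemini-api-key",
        # Assumption: fall back to the GEMINI_API_KEY environment variable.
        default=os.environ.get("GEMINI_API_KEY"),
        help="Gemini API key (defaults to $GEMINI_API_KEY)",
    )
    parser.add_argument(
        "--model-size",
        choices=WHISPER_MODEL_SIZES,
        default="base",
        help="Whisper model size",
    )
    args = parser.parse_args(argv)
    if not args.gemini_api_key:
        parser.error("a Gemini API key is required")  # exits with status 2
    return args
```

Using `choices` means a typo such as `--model-size huge` is rejected before any audio is processed.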

Graphical User Interface

To use the GUI version:

python podcast_processor_gui.py

The GUI provides a user-friendly interface with:

  • File selection dialogs for MP3 and notes files
  • Settings configuration
  • Progress tracking
  • Direct access to the Google Docs result
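The improved version lists a thread-safe GUI with cancellation support. The GUI framework and threading design are not described here, so the following is a hypothetical sketch of one common pattern. A worker thread runs the pipeline steps, posts progress messages to a `queue.Queue`, and checks a `threading.Event` between steps.

```python
import queue
import threading
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step], progress: "queue.Queue[str]",
                 cancel: threading.Event) -> bool:
    """Run steps in order, reporting progress and honouring cancellation.

    Hypothetical worker: returns True if all steps finished, False if cancelled.
    """
    total = len(steps)
    for i, (name, func) in enumerate(steps, start=1):
        # Cancellation is cooperative: it is checked only between steps.
        if cancel.is_set():
            progress.put("cancelled")
            return False
        progress.put(f"[{i}/{total}] {name}")
        func()
    progress.put("done")
    return True
```

The GUI would start `run_pipeline` in a `threading.Thread` and set the event when the user clicks Cancel. With Tkinter, for example, the main thread would poll the queue via `root.after(...)` rather than updating widgets from the worker thread. A step already in progress, such as a long Whisper transcription, still runs to completion before the cancellation takes effect.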

Environment Variables

You can set the Gemini API key as an environment variable:

export GEMINI_API_KEY=your_api_key_here

Whisper Model Sizes

Available model sizes for Whisper (smaller is faster, larger is more accurate):

  • tiny
  • base (default)
  • small
  • medium
  • large
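In the `openai-whisper` package, the model size is simply the name passed to `whisper.load_model`. Here is a minimal sketch, assuming the tool calls `load_model` and `model.transcribe` directly; the repository's own code may differ, and the `transcribe` wrapper is hypothetical.

```python
WHISPER_MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe(mp3_path: str, model_size: str = "base") -> str:
    """Transcribe an audio file locally with openai-whisper and return the text."""
    if model_size not in WHISPER_MODEL_SIZES:
        raise ValueError(
            f"unknown model size {model_size!r}; choose from {WHISPER_MODEL_SIZES}"
        )
    # Imported here so the size check works even where Whisper is not installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_size)  # downloads weights on first use
    result = model.transcribe(mp3_path)  # decodes the MP3 via FFmpeg
    return result["text"].strip()
```

The first run with a given size downloads that model's weights, so switching to `medium` or `large` costs a one-time download as well as slower transcription.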

Output

The tool creates a directory structure with:

  • Transcript file
  • Summary file (Markdown)
  • Processing information (JSON)
  • Google Docs link
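The exact folder and file names the tool uses are not listed above. The following sketch shows one way to write these outputs in a consistent per-episode layout. The `<output_dir>/<podcast>/<episode>/` structure, the file names, and storing the Google Docs link inside the JSON are all assumptions for illustration.

```python
import json
import re
from pathlib import Path
from typing import Optional


def slugify(text: str) -> str:
    """Turn a title into a filesystem-safe folder name."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-").lower()
    return slug or "untitled"


def write_outputs(output_dir: str, podcast: str, episode: str, transcript: str,
                  summary_md: str, doc_url: Optional[str] = None) -> Path:
    """Write transcript, summary, and processing info into one episode folder.

    Hypothetical layout: <output_dir>/<podcast>/<episode>/...; the real tool's
    folder and file names may differ.
    """
    folder = Path(output_dir) / slugify(podcast) / slugify(episode)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "transcript.txt").write_text(transcript, encoding="utf-8")
    (folder / "summary.md").write_text(summary_md, encoding="utf-8")
    info = {"podcast": podcast, "episode": episode, "google_doc_url": doc_url}
    (folder / "processing_info.json").write_text(
        json.dumps(info, indent=2), encoding="utf-8"
    )
    return folder
```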

Authentication

On first run, the tool will open a browser window for Google authentication. After authenticating, it will save a token file for future use.
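The browser-then-token behaviour matches the standard installed-app flow from Google's Python client libraries (`google-auth` and `google-auth-oauthlib`). Below is a sketch of that pattern, assuming the tool follows it. The `token.json` file name and the two scopes are assumptions, not values confirmed by the repository.

```python
import os

# Assumed scopes: create/edit Docs and manage files this app creates in Drive.
SCOPES = [
    "https://www.googleapis.com/auth/documents",
    "https://www.googleapis.com/auth/drive.file",
]


def get_credentials(credentials_path: str = "credentials.json",
                    token_path: str = "token.json"):
    """Load cached OAuth credentials, refreshing or re-authorizing as needed."""
    # Imported lazily so this module can be imported without the Google libraries.
    from google.auth.transport.requests import Request
    from google.oauth2.credentials import Credentials
    from google_auth_oauthlib.flow import InstalledAppFlow

    creds = None
    if os.path.exists(token_path):
        creds = Credentials.from_authorized_user_file(token_path, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # silent refresh, no browser needed
        else:
            # First run: opens a browser window for the Google consent screen.
            flow = InstalledAppFlow.from_client_secrets_file(credentials_path, SCOPES)
            creds = flow.run_local_server(port=0)
        with open(token_path, "w", encoding="utf-8") as token_file:
            token_file.write(creds.to_json())  # cached for future runs
    return creds
```

If you change the scopes, delete the cached token file so the consent screen appears again and grants the new permissions.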

License

MIT
