🎙️ PodBuster

AI-Powered Podcast Fact-Checking Pipeline

PodBuster is an intelligent multi-agent system that automatically extracts and verifies factual claims from podcast transcripts using Google's Agent Development Kit (ADK) and Gemini models.

🌟 Features

Multi-Agent Architecture: Sequential pipeline with specialized agents for different tasks
Claim Extraction: Automatically identifies verifiable factual claims from transcripts
Real-Time Verification: Uses Google Search for grounding and fact-checking
Structured Output: JSON schemas ensure consistent, parseable results
Professional Reporting: Generates Markdown tables with verdicts and citations
Opinion Filtering: Intelligently separates facts from opinions and conversational phrases

🏗️ Architecture

PodBuster uses a three-agent sequential pipeline:

┌─────────────────┐
│  Raw Transcript │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ TranscriberAgent│  Extract verifiable claims
│  (Gemini 2.5)   │  Filter opinions/questions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FactCheckerAgent│  Verify each claim
│  (Gemini 2.5)   │  Use Google Search
└────────┬────────┘  Provide citations
         │
         ▼
┌─────────────────┐
│ ReporterAgent   │  Generate Markdown report
│  (Gemini 2.5)   │  Format results table
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Final Report   │
└─────────────────┘

Agent Details

TranscriberAgent
- Extracts explicit, verifiable factual claims
- Filters out opinions, emotions, questions
- Outputs: JSON array of claim strings

FactCheckerAgent
- Verifies claims using Google Search
- Assigns status: TRUE, FALSE, or INCONCLUSIVE
- Provides summary and authoritative source citation
- Outputs: JSON object with status, summary, citation

ReporterAgent
- Compiles all verification results
- Generates professional Markdown table
- Outputs: Formatted accountability report
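
The FactCheckerAgent output described above can be sketched as a plain Python type. This is a minimal illustration and not the notebook's actual schema: the `Verdict` class, the `parse_verdict` helper, and the sample payload are hypothetical. Only the three field names and the three status values come from the agent descriptions.

```python
import json
from dataclasses import dataclass

# Status values named in the FactCheckerAgent description.
ALLOWED_STATUSES = {"TRUE", "FALSE", "INCONCLUSIVE"}


@dataclass
class Verdict:
    """Hypothetical container for one FactCheckerAgent result."""
    status: str
    summary: str
    citation: str


def parse_verdict(raw_json: str) -> Verdict:
    """Parse a JSON object and reject statuses outside the allowed set."""
    data = json.loads(raw_json)
    verdict = Verdict(
        status=data["status"],
        summary=data["summary"],
        citation=data["citation"],
    )
    if verdict.status not in ALLOWED_STATUSES:
        raise ValueError(f"Unexpected status: {verdict.status!r}")
    return verdict


# Illustrative payload only; not real pipeline output.
sample = '{"status": "FALSE", "summary": "Example summary.", "citation": "https://example.com"}'
print(parse_verdict(sample))
```

Validating the status early means a malformed model response fails before it reaches the report stage.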

📋 Prerequisites

Python 3.10+
Google API Key with Gemini access
VS Code (recommended) or Jupyter Notebook

🚀 Setup

1. Clone and Navigate

cd /path/to/podbuster

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate     # On Windows

3. Install Dependencies

pip install google-adk litellm python-dotenv

4. Configure Environment

Create a .env file in the project root:

GOOGLE_API_KEY=your_google_api_key_here
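
A quick check that the key is visible to Python can save a confusing failure later. The sketch below is an assumption about how one might do this, not code from the notebook: `require_api_key` is a hypothetical helper, and the `.env` file is loaded with python-dotenv only if that package is importable.

```python
import os


def require_api_key(name: str = "GOOGLE_API_KEY") -> str:
    """Return the key from the environment or raise with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value or value == "your_google_api_key_here":
        raise RuntimeError(f"{name} is missing or still set to the placeholder")
    return value


try:
    # python-dotenv is listed in the install step; load .env if it is available.
    from dotenv import load_dotenv

    load_dotenv()
except ImportError:
    pass  # Fall back to variables already exported in the shell.
```

Calling `require_api_key()` in the first notebook cell reports a missing or placeholder key without printing the key itself.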

5. Run the Notebook

Open podbuster.ipynb in VS Code or Jupyter and run all cells.

📁 Project Structure

podbuster/
├── .env                    # API keys (not in git)
├── .gitignore             # Git ignore rules
├── .gitattributes         # Git attributes
├── README.md              # This file
├── podbuster.ipynb        # Main pipeline notebook
├── podbuster_data.py      # Mock transcript data
└── .venv/                 # Virtual environment (not in git)

🧪 Mock Data

The project includes 4 pre-built mock transcripts for testing:

MOCK_TRANSCRIPT_TECH: AI & Technology (ChatGPT, neurons, quantum computing)
MOCK_TRANSCRIPT_SCIENCE: Space & Astronomy (Moon, Mars, Sun facts)
MOCK_TRANSCRIPT_HEALTH: Nutrition & Medicine (water, bones, vitamins)
MOCK_TRANSCRIPT_HISTORY: Historical Events (Declaration, WWII, Great Wall)

Each transcript contains a mix of:

✅ True facts (verifiable and correct)
❌ False claims (verifiable but incorrect)
💭 Opinions (should be filtered out by TranscriberAgent)

💻 Usage

Basic Usage

# In the notebook, after running all setup cells:
response = await podbuster_runner.run_debug(MOCK_TRANSCRIPT)

Custom Transcript

from podbuster_data import MOCK_TRANSCRIPT_TECH

response = await podbuster_runner.run_debug(MOCK_TRANSCRIPT_TECH)

With f-string

custom_transcript = "Your custom podcast transcript here..."
response = await podbuster_runner.run_debug(
    f"Analyze this podcast transcript: {custom_transcript}"
)

🔧 Configuration

Retry Configuration

The pipeline includes automatic retry logic for API failures:

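from google.genai import types

# Retry transient API failures with exponential backoff.
retry_config = types.HttpRetryOptions(
    attempts=5,                             # maximum number of attempts
    exp_base=7,                             # multiplier applied to the delay between retries
    initial_delay=1,                        # seconds before the first retry
    http_status_codes=[429, 500, 503, 504]  # retry only on these HTTP status codes
)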
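
To get a feel for what these values imply, the nominal wait before each retry can be computed by hand. The sketch below is not the client library's implementation. It assumes `attempts` counts the original request, ignores any random jitter, and applies a cap only if one is passed in. `nominal_delays` and its `max_delay` parameter are hypothetical names for this illustration.

```python
def nominal_delays(attempts, initial_delay, exp_base, max_delay=None):
    """Wait (in seconds) before each retry, ignoring jitter.

    Assumes `attempts` counts the original request, so there are
    attempts - 1 retries.
    """
    delays = []
    for retry_index in range(max(attempts - 1, 0)):
        delay = initial_delay * exp_base ** retry_index
        if max_delay is not None:
            delay = min(delay, max_delay)
        delays.append(delay)
    return delays


print(nominal_delays(attempts=5, initial_delay=1, exp_base=7))
# [1, 7, 49, 343]
print(nominal_delays(attempts=5, initial_delay=1, exp_base=7, max_delay=60))
# [1, 7, 49, 60]
```

With a base of 7 the uncapped waits grow quickly, so a smaller base may be preferable if long pauses in the notebook are a problem.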

Model Configuration

TranscriberAgent: gemini-2.5-flash-lite (fast claim extraction)
FactCheckerAgent: gemini-2.5-flash-lite (thorough verification)
ReporterAgent: gemini-2.5-flash-lite (report generation)

📊 Output Format

The pipeline generates a Markdown table with:

| Claim | Status | Summary | Citation |
| --- | --- | --- | --- |
| "ChatGPT was released in November 2022" | TRUE | Verified via official OpenAI announcements | https://openai.com/... |
| "Quantum computers can solve any problem instantly" | FALSE | Quantum computers have limitations... | https://... |
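
In the pipeline the table is written by the ReporterAgent model, but the same layout can be produced deterministically, which is handy for testing downstream tooling. The following is a minimal sketch under that assumption. `render_report` and the dictionary keys are hypothetical, chosen to mirror the four columns above, and the sample row is a placeholder.

```python
def render_report(rows):
    """Build a Markdown table from dicts with claim/status/summary/citation."""
    header = "| Claim | Status | Summary | Citation |"
    divider = "| --- | --- | --- | --- |"
    lines = [header, divider]
    for row in rows:
        cells = [row["claim"], row["status"], row["summary"], row["citation"]]
        # Escape pipes so a claim containing "|" cannot break the table.
        cells = [str(cell).replace("|", "\\|") for cell in cells]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Placeholder row for illustration; not real pipeline output.
example_rows = [
    {
        "claim": "Example claim",
        "status": "INCONCLUSIVE",
        "summary": "Example summary",
        "citation": "https://example.com",
    }
]
print(render_report(example_rows))
```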

🛠️ Troubleshooting

Import Errors

Make sure all dependencies are installed:

pip install google-adk litellm python-dotenv

API Key Issues

Verify your .env file contains a valid Google API key:

cat .env  # Should show: GOOGLE_API_KEY=...

Context Variable Errors

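Pass MOCK_TRANSCRIPT to the runner directly as a Python value; do not write {MOCK_TRANSCRIPT} inside a plain (non-f) string.
ADK reads {variable} in a string as a reference to one of its own context variables; it does not look up Python variables.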
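
The difference is easiest to see in plain Python, with no ADK involved. In this small illustration the transcript value is a stand-in, not one of the project's mock transcripts.

```python
MOCK_TRANSCRIPT = "Example transcript text."  # stand-in value for illustration

# Plain string: the braces stay in the text, so a framework that scans for
# {name} placeholders would see one called MOCK_TRANSCRIPT.
plain = "Analyze this podcast transcript: {MOCK_TRANSCRIPT}"

# f-string: Python substitutes the value before anything else sees the string.
interpolated = f"Analyze this podcast transcript: {MOCK_TRANSCRIPT}"

print(plain)
print(interpolated)
```

Only the first string still contains a brace placeholder by the time it is handed to the runner.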

🤝 Contributing

This is a Kaggle competition project. Feel free to fork and experiment!

📄 License

MIT

🔗 Technologies

📝 Notes

The pipeline uses in-memory sessions (no persistence)
Google Search tool requires active internet connection
Fact-checking accuracy depends on search result quality
Results may vary based on current web information
