ruthiel/podbuster

🎙️ PodBuster

AI-Powered Podcast Fact-Checking Pipeline

PodBuster is an intelligent multi-agent system that automatically extracts and verifies factual claims from podcast transcripts using Google's Agent Development Kit (ADK) and Gemini models.

🌟 Features

  • Multi-Agent Architecture: Sequential pipeline with specialized agents for different tasks
  • Claim Extraction: Automatically identifies verifiable factual claims from transcripts
  • Real-Time Verification: Uses Google Search for grounding and fact-checking
  • Structured Output: JSON schemas ensure consistent, parseable results
  • Professional Reporting: Generates Markdown tables with verdicts and citations
  • Opinion Filtering: Intelligently separates facts from opinions and conversational phrases

🏗️ Architecture

PodBuster uses a three-agent sequential pipeline:

┌─────────────────┐
│  Raw Transcript │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ TranscriberAgent│  Extract verifiable claims
│  (Gemini 2.5)   │  Filter opinions/questions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FactCheckerAgent│  Verify each claim
│  (Gemini 2.5)   │  Use Google Search
└────────┬────────┘  Provide citations
         │
         ▼
┌─────────────────┐
│ ReporterAgent   │  Generate Markdown report
│  (Gemini 2.5)   │  Format results table
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Final Report   │
└─────────────────┘
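
The data flow in the diagram can be sketched in plain Python. This is only a stand-in for illustration: it does not use ADK or Gemini, the step functions and the `run_pipeline` helper are hypothetical names, and the opinion filter is a toy heuristic rather than the TranscriberAgent's actual logic.

```python
import re
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], State]

# Hypothetical markers of opinion; the real agent relies on an LLM instead.
OPINION_PREFIXES = ("i think", "i feel", "i believe")


def transcriber(state: State) -> State:
    # Split the transcript into sentences, then drop questions and opinions.
    sentences = re.split(r"(?<=[.?!])\s+", state["transcript"].strip())
    claims = [
        s.rstrip(".")
        for s in sentences
        if s and not s.endswith("?") and not s.lower().startswith(OPINION_PREFIXES)
    ]
    return {**state, "claims": claims}


def fact_checker(state: State) -> State:
    # No search is performed here, so every claim is marked INCONCLUSIVE.
    results = [{"claim": c, "status": "INCONCLUSIVE"} for c in state["claims"]]
    return {**state, "results": results}


def reporter(state: State) -> State:
    # Collapse the per-claim results into a single text report.
    lines = [f"- {r['claim']}: {r['status']}" for r in state["results"]]
    return {**state, "report": "\n".join(lines)}


def run_pipeline(transcript: str, steps: List[Step]) -> State:
    # Each step receives the state produced by the previous one, in order.
    state: State = {"transcript": transcript}
    for step in steps:
        state = step(state)
    return state
```

Calling `run_pipeline(text, [transcriber, fact_checker, reporter])` returns a dictionary whose `report` key holds the final text, mirroring how each stage in the diagram consumes the previous stage's output.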

Agent Details

  1. TranscriberAgent

    • Extracts explicit, verifiable factual claims
    • Filters out opinions, emotions, questions
    • Outputs: JSON array of claim strings
  2. FactCheckerAgent

    • Verifies claims using Google Search
    • Assigns status: TRUE, FALSE, or INCONCLUSIVE
    • Provides summary and authoritative source citation
    • Outputs: JSON object with status, summary, citation
  3. ReporterAgent

    • Compiles all verification results
    • Generates professional Markdown table
    • Outputs: Formatted accountability report
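
A minimal sketch of how the JSON outputs listed above might be validated before they move to the next agent. It uses only the standard library; the exact key spellings (`status`, `summary`, `citation`) and the `parse_claims` and `parse_verdict` helpers are assumptions for illustration, not code taken from the notebook.

```python
import json
from dataclasses import dataclass
from typing import List

VALID_STATUSES = {"TRUE", "FALSE", "INCONCLUSIVE"}
REQUIRED_KEYS = {"status", "summary", "citation"}


@dataclass(frozen=True)
class Verdict:
    status: str
    summary: str
    citation: str


def parse_claims(raw: str) -> List[str]:
    # TranscriberAgent output: a JSON array of claim strings.
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(c, str) for c in data):
        raise ValueError("expected a JSON array of claim strings")
    return data


def parse_verdict(raw: str) -> Verdict:
    # FactCheckerAgent output: a JSON object with status, summary, citation.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["status"] not in VALID_STATUSES:
        raise ValueError(f"unknown status: {data['status']!r}")
    return Verdict(
        status=data["status"],
        summary=str(data["summary"]),
        citation=str(data["citation"]),
    )
```

Rejecting malformed output early keeps a bad response from one agent from silently corrupting the final report.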

📋 Prerequisites

  • Python 3.10+
  • Google API Key with Gemini access
  • VS Code (recommended) or Jupyter Notebook

🚀 Setup

1. Clone and Navigate

git clone https://github.com/ruthiel/podbuster.git
cd podbuster

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # On macOS/Linux
# or
.venv\Scripts\activate     # On Windows

3. Install Dependencies

pip install google-adk litellm python-dotenv

4. Configure Environment

Create a .env file in the project root:

GOOGLE_API_KEY=your_google_api_key_here

5. Run the Notebook

Open podbuster.ipynb in VS Code or Jupyter and run all cells.

📁 Project Structure

podbuster/
├── .env                   # API keys (not in git)
├── .gitignore             # Git ignore rules
├── .gitattributes         # Git attributes
├── README.md              # This file
├── podbuster.ipynb        # Main pipeline notebook
├── podbuster_data.py      # Mock transcript data
└── .venv/                 # Virtual environment (not in git)

🧪 Mock Data

The project includes 4 pre-built mock transcripts for testing:

  • MOCK_TRANSCRIPT_TECH: AI & Technology (ChatGPT, neurons, quantum computing)
  • MOCK_TRANSCRIPT_SCIENCE: Space & Astronomy (Moon, Mars, Sun facts)
  • MOCK_TRANSCRIPT_HEALTH: Nutrition & Medicine (water, bones, vitamins)
  • MOCK_TRANSCRIPT_HISTORY: Historical Events (Declaration, WWII, Great Wall)

Each transcript contains a mix of:

  • ✅ True facts (verifiable and correct)
  • ❌ False claims (verifiable but incorrect)
  • 💭 Opinions (should be filtered out by TranscriberAgent)

💻 Usage

Basic Usage

# In the notebook, after running all setup cells:
response = await podbuster_runner.run_debug(MOCK_TRANSCRIPT)

Selecting a Mock Transcript

from podbuster_data import MOCK_TRANSCRIPT_TECH

response = await podbuster_runner.run_debug(MOCK_TRANSCRIPT_TECH)

With f-string

custom_transcript = "Your custom podcast transcript here..."
response = await podbuster_runner.run_debug(
    f"Analyze this podcast transcript: {custom_transcript}"
)

🔧 Configuration

Retry Configuration

The pipeline includes automatic retry logic for API failures:

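from google.genai import types

# Retry transient API failures with exponential backoff
retry_config = types.HttpRetryOptions(
    attempts=5,       # maximum number of attempts per request
    exp_base=7,       # multiplier applied to the delay between attempts
    initial_delay=1,  # seconds to wait before the first retry
    http_status_codes=[429, 500, 503, 504],  # retry only on these statuses
)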
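
To see what these settings imply, the sketch below computes the waits between attempts. It assumes a plain exponential schedule of `initial_delay * exp_base ** n`, with `attempts` counting the first request. The `backoff_delays` helper is hypothetical, and the real client may add jitter or cap the delay, so treat the numbers as an approximation.

```python
from typing import List, Optional


def backoff_delays(
    attempts: int,
    initial_delay: float,
    exp_base: float,
    max_delay: Optional[float] = None,
) -> List[float]:
    # With N attempts there are N - 1 waits, one before each retry.
    delays = []
    for retry in range(attempts - 1):
        delay = initial_delay * exp_base ** retry
        if max_delay is not None:
            # Optional cap, in case the client limits the longest wait.
            delay = min(delay, max_delay)
        delays.append(delay)
    return delays


print(backoff_delays(attempts=5, initial_delay=1, exp_base=7))
# [1, 7, 49, 343]
```

Under these assumptions the last uncapped wait is 343 seconds, so a base of 7 grows very quickly compared with the more common base of 2.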

Model Configuration

  • TranscriberAgent: gemini-2.5-flash-lite (fast claim extraction)
  • FactCheckerAgent: gemini-2.5-flash-lite (thorough verification)
  • ReporterAgent: gemini-2.5-flash-lite (report generation)

📊 Output Format

The pipeline generates a Markdown table with:

| Claim | Status | Summary | Citation |
| --- | --- | --- | --- |
| "ChatGPT was released in November 2022" | TRUE | Verified via official OpenAI announcements | https://openai.com/... |
| "Quantum computers can solve any problem instantly" | FALSE | Quantum computers have limitations... | https://... |

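A table with these four columns could be assembled from verification results roughly as follows. In the actual pipeline the ReporterAgent writes the report with Gemini; this deterministic `render_report` helper and the dictionary keys it reads are assumptions made for illustration.

```python
from typing import Dict, Iterable

COLUMNS = ("Claim", "Status", "Summary", "Citation")


def escape_cell(text: str) -> str:
    # A raw pipe would split the cell and a newline would end the row early.
    return str(text).replace("|", "\\|").replace("\n", " ")


def render_report(results: Iterable[Dict[str, str]]) -> str:
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "| " + " | ".join("---" for _ in COLUMNS) + " |",
    ]
    for result in results:
        cells = [
            '"' + result["claim"] + '"',  # claims are shown quoted
            result["status"],
            result["summary"],
            result["citation"],
        ]
        lines.append("| " + " | ".join(escape_cell(c) for c in cells) + " |")
    return "\n".join(lines)
```

Escaping pipes matters because a claim or summary that contains `|` would otherwise shift every later column in its row.
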
🛠️ Troubleshooting

Import Errors

Make sure all dependencies are installed:

pip install google-adk litellm python-dotenv

API Key Issues

Verify your .env file contains a valid Google API key:

cat .env  # Should show: GOOGLE_API_KEY=...
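
The same check can be made from Python before running the pipeline. This sketch assumes the `.env` file has already been loaded into the environment (for example with python-dotenv's `load_dotenv()`); the `check_api_key` helper is hypothetical, and it only confirms that a key is present, not that Google accepts it.

```python
import os
from typing import Mapping, Optional

# Placeholder value from the setup instructions in this README.
PLACEHOLDER = "your_google_api_key_here"


def check_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    # Default to the process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; check your .env file")
    if key == PLACEHOLDER:
        raise RuntimeError("GOOGLE_API_KEY still holds the placeholder value")
    return key
```

Running `check_api_key()` in the first notebook cell surfaces a missing key immediately instead of as an API error partway through the pipeline.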

Context Variable Errors

  • Pass MOCK_TRANSCRIPT to run_debug directly, or interpolate it with an f-string; do not write {MOCK_TRANSCRIPT} inside a plain string
  • ADK reads {variable} as a reference to one of its own context variables, not a Python variable
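
The difference is easy to see in plain Python. In this small illustration the transcript text is a made-up stand-in; only the f-string version substitutes the value, while the plain string keeps the braces verbatim.

```python
# Hypothetical stand-in for a real transcript.
MOCK_TRANSCRIPT = "ChatGPT was released in November 2022."

# Plain string: the braces stay in the text, and ADK may then try to
# resolve {MOCK_TRANSCRIPT} as one of its own context variables.
literal = "Analyze this podcast transcript: {MOCK_TRANSCRIPT}"

# f-string: Python substitutes the value before ADK ever sees the text.
interpolated = f"Analyze this podcast transcript: {MOCK_TRANSCRIPT}"

print(literal)
# Analyze this podcast transcript: {MOCK_TRANSCRIPT}
print(interpolated)
# Analyze this podcast transcript: ChatGPT was released in November 2022.
```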

🀝 Contributing

This is a Kaggle competition project. Feel free to fork and experiment!

📄 License

MIT

🔗 Technologies

📝 Notes

  • The pipeline uses in-memory sessions (no persistence)
  • The Google Search tool requires an active internet connection
  • Fact-checking accuracy depends on search result quality
  • Results may vary based on current web information
