AI-Powered Podcast Fact-Checking Pipeline
PodBuster is an intelligent multi-agent system that automatically extracts and verifies factual claims from podcast transcripts using Google's Agent Development Kit (ADK) and Gemini models.
- Multi-Agent Architecture: Sequential pipeline with specialized agents for different tasks
- Claim Extraction: Automatically identifies verifiable factual claims from transcripts
- Real-Time Verification: Uses Google Search for grounding and fact-checking
- Structured Output: JSON schemas ensure consistent, parseable results
- Professional Reporting: Generates Markdown tables with verdicts and citations
- Opinion Filtering: Intelligently separates facts from opinions and conversational phrases
PodBuster uses a three-agent sequential pipeline:
```
┌─────────────────┐
│ Raw Transcript  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ TranscriberAgent│  Extract verifiable claims
│ (Gemini 2.5)    │  Filter opinions/questions
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FactCheckerAgent│  Verify each claim
│ (Gemini 2.5)    │  Use Google Search
└────────┬────────┘  Provide citations
         │
         ▼
┌─────────────────┐
│ ReporterAgent   │  Generate Markdown report
│ (Gemini 2.5)    │  Format results table
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Final Report    │
└─────────────────┘
```
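The diagram reads like a chain of functions: each stage consumes the previous stage's output. The sketch below models only that data flow in plain Python. The stage functions (`transcriber_stub`, `fact_checker_stub`, `reporter_stub`) and `run_sequential` are hypothetical stand-ins, not ADK APIs; in ADK the framework itself handles how agents hand results to one another.

```python
"""Toy, stdlib-only model of the three-stage sequential pipeline (not ADK code)."""
import json
from typing import Callable

Stage = Callable[[str], str]


def transcriber_stub(transcript: str) -> str:
    # Stand-in for TranscriberAgent: emit a JSON array of claim strings.
    # Naively treats every "."-terminated sentence as a claim (no opinion filtering).
    claims = [s.strip() + "." for s in transcript.split(".") if s.strip()]
    return json.dumps(claims)


def fact_checker_stub(claims_json: str) -> str:
    # Stand-in for FactCheckerAgent: one verdict object per claim.
    # A real agent would search the web; this stub marks everything INCONCLUSIVE.
    verdicts = [
        {"claim": c, "status": "INCONCLUSIVE", "summary": "not checked (stub)", "citation": ""}
        for c in json.loads(claims_json)
    ]
    return json.dumps(verdicts)


def reporter_stub(verdicts_json: str) -> str:
    # Stand-in for ReporterAgent: summarize instead of rendering a table.
    return f"{len(json.loads(verdicts_json))} claim(s) processed"


def run_sequential(stages: list[Stage], initial_input: str) -> str:
    # Feed each stage the previous stage's output, in order.
    data = initial_input
    for stage in stages:
        data = stage(data)
    return data


report = run_sequential(
    [transcriber_stub, fact_checker_stub, reporter_stub],
    "ChatGPT was released in November 2022. The Moon orbits Earth.",
)
print(report)  # 2 claim(s) processed
```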
- TranscriberAgent
  - Extracts explicit, verifiable factual claims
  - Filters out opinions, emotions, and questions
  - Outputs: JSON array of claim strings
- FactCheckerAgent
  - Verifies claims using Google Search
  - Assigns a status: TRUE, FALSE, or INCONCLUSIVE
  - Provides a summary and an authoritative source citation
  - Outputs: JSON object with status, summary, and citation
- ReporterAgent
  - Compiles all verification results
  - Generates a professional Markdown table
  - Outputs: Formatted accountability report
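If you consume the TranscriberAgent or FactCheckerAgent outputs outside the pipeline (for example, saved results), a small validator can catch malformed responses early. This is a stdlib sketch based only on the output shapes listed above; `parse_claims`, `parse_verdict`, and `Verdict` are hypothetical names, not part of the project.

```python
import json
from dataclasses import dataclass

ALLOWED_STATUSES = {"TRUE", "FALSE", "INCONCLUSIVE"}
REQUIRED_FIELDS = {"status", "summary", "citation"}


@dataclass(frozen=True)
class Verdict:
    status: str
    summary: str
    citation: str


def parse_claims(raw: str) -> list[str]:
    """Validate TranscriberAgent-style output: a JSON array of claim strings."""
    data = json.loads(raw)
    if not isinstance(data, list) or not all(isinstance(c, str) for c in data):
        raise ValueError("expected a JSON array of strings")
    return data


def parse_verdict(raw: str) -> Verdict:
    """Validate FactCheckerAgent-style output: a JSON object with status, summary, citation."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["status"] not in ALLOWED_STATUSES:
        raise ValueError(f"unexpected status: {data['status']!r}")
    return Verdict(data["status"], data["summary"], data["citation"])


claims = parse_claims('["ChatGPT was released in November 2022"]')
verdict = parse_verdict(
    '{"status": "TRUE", "summary": "Matches the launch date.", "citation": "https://example.com"}'
)
print(claims, verdict.status)
```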
Prerequisites:
- Python 3.10+
- Google API Key with Gemini access
- VS Code (recommended) or Jupyter Notebook
```bash
cd /path/to/podbuster
python -m venv .venv
source .venv/bin/activate   # On macOS/Linux
# or
.venv\Scripts\activate      # On Windows
pip install google-adk litellm python-dotenv
```
Create a `.env` file in the project root:
```
GOOGLE_API_KEY=your_google_api_key_here
```
Open `podbuster.ipynb` in VS Code or Jupyter and run all cells.
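Before running the notebook, you can sanity-check the `.env` file with a few lines of standard-library Python. This is a sketch: `parse_env` and `has_real_api_key` are hypothetical helpers that only approximate python-dotenv's parsing (no quoting, `export`, or interpolation), and the check also flags the placeholder value from the step above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def has_real_api_key(values: dict[str, str]) -> bool:
    # Reject a missing or empty key, and the placeholder from the setup instructions.
    key = values.get("GOOGLE_API_KEY", "")
    return bool(key) and key != "your_google_api_key_here"


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path(".env")
    if not env_path.exists():
        print(".env not found in the current directory")
    elif has_real_api_key(parse_env(env_path.read_text(encoding="utf-8"))):
        print("GOOGLE_API_KEY looks set")
    else:
        print("GOOGLE_API_KEY is missing or still the placeholder")
```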
```
podbuster/
├── .env                # API keys (not in git)
├── .gitignore          # Git ignore rules
├── .gitattributes      # Git attributes
├── README.md           # This file
├── podbuster.ipynb     # Main pipeline notebook
├── podbuster_data.py   # Mock transcript data
└── .venv/              # Virtual environment (not in git)
```
The project includes 4 pre-built mock transcripts for testing:
- MOCK_TRANSCRIPT_TECH: AI & Technology (ChatGPT, neurons, quantum computing)
- MOCK_TRANSCRIPT_SCIENCE: Space & Astronomy (Moon, Mars, Sun facts)
- MOCK_TRANSCRIPT_HEALTH: Nutrition & Medicine (water, bones, vitamins)
- MOCK_TRANSCRIPT_HISTORY: Historical Events (Declaration, WWII, Great Wall)
Each transcript contains a mix of:
- True facts (verifiable and correct)
- False claims (verifiable but incorrect)
- Opinions (should be filtered out by TranscriberAgent)
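The notebook cells that follow rely on top-level `await`, which Jupyter supports but a plain Python script does not. Here is a sketch for processing several transcripts outside a notebook with `asyncio.run`; `StubRunner` is a hypothetical stand-in for `podbuster_runner`, and `run_all` is an illustrative helper, not part of the project.

```python
import asyncio


class StubRunner:
    """Hypothetical stand-in for podbuster_runner; reports the input length."""

    async def run_debug(self, message: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as a real network call would
        return f"processed {len(message)} chars"


async def run_all(runner, transcripts: dict[str, str]) -> dict:
    # Process transcripts one at a time to keep request bursts small.
    results = {}
    for name, text in transcripts.items():
        results[name] = await runner.run_debug(text)
    return results


if __name__ == "__main__":
    demo = {"MOCK_TRANSCRIPT_TECH": "abc", "MOCK_TRANSCRIPT_SCIENCE": "defgh"}
    print(asyncio.run(run_all(StubRunner(), demo)))
```

Inside the notebook you would instead pass `podbuster_runner` and the imported `MOCK_TRANSCRIPT_*` strings and `await run_all(...)` directly. Running sequentially rather than with `asyncio.gather` is a deliberate choice: it is less likely to trigger rate limiting (HTTP 429).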
```python
# In the notebook, after running all setup cells:
response = await podbuster_runner.run_debug(MOCK_TRANSCRIPT)
```
To analyze one of the bundled mock transcripts:
```python
from podbuster_data import MOCK_TRANSCRIPT_TECH

response = await podbuster_runner.run_debug(MOCK_TRANSCRIPT_TECH)
```
To analyze your own transcript:
```python
custom_transcript = "Your custom podcast transcript here..."
response = await podbuster_runner.run_debug(
    f"Analyze this podcast transcript: {custom_transcript}"
)
```
The pipeline includes automatic retry logic for API failures:
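```python
from google.genai import types

retry_config = types.HttpRetryOptions(
    attempts=5,                              # maximum number of attempts
    exp_base=7,                              # backoff multiplier between attempts
    initial_delay=1,                         # delay before the first retry, in seconds
    http_status_codes=[429, 500, 503, 504],  # rate limiting and transient server errors
)
```
Models used by each agent:
- TranscriberAgent: `gemini-2.5-flash-lite` (fast claim extraction)
- FactCheckerAgent: `gemini-2.5-flash-lite` (thorough verification)
- ReporterAgent: `gemini-2.5-flash-lite` (report generation)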
The pipeline generates a Markdown table with:
| Claim | Status | Summary | Citation |
|---|---|---|---|
| "ChatGPT was released in November 2022" | TRUE | Verified via official OpenAI announcements | https://openai.com/... |
| "Quantum computers can solve any problem instantly" | FALSE | Quantum computers have limitations... | https://... |
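ReporterAgent produces this table with the model. If you want a deterministic fallback, for example to re-render saved verification results, a small renderer is enough. This is a sketch: `render_report` and `escape_cell` are hypothetical helpers, and each row is assumed to be a dict with `claim`, `status`, `summary`, and `citation` keys.

```python
def escape_cell(text: str) -> str:
    # A raw "|" would split a Markdown table cell, and a newline would break the row.
    return text.replace("|", "\\|").replace("\n", " ")


def render_report(rows: list[dict[str, str]]) -> str:
    """Render verdict dicts as a Markdown table matching the layout above."""
    lines = ["| Claim | Status | Summary | Citation |", "|---|---|---|---|"]
    for row in rows:
        cells = [
            escape_cell(f'"{row["claim"]}"'),
            escape_cell(row["status"]),
            escape_cell(row["summary"]),
            escape_cell(row["citation"]),
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_report([
    {"claim": "The Moon orbits Earth", "status": "TRUE",
     "summary": "Basic astronomy", "citation": "https://example.com"},
]))
```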
Make sure all dependencies are installed:
```bash
pip install google-adk litellm python-dotenv
```
Verify your `.env` file contains a valid Google API key:
```bash
cat .env  # Should show: GOOGLE_API_KEY=...
```
- Pass `MOCK_TRANSCRIPT` directly, not `{MOCK_TRANSCRIPT}` inside a plain string
- The `{variable}` syntax refers to ADK context variables, not Python variables
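The distinction in the last tip can be illustrated with a toy version of instruction templating: placeholders such as `{claims}` are looked up in session state, so a Python variable name that was never stored in state cannot be resolved. `inject_state` is a hypothetical helper that imitates the idea; it is not ADK's implementation, which has additional rules.

```python
import re


def inject_state(template: str, state: dict[str, str]) -> str:
    """Replace {key} placeholders with values from a session-state dict (toy version)."""
    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in state:
            raise KeyError(f"context variable {key!r} not found in session state")
        return state[key]

    return re.sub(r"\{(\w+)\}", replace, template)


MOCK_TRANSCRIPT = "The Moon orbits Earth."         # a Python variable, not in session state
state = {"claims": '["The Moon orbits Earth."]'}  # hypothetical session state

# An f-string is resolved by Python before anything reaches the agent:
print(f"Analyze this podcast transcript: {MOCK_TRANSCRIPT}")

# A placeholder in an instruction is looked up in session state instead:
print(inject_state("Verify each claim in: {claims}", state))

try:
    inject_state("Analyze: {MOCK_TRANSCRIPT}", state)
except KeyError as err:
    print("KeyError:", err)
```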
This is a Kaggle competition project. Feel free to fork and experiment!
License: MIT
- Google Agent Development Kit (ADK)
- Gemini API
- LiteLLM
- Python 3.10+
- The pipeline uses in-memory sessions (no persistence)
- The Google Search tool requires an active internet connection
- Fact-checking accuracy depends on search result quality
- Results may vary based on current web information