This is an archive of transcripts generated with NVIDIA Parakeet, Audio Hijack, and related tools, intended to serve as a data source. The text comes from the Conduit Podcast.
This project uses uv for fast, reliable Python package management. To set up:

```shell
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and create virtual environment
uv sync
```

Load the environment variables:
```shell
# Using direnv (recommended)
direnv allow

# Or manually source the .envrc file
source .envrc
```

The `.envrc` file should contain connection strings for PostgreSQL and other configuration.
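As an illustration, a minimal `.envrc` might look like the following. The variable names and values here are placeholders (only `LLM_MODEL` appears elsewhere in this document), not the project's actual configuration:

```shell
# Hypothetical example values -- replace with your own credentials.
# The actual variable names used by the project may differ.
export DATABASE_URL="postgresql://user:password@localhost:5432/conduit"
export LLM_MODEL="llama3"
```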
The project provides multiple interfaces for accessing and managing transcripts:
- CLI Tool - `conduit` command for local operations
- REST API - FastAPI server for programmatic access
- MCP Server - Model Context Protocol server for Claude integration
Start all services with Docker Compose:
```shell
# Start PostgreSQL and API server
docker compose up -d

# View logs
docker compose logs -f

# Stop services
docker compose down
```

The API will be available at http://localhost:8000, with interactive docs at /docs.
All CLI commands run through Docker Compose:

```shell
docker compose run --rm app python -m cli.main [command] [options]
```

Transcribe episodes from the Conduit website using NVIDIA Parakeet:
```shell
# Transcribe a specific episode
docker compose run --rm app python -m cli.main transcribe <episode_number>

# Transcribe and ingest (default behavior)
docker compose run --rm app python -m cli.main transcribe <episode_number>

# Use a specific model size/name (default: nvidia/parakeet-rnnt-1.1b)
docker compose run --rm app python -m cli.main transcribe <episode_number> --model nvidia/parakeet-rnnt-1.1b

# Configure the LLM model for RAG (Retrieval-Augmented Generation);
# note that -e must come before the service name
docker compose run --rm -e LLM_MODEL=llama3 app python -m cli.main transcribe <episode_number>
```

Load transcripts into PostgreSQL:
```shell
# Ingest all files in the transcripts directory
docker compose run --rm app python -m cli.main ingest

# Ingest a specific file
docker compose run --rm app python -m cli.main ingest --file transcripts/episode1.md

# Recreate tables before ingestion
docker compose run --rm app python -m cli.main ingest --reindex
```

Search through ingested transcripts:
```shell
# Text search (default)
docker compose run --rm app python -m cli.main search "search term"

# Vector semantic search
docker compose run --rm app python -m cli.main search "search phrase" --vector
```

The project includes a Model Context Protocol (MCP) server that allows AI assistants (such as Claude) to query the transcript database directly.
Add the server to your MCP client configuration (e.g., `claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "conduit": {
      "command": "docker",
      "args": [
        "compose",
        "run",
        "--rm",
        "app",
        "python",
        "-m",
        "app.mcp.server"
      ]
    }
  }
}
```

Or, if connecting to a running instance (e.g. via SSE):
```json
{
  "mcpServers": {
    "conduit": {
      "url": "https://conduit.kjaymiller.dev/mcp/sse",
      "transport": "sse"
    }
  }
}
```

The MCP server provides the following tools:
- `search_transcripts`: Search through transcripts using keyword or vector search.
  - `query` (string): The search text.
  - `limit` (int, optional): Max results (default 10).
  - `use_vector` (bool, optional): Use semantic vector search (default True).
  - `episode_number` (int, optional): Filter by episode number.
- `get_episode`: Retrieve full content and metadata for an episode.
  - `episode_number` (int): The episode number to retrieve.
- `list_episodes`: List available episodes with metadata.
  - `limit` (int, optional): Max results (default 20).
  - `start_date` (string, optional): Filter by start date (YYYY-MM-DD).
  - `end_date` (string, optional): Filter by end date (YYYY-MM-DD).
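Under the MCP specification, a client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request. As a rough sketch (the helper function below is hypothetical, not part of this project), a call to `search_transcripts` would be serialized like this:

```python
import json


def make_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 message for an MCP tools/call request.

    The envelope shape follows the MCP specification; the tool and
    argument names are those listed above.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Ask the server for the five most relevant chunks via vector search
msg = make_tool_call(
    "search_transcripts",
    {"query": "static site generators", "limit": 5, "use_vector": True},
)
```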
```shell
# Check episode status
docker compose run --rm app python -m cli.main status <episode_number>

# List recent episodes
docker compose run --rm app python -m cli.main list
```

Project structure:

- `app/` - FastAPI application
  - `api/` - REST API endpoints (search, episodes, health)
  - `mcp/` - MCP server for Claude integration
  - `main.py` - Main FastAPI application
- `cli/` - Command-line interface
- `podcast_transcription/` - Shared library code
  - `database/` - Database operations (PostgreSQL)
  - `models/` - SQLAlchemy models
  - `transcription/` - Transcription logic (NVIDIA Parakeet)
  - `utils/` - Shared utilities
- `transcripts/` - Generated markdown files with metadata and transcriptions
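The files in `transcripts/` combine metadata and transcription text. The exact layout is not documented here, so the sketch below assumes simple `key: value` frontmatter between `---` delimiters; both the function and the sample are illustrative only:

```python
def parse_transcript(text: str) -> tuple[dict, str]:
    """Split a transcript file into (metadata, body).

    Assumes hypothetical "key: value" frontmatter between "---" markers;
    files without frontmatter yield empty metadata.
    """
    meta: dict[str, str] = {}
    body = text
    if text.startswith("---"):
        # split(..., 2) keeps any "---" inside the body intact
        _, header, body = text.split("---", 2)
        for line in header.strip().splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta, body.lstrip()


sample = """---
episode: 1
title: Hello Conduit
---
Welcome to the show...
"""
meta, body = parse_transcript(sample)
# meta["episode"] == "1"; body starts with "Welcome"
```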
- Virtual environment issues: Run `uv sync` and make sure you're using `uv run` or have activated the venv.
- Missing environment variables: Load `.envrc` with `direnv allow` or `source .envrc`.
- Parakeet model: The first run downloads the default model (`nvidia/parakeet-rnnt-1.1b`) and requires network access.
- Database connection: Services are hosted on Aiven; verify credentials and network access.
- Python 3.13+
- NVIDIA Parakeet (transcription)
- LangChain (text processing)
- PostgreSQL with pgvector extension
- SQLAlchemy (ORM)
- Click (CLI framework)
- FastAPI (REST API)
- MCP (Model Context Protocol)
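The vector search above relies on pgvector: embeddings are stored in PostgreSQL and rows are ranked by distance. pgvector's cosine-distance operator `<=>` computes `1 - cosine similarity`; as a pure-Python illustration (not the project's actual code), cosine similarity is:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: the dot product
    divided by the product of the vector norms. Ranges from -1 to 1,
    with 1 meaning the vectors point the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Identical directions score 1.0; orthogonal vectors score 0.0
cosine_similarity([1.0, 0.0], [2.0, 0.0])  # -> 1.0
cosine_similarity([1.0, 0.0], [0.0, 1.0])  # -> 0.0
```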
Conduit Podcast Transcripts by Jay Miller and Kathy Campbell, with original downloads from Whisper transcription work by Pilix, is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.