Conduit Podcast Transcripts

This is an archive of transcriptions generated with NVIDIA Parakeet, Audio Hijack, and related tools, intended for use as a source of data.

The text comes from the Conduit Podcast.

Getting Started

Installation

This project uses uv for fast, reliable Python package management. To set up:

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and create virtual environment
uv sync

Environment Configuration

Load the environment variables:

# Using direnv (recommended)
direnv allow

# Or manually source the .envrc file
source .envrc

The .envrc file should contain connection strings for PostgreSQL and other configuration.
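As a sketch, the `.envrc` might look like the following; the variable names here are assumptions for illustration, so check them against what the code actually reads:

```shell
# Hypothetical .envrc -- variable names are illustrative assumptions
export DATABASE_URL="postgresql://user:password@host:5432/conduit"

# Optional: LLM model used for RAG (see CLI usage below)
export LLM_MODEL="llama3"
```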

Usage

The project provides multiple interfaces for accessing and managing transcripts:

  • CLI Tool - conduit command for local operations
  • REST API - FastAPI server for programmatic access
  • MCP Server - Model Context Protocol server for Claude integration

Docker Setup

Start all services with Docker Compose:

# Start PostgreSQL and API server
docker compose up -d

# View logs
docker compose logs -f

# Stop services
docker compose down

The API will be available at http://localhost:8000 with interactive docs at /docs.

CLI Usage (via Docker)

docker compose run --rm app python -m cli.main [command] [options]
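Since every CLI invocation repeats the same `docker compose` prefix, it can help to wrap it in a small shell function. The `conduit` function name below is just a suggestion, not something the project ships:

```shell
# Optional convenience wrapper so commands read as `conduit <command> [options]`
conduit() {
  docker compose run --rm app python -m cli.main "$@"
}

# Then, for example:
#   conduit search "search term"
#   conduit list
```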

Transcription

Transcribe episodes from the Conduit website using NVIDIA Parakeet:

# Transcribe a specific episode (transcribes and ingests by default)
docker compose run --rm app python -m cli.main transcribe <episode_number>

# Use a specific model size/name (default: nvidia/parakeet-rnnt-1.1b)
docker compose run --rm app python -m cli.main transcribe <episode_number> --model nvidia/parakeet-rnnt-1.1b

# Configure the LLM model for RAG (Retrieval Augmented Generation)
# Note: docker compose run options such as -e must come before the service name
docker compose run --rm -e LLM_MODEL=llama3 app python -m cli.main transcribe <episode_number>

Data Ingestion

Load transcripts into PostgreSQL:

# Ingest all files in transcripts directory
docker compose run --rm app python -m cli.main ingest

# Ingest specific file
docker compose run --rm app python -m cli.main ingest --file transcripts/episode1.md

# Recreate tables before ingestion
docker compose run --rm app python -m cli.main ingest --reindex

Search

Search through ingested transcripts:

# Text search (default)
docker compose run --rm app python -m cli.main search "search term"

# Vector semantic search
docker compose run --rm app python -m cli.main search "search phrase" --vector

MCP Server Usage

The project includes a Model Context Protocol (MCP) server that allows AI assistants (like Claude) to directly query the transcript database.

Configuration

Add the server to your MCP client configuration (e.g., claude_desktop_config.json):

{
  "mcpServers": {
    "conduit": {
      "command": "docker",
      "args": [
        "compose",
        "run",
        "--rm",
        "app",
        "python",
        "-m",
        "app.mcp.server"
      ]
    }
  }
}

Or if connecting to a running instance (e.g. via SSE):

{
  "mcpServers": {
    "conduit": {
        "url": "https://conduit.kjaymiller.dev/mcp/sse",
        "transport": "sse"
    }
  }
}

Available Tools

The MCP server provides the following tools:

  • search_transcripts: Search through transcripts using keyword or vector search.

    • query (string): The search text.
    • limit (int, optional): Max results (default 10).
    • use_vector (bool, optional): Use semantic vector search (default True).
    • episode_number (int, optional): Filter by episode number.
  • get_episode: Retrieve full content and metadata for an episode.

    • episode_number (int): The episode number to retrieve.
  • list_episodes: List available episodes with metadata.

    • limit (int, optional): Max results (default 20).
    • start_date (string, optional): Filter by start date (YYYY-MM-DD).
    • end_date (string, optional): Filter by end date (YYYY-MM-DD).
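For reference, a `tools/call` request for `search_transcripts` would carry arguments shaped like the following (values are illustrative; most MCP clients construct this envelope for you):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_transcripts",
    "arguments": {
      "query": "databases",
      "limit": 5,
      "use_vector": true
    }
  }
}
```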

Management

# Check episode status
docker compose run --rm app python -m cli.main status <episode_number>

# List recent episodes
docker compose run --rm app python -m cli.main list

Project Structure

  • app/ - FastAPI application
    • api/ - REST API endpoints (search, episodes, health)
    • mcp/ - MCP Server for Claude integration
    • main.py - Main FastAPI application
  • cli/ - Command-line interface
  • podcast_transcription/ - Shared library code
    • database/ - Database operations (PostgreSQL)
    • models/ - SQLAlchemy models
    • transcription/ - Transcription logic (NVIDIA Parakeet)
    • utils/ - Shared utilities
  • transcripts/ - Generated markdown files with metadata and transcriptions

Troubleshooting

Virtual environment issues: Run uv sync and ensure you're using uv run or have activated the venv

Missing environment variables: Load .envrc with direnv allow or source .envrc

Parakeet model: The first run downloads the default model (nvidia/parakeet-rnnt-1.1b), which requires network access

Database connection: Services are on Aiven; verify credentials and network access

Technology Stack

  • Python 3.13+
  • NVIDIA Parakeet (transcription)
  • LangChain (text processing)
  • PostgreSQL with pgvector extension
  • SQLAlchemy (ORM)
  • Click (CLI framework)
  • FastAPI (REST API)
  • MCP (Model Context Protocol)

Usage and License

Conduit Podcast Transcripts by Jay Miller and Kathy Campbell, with original downloads from Whisper transcription work by Pilix, is licensed under Attribution-NonCommercial-ShareAlike 4.0 International
