# Conduit Podcast Transcripts

This is an archive of transcripts generated with Whisper, Audio Hijack, and related tools, intended for use as a data source.

The text is from the Conduit Podcast.

## Getting Started

### Installation

This project uses `uv` for fast, reliable Python package management. To set up:

```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and create the virtual environment
uv sync
```

### Environment Configuration

Load the environment variables:

```bash
# Using direnv (recommended)
direnv allow

# Or manually source the .envrc file
source .envrc
```

The `.envrc` file should contain the connection strings for OpenSearch and PostgreSQL, along with any other configuration.
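Before running any command, it can help to confirm that the variables from `.envrc` actually reached the process environment. This is a minimal sketch; the variable names `OPENSEARCH_URL` and `DATABASE_URL` are hypothetical placeholders, so substitute whatever names your `.envrc` really exports.

```python
import os
from collections.abc import Mapping, Sequence

# Hypothetical names: replace with the variables your .envrc actually exports.
REQUIRED_VARS = ("OPENSEARCH_URL", "DATABASE_URL")


def missing_env_vars(required: Sequence[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty, preserving their order."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print(f"Missing: {', '.join(missing)}. Try `direnv allow` or `source .envrc`.")
    else:
        print("All required variables are set.")
```

Treating an empty string as missing catches the common case of a variable that is declared in `.envrc` but never filled in.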

## Usage

All commands below use `uv run` for environment isolation. If you've activated the virtual environment, you can omit `uv run`.

### Transcription

Transcribe episodes from the Conduit website using OpenAI Whisper:

```bash
# Transcribe the latest episode
uv run python src/transcribe.py ep

# Transcribe specific episodes
uv run python src/transcribe.py ep 100 101 102

# Transcribe a range of episodes
uv run python src/transcribe.py ep --range 100-105

# Transcribe all episodes (with confirmation)
uv run python src/transcribe.py ep --all

# Transcribe a local audio file
uv run python src/transcribe.py file path/to/audio.mp3 --output path/to/output.txt
```

### Data Ingestion

Load transcripts into PostgreSQL and/or OpenSearch:

```bash
# Load all transcripts into both databases
uv run python src/quick_upload.py files

# Load specific files
uv run python src/quick_upload.py files --file transcripts/episode1.md --file transcripts/episode2.md

# Load with index recreation (destroys the existing OpenSearch index)
uv run python src/quick_upload.py files --reindex

# Load to PostgreSQL only
uv run python src/quick_upload.py files --pg-only

# Load to OpenSearch only
uv run python src/quick_upload.py files --os-only
```
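Before embeddings are stored, long transcripts are commonly split into overlapping chunks so that each piece fits an embedding model and context is not lost at chunk boundaries. The project lists LangChain for text processing, and its splitters are more elaborate (they prefer paragraph and sentence boundaries). The fixed-size character splitter below is only a simplified stand-in to illustrate the idea, and its `size` and `overlap` defaults are arbitrary.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most `size` characters; consecutive windows share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    chunks: list[str] = []
    step = size - overlap  # how far each window advances
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`.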

### Index Management

```bash
# Create or recreate the OpenSearch index
uv run python src/os_index.py
```
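Recreating an index amounts to deleting it if it exists and creating it again with a mapping that includes a vector field. This sketch is written against the `indices.exists`, `indices.delete`, and `indices.create` methods of an `opensearch-py` client; the field names and the 384-dimension embedding size are hypothetical placeholders, not values taken from `os_index.py`.

```python
from typing import Any


def build_index_body(dimension: int = 384) -> dict[str, Any]:
    """Settings and mappings for an index with an OpenSearch k-NN vector field (hypothetical fields)."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "episode": {"type": "integer"},
                "text": {"type": "text"},
                # Must match the length of the embeddings you ingest.
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def recreate_index(client: Any, name: str, body: dict[str, Any]) -> None:
    """Drop `name` if present, then create it fresh. This destroys any documents already indexed."""
    if client.indices.exists(index=name):
        client.indices.delete(index=name)
    client.indices.create(index=name, body=body)
```

Passing the client in, rather than constructing it inside the function, keeps the connection details in `.envrc` and makes the function easy to exercise with a stub.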

## Project Structure

- `src/` - Application code
  - `transcribe.py` - Whisper transcription and episode processing
  - `url_finder.py` - Web scraping for episode metadata and audio URLs
  - `os_ingest.py` - OpenSearch data ingestion
  - `os_index.py` - OpenSearch index creation
  - `pg_ingest.py` - PostgreSQL data processing with embeddings
  - `quick_upload.py` - Unified data loader for both databases
  - `download_audio_file.py` - Audio file download utility
- `transcripts/` - Generated markdown files with metadata and transcriptions
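The exact layout of the generated transcript files is defined by `transcribe.py`. Purely as an illustration, and assuming a hypothetical front-matter block of `key: value` lines between `---` markers at the top of each file, separating metadata from transcript text could look like this (a real YAML block would be better parsed with a YAML library):

```python
def split_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Separate a leading '---' delimited block of 'key: value' lines from the transcript body."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown  # no front matter: everything is body text
    metadata: dict[str, str] = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[index + 1:])
            return metadata, body.lstrip("\n")
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            metadata[key.strip()] = value.strip()
    return {}, markdown  # unterminated block: treat the whole file as body
```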

## Troubleshooting

- Virtual environment issues: Run `uv sync` and make sure you're using `uv run` or have activated the virtual environment.
- Missing environment variables: Load `.envrc` with `direnv allow` or `source .envrc`.
- Whisper model: The first run downloads the "base" model (~140 MB), so it requires network access.
- Database connection: The services are hosted on Aiven; verify your credentials and network access.
- Table recreation: `pg_ingest.py` and `quick_upload.py` can drop tables, so use `--reindex` carefully.
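Since openai-whisper caches downloaded checkpoints, only the first run needs the network. By default `whisper.load_model` stores them under `$XDG_CACHE_HOME/whisper`, falling back to `~/.cache/whisper`, and the base model is saved as `base.pt`. The hypothetical helper below (not part of this project) mirrors that default so you can check for the file before working offline; it does not account for a custom `download_root`.

```python
import os
from collections.abc import Mapping
from pathlib import Path


def whisper_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Mirror openai-whisper's default download root: $XDG_CACHE_HOME/whisper or ~/.cache/whisper."""
    default = Path.home() / ".cache"
    return Path(env.get("XDG_CACHE_HOME", str(default))) / "whisper"


def whisper_model_cached(name: str = "base", env: Mapping[str, str] = os.environ) -> bool:
    """True if '<name>.pt' is already on disk.

    The file name matches the model name for models such as 'base'; aliases such as
    'large' download under a different file name, so this check would miss them.
    """
    return (whisper_cache_dir(env) / f"{name}.pt").is_file()
```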

## Technology Stack

- Python 3.12.5
- OpenAI Whisper (transcription)
- LangChain (text processing)
- PostgreSQL with the pgvector extension
- OpenSearch (vector search)
- SQLAlchemy (ORM)
- Typer (CLI framework)
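With the pgvector extension, stored embeddings can be ranked directly in SQL. The sketch below only builds a parameterized query and the vector literal to bind into it; the table and column names (`transcript_chunks`, `content`, `embedding`) are hypothetical, not this project's schema. In pgvector, `<=>` is the cosine-distance operator (`<->` would give Euclidean distance).

```python
# Hypothetical table and column names; <=> is pgvector's cosine-distance operator.
NEAREST_CHUNKS_SQL = """
    SELECT content, embedding <=> %(query)s::vector AS distance
    FROM transcript_chunks
    ORDER BY embedding <=> %(query)s::vector
    LIMIT %(limit)s
"""


def to_vector_literal(values: list[float]) -> str:
    """Format numbers as a pgvector text literal such as '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

With a psycopg cursor this could be run as `cur.execute(NEAREST_CHUNKS_SQL, {"query": to_vector_literal(embedding), "limit": 5})`, where `embedding` is the query text's embedding from the same model used at ingestion time.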

## Usage and License

Conduit Podcast Transcripts by Jay Miller and Kathy Campbell, with the original Whisper transcript downloads done by Pilix, is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
