This is an archive of transcriptions generated by Whisper, Audio Hijack, and related tools, meant to be used as a source of data.
The text is from the Conduit Podcast.
This project uses uv for fast, reliable Python package management. To set up:
```bash
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and create virtual environment
uv sync
```
### Environment Configuration
Load the environment variables:
```bash
# Using direnv (recommended)
direnv allow
# Or manually source the .envrc file
source .envrc
```

The .envrc file should contain connection strings for OpenSearch, PostgreSQL, and other configuration.

All commands below use uv run for environment isolation. If you've activated the virtual environment, you can omit uv run.
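
Before running the commands, it can help to confirm that the variables from .envrc actually reached the process. The sketch below is a minimal check and is not part of the repository; the variable names in it are placeholders, since the real names are whatever your .envrc exports.

```python
from __future__ import annotations

import os
from collections.abc import Iterable, Mapping


def missing_env_vars(
    required: Iterable[str],
    environ: Mapping[str, str] | None = None,
) -> list[str]:
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    # Placeholder names (assumption): replace with the variables your .envrc exports.
    required = ["OPENSEARCH_URL", "DATABASE_URL"]
    missing = missing_env_vars(required)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Saved under a name of your choosing, it can be run with uv run python so that it sees the same environment as the project commands.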
Transcribe episodes from the Conduit website using OpenAI Whisper:
```bash
# Transcribe the latest episode
uv run python src/transcribe.py ep

# Transcribe specific episodes
uv run python src/transcribe.py ep 100 101 102

# Transcribe a range of episodes
uv run python src/transcribe.py ep --range 100-105

# Transcribe all episodes (with confirmation)
uv run python src/transcribe.py ep --all

# Transcribe a local audio file
uv run python src/transcribe.py file path/to/audio.mp3 --output path/to/output.txt
```

Load transcripts into PostgreSQL and/or OpenSearch:
```bash
# Load all transcripts into both databases
uv run python src/quick_upload.py files

# Load specific files
uv run python src/quick_upload.py files --file transcripts/episode1.md --file transcripts/episode2.md

# Load with index recreation (destroys existing OpenSearch index)
uv run python src/quick_upload.py files --reindex

# Load to PostgreSQL only
uv run python src/quick_upload.py files --pg-only

# Load to OpenSearch only
uv run python src/quick_upload.py files --os-only
```

```bash
# Create or recreate OpenSearch index
uv run python src/os_index.py
```

### Project Structure
- src/ - Application code
  - transcribe.py - Whisper transcription and episode processing
  - url_finder.py - Web scraping for episode metadata and audio URLs
  - os_ingest.py - OpenSearch data ingestion
  - os_index.py - OpenSearch index creation
  - pg_ingest.py - PostgreSQL data processing with embeddings
  - quick_upload.py - Unified data loader for both databases
  - download_audio_file.py - Audio file download utility
- transcripts/ - Generated markdown files with metadata and transcriptions

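
To see which files in transcripts/ are candidates for loading, a short listing script is enough. This is a sketch under the assumption that transcripts are .md files sitting directly in that directory (as the example paths in the load commands suggest); it is not how quick_upload.py itself discovers files, and `list_transcripts` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def list_transcripts(directory: str | Path = "transcripts") -> list[Path]:
    """Return the .md files directly inside `directory`, sorted by path."""
    root = Path(directory)
    if not root.is_dir():
        # A missing directory is treated as "no transcripts" rather than an error.
        return []
    return sorted(p for p in root.glob("*.md") if p.is_file())


if __name__ == "__main__":
    for path in list_transcripts():
        print(path)
```

Any path it prints can be passed to the loader with --file.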
### Troubleshooting
- Virtual environment issues: Run uv sync and ensure you're using uv run or have activated the venv
- Missing environment variables: Load .envrc with direnv allow or source .envrc
- Whisper model: First run downloads the "base" model (~140MB) - requires network access
- Database connection: Services are on Aiven; verify credentials and network access
- Table recreation: pg_ingest.py and quick_upload.py can drop tables - use --reindex carefully
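
For the database connection item, a quick way to separate network problems from credential problems is to test whether the host in a connection string accepts TCP connections at all. The sketch below assumes the connection string is URL-shaped (for example postgres://user:password@host:port/dbname); the helper names and the default-port table are illustrative and not part of the repository.

```python
from __future__ import annotations

import socket
from urllib.parse import urlsplit

# Fallback ports by URL scheme (assumption: used only when the URL omits a port).
DEFAULT_PORTS = {"postgres": 5432, "postgresql": 5432, "https": 443, "http": 80}


def host_and_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL-style connection string."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("no host found in connection string")
    # Handle SQLAlchemy-style schemes such as "postgresql+psycopg2".
    base_scheme = parts.scheme.split("+")[0]
    port = parts.port or DEFAULT_PORTS.get(base_scheme)
    if port is None:
        raise ValueError(f"no port given and no default known for scheme {base_scheme!r}")
    return parts.hostname, port


def can_connect(url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A True result only shows that the port is reachable; it says nothing about whether the credentials are valid. The functions deliberately avoid printing the URL, since it usually contains a password.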

### Tech Stack
- Python 3.12.5
- OpenAI Whisper (transcription)
- LangChain (text processing)
- PostgreSQL with pgvector extension
- OpenSearch (vector search)
- SQLAlchemy (ORM)
- Typer (CLI framework)

Conduit Podcast Transcripts by Jay Miller, Kathy Campbell, original downloads from Whisper work done by Pilix, is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.