A local-first RAG system for talking to long-form content. Point it at YouTube playlists, podcasts, or blogs, and it ingests, chunks, indexes, and lets you have real conversations about the material — all running against a local LLM by default.
I kept saving hours of video and podcast content I never had time to revisit. Existing tools either shipped my data to a cloud provider or locked me into one ecosystem, so I built something that runs entirely on my own machine and lets me ask "what did this guy say about magnesium?" instead of scrubbing through a 3-hour episode.
- Python 3 · FastAPI · WebSockets
- LM Studio (default) for local inference, with OpenAI as an optional drop-in
- Custom chunking + retrieval pipeline (no heavy vector DB dependency)
- pdfplumber / PyPDF2 for document ingest
- youtube-transcript-api for video sources
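The stack above skips a vector database entirely. One way that kind of retrieval can work — sketched here under my own assumptions, not the project's actual scoring — is plain lexical overlap: score each chunk by how often the query's terms appear in it, no embeddings or vector store involved.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_chunks(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks scored by query-term frequency.
    A simplified stand-in for the project's custom retrieval pipeline."""
    query_terms = set(tokenize(query))

    def score(chunk: str) -> int:
        counts = Counter(tokenize(chunk))
        return sum(counts[t] for t in query_terms)

    return sorted(chunks, key=score, reverse=True)[:top_k]
```

Real implementations usually add IDF weighting and length normalization (BM25-style), but even this naive version illustrates why a dedicated vector DB isn't strictly required for grounded Q&A over a bounded corpus.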
- Pulls transcripts from a YouTube channel, playlist, or arbitrary podcast/blog source
- Chunks content with overlap, runs an extraction pass to pull out the substantive claims, and builds a lightweight retrievable index
- Serves a chat UI over WebSockets so you can ask questions and get answers grounded in the source material with citations back to the original timestamps
- Swaps between a local LM Studio model and OpenAI with a single setting — same code path either way
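The overlap chunking mentioned above can be sketched like this — a simplified, character-based illustration; the sizes and split strategy are my assumptions, not the project's actual parameters:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a claim
    straddling a boundary still appears whole in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = size - overlap  # how far the window advances each iteration
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
        if start + size >= len(text):
            break  # last window already reached the end
    return chunks
```

Each downstream chunk then gets the extraction pass and an index entry pointing back to its source timestamp.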
- `HOW_IT_WORKS.md` walks through the architecture and the design choices behind the no-vector-DB retrieval approach
- `WEB_INTERFACE.md` documents the chat UI
- `RAG_PLAN_OPTIMIZED_NO_VECTORS.md` and `RAG_PLAN_UPDATED.md` are the design docs that drove the build
Local-only: runs against LM Studio on `localhost:1234` by default. Demo available on request.
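Since LM Studio serves an OpenAI-compatible API on `localhost:1234`, the single-setting swap can boil down to pointing the same client code at a different base URL. A minimal sketch (the function name and the placeholder key are mine, not the project's):

```python
import os

def llm_config(provider: str = "lmstudio") -> dict:
    """Return base URL / key settings for the chosen backend.
    Both backends speak the OpenAI chat-completions wire format,
    so the calling code path is identical either way."""
    if provider == "lmstudio":
        # LM Studio's local server; it doesn't check the key,
        # so any placeholder value works here.
        return {"base_url": "http://localhost:1234/v1", "api_key": "local"}
    if provider == "openai":
        return {"base_url": "https://api.openai.com/v1",
                "api_key": os.environ.get("OPENAI_API_KEY", "")}
    raise ValueError(f"unknown provider: {provider}")
```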