JordanmFrancis/local-rag

local-rag

A local-first RAG system for talking to long-form content. Point it at YouTube playlists, podcasts, or blogs, and it ingests, chunks, indexes, and lets you have real conversations about the material — all running against a local LLM by default.

Why I built this

I kept saving hours of video and podcast content I never had time to revisit. Existing tools either shipped my data to a cloud provider or locked me into one ecosystem, so I built something that runs entirely on my own machine and lets me ask "what did this guy say about magnesium?" instead of scrubbing through a 3-hour episode.

Stack

  • Python 3 · FastAPI · WebSockets
  • LM Studio (default) for local inference, with OpenAI as an optional drop-in
  • Custom chunking + retrieval pipeline (no heavy vector DB dependency)
  • pdfplumber / PyPDF2 for document ingest
  • youtube-transcript-api for video sources
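Since the stack skips a heavy vector DB, retrieval can be as simple as lexical scoring over chunks. A minimal sketch of keyword-overlap ranking — the scoring function here is an illustrative assumption, not the repo's actual pipeline:

```python
import re
from collections import Counter

def score(query: str, chunk: str) -> int:
    """Count how often the query's terms appear in a chunk.
    A stand-in for the project's real scoring (which is not shown here)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    chunk_counts = Counter(re.findall(r"\w+", chunk.lower()))
    return sum(chunk_counts[term] for term in query_terms)

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k highest-scoring chunks for a query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]
```

The upside of this approach is zero infrastructure: no embedding model, no index server, just functions over strings.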

How it works

  • Pulls transcripts from a YouTube channel, playlist, or arbitrary podcast/blog source
  • Chunks content with overlap, runs an extraction pass to pull out the substantive claims, and builds a lightweight retrievable index
  • Serves a chat UI over WebSockets so you can ask questions and get answers grounded in the source material with citations back to the original timestamps
  • Swaps between a local LM Studio model and OpenAI with a single setting — same code path either way
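The chunking-with-overlap step above can be sketched as follows — the sizes are illustrative defaults, not the repo's actual parameters:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a claim spanning
    a chunk boundary still lands intact in at least one chunk.
    Requires overlap < size so the window always advances."""
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Each chunk shares its last `overlap` characters with the next chunk's start, which is what keeps boundary-straddling sentences retrievable.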

Notes

  • HOW_IT_WORKS.md walks through the architecture and the design choices behind the no-vector-DB retrieval approach
  • WEB_INTERFACE.md documents the chat UI
  • RAG_PLAN_OPTIMIZED_NO_VECTORS.md and RAG_PLAN_UPDATED.md are the design docs that drove the build

Demo

Local-only — runs against LM Studio on localhost:1234 by default. Demo available on request.
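LM Studio serves an OpenAI-compatible HTTP API on localhost:1234, which is what makes the local/OpenAI swap a one-setting change. A hedged sketch of calling it with only the standard library — the model name is a placeholder and the request shape follows the OpenAI chat-completions convention, not this repo's code:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local endpoint

def build_chat_payload(question: str) -> dict:
    # Minimal OpenAI-style chat request; "local-model" is a placeholder,
    # since LM Studio serves whichever model is currently loaded.
    return {
        "model": "local-model",
        "messages": [{"role": "user", "content": question}],
    }

def ask(question: str, base_url: str = BASE_URL) -> str:
    """POST a chat completion to an OpenAI-compatible server.
    Pointing base_url at api.openai.com/v1 (plus an API key header)
    is the whole provider swap."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Because both backends speak the same wire format, the application code path really can be identical either way.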

About

Local-first RAG tool that ingests YouTube playlists, podcasts, and blogs, then lets you chat with the content using a local LLM. Like NotebookLM but fully local.
