A production-ready, 100% Local and Private Retrieval-Augmented Generation (RAG) system for Document Q&A. Ask questions about your PDFs and get answers grounded in your data with full source citations.
- Privacy First: Everything runs locally. No cloud APIs, no data leaks.
- Local LLM Integration: Powered by Ollama (supports Llama 3, Mistral, etc.).
- Smart Text Processing: Recursive character splitting with overlap for high-quality context.
- Source Citations: Every answer includes a breakdown of which document and page were used.
- Modern UI: Dark-themed, glassmorphic web interface with professional Markdown rendering (tables, lists, etc.).
- Stateless Node Architecture: Clean, modular Python backend using Pydantic for state management.
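The "recursive character splitting with overlap" feature can be sketched in a few lines. This is a simplified illustration, not the project's actual splitter; the separator hierarchy, chunk size, and overlap length here are assumptions:

```python
def split_text(text, chunk_size=500, separators=("\n\n", "\n", " ")):
    """Recursively split on the coarsest separator present until every
    chunk fits in chunk_size; hard-split when no separator is left."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep = next((s for s in separators if s in text), None)
    if sep is None:
        # no separator applies: fall back to a hard character split
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, buf = [], ""
    for piece in text.split(sep):
        candidate = f"{buf}{sep}{piece}" if buf else piece
        if len(candidate) <= chunk_size:
            buf = candidate
        else:
            if buf:
                chunks.append(buf)
            buf = piece
    if buf:
        chunks.append(buf)
    # a single piece can still be oversize if it lacked this separator,
    # so recurse; finer separators take over on the next pass
    result = []
    for chunk in chunks:
        result.extend(split_text(chunk, chunk_size, separators))
    return result

def add_overlap(chunks, overlap=50):
    """Prefix each chunk with the tail of its predecessor so context
    spanning a chunk boundary is not lost."""
    return [c if i == 0 else chunks[i - 1][-overlap:] + c
            for i, c in enumerate(chunks)]
```

The overlap step is what lets an answer draw on a sentence that happens to straddle two chunks.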
- Python 3.10+
- Ollama: Download and install Ollama
- After installing, pull a model:
ollama pull llama3
- Clone the repository (or navigate to the project folder):
cd projects/RAG
- Install Python dependencies:
pip install -r requirements.txt
- Start the Flask server:
python app.py
- Access the Web UI: open http://127.0.0.1:5000 in your browser.
| Component | Technology |
|---|---|
| LLM Engine | Ollama (Local API) |
| Vector Store | ChromaDB (Persistent) |
| Embeddings | SentenceTransformers (all-MiniLM-L6-v2) |
| PDF Parser | PyMuPDF (fitz) |
| Backend | Flask |
| Frontend | Vanilla JS + Marked.js + CSS Glassmorphism |
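At query time, the pipeline embeds the question and retrieves the nearest chunks. ChromaDB handles this internally, but the core idea reduces to ranking stored vectors by cosine similarity. A toy illustration with hand-made 2-D vectors (real SentenceTransformers embeddings are 384-dimensional):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """store: list of (chunk_text, vector) pairs.
    Returns the k chunk texts most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]
```

The retrieved chunks, along with their document and page metadata, are what get handed to the LLM as grounded context for citation.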
- Managing Documents: Use the sidebar to upload multiple PDFs or clear the entire database.
- Model Detection: The system automatically detects which models you have installed in Ollama. If llama3 isn't found, it falls back to your first available local model.
- Clearing History: Use the "Clear Chat" button to reset the windowed memory context when starting a new topic.
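The fallback logic described above amounts to a small selection function. A sketch, assuming the installed-model names have already been fetched (e.g. from Ollama's `/api/tags` endpoint); the function name and signature are illustrative, not the project's actual code:

```python
def pick_model(installed, preferred="llama3"):
    """Return the preferred model if installed (matching any tag,
    e.g. 'llama3:latest'); otherwise fall back to the first local model."""
    if not installed:
        raise RuntimeError("No models installed; run `ollama pull llama3` first.")
    for name in installed:
        if name == preferred or name.startswith(preferred + ":"):
            return name
    return installed[0]
```

For example, with only `mistral:latest` pulled, the system would answer questions using Mistral rather than failing outright.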