# Module 03 · RAG Systems

Build Retrieval-Augmented Generation (RAG) pipelines that ground AI responses in your enterprise documents — reducing hallucinations and keeping answers current without retraining the model.

## Files

| File | Description |
| --- | --- |
| `full_rag_pipeline.py` | End-to-end pipeline: ingest → embed → retrieve → generate |

## Architecture

```
Documents → Chunker → Embeddings (HuggingFace) → ChromaDB
                                                        ↓
User Query → Embeddings → Retrieval → Context Builder → LLM (Grok/OpenRouter) → Answer
```
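The retrieval leg of this diagram is nearest-neighbour search over embedding vectors. Here is a minimal, dependency-free sketch of that step (the real pipeline uses HuggingFace embeddings and ChromaDB; the toy 3-dimensional vectors and the `cosine`/`retrieve` helpers below are illustrative only):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, store, top_k=4):
    """Return the top_k (chunk, score) pairs, best match first."""
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in store]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy "vector store": (chunk_text, embedding) pairs.
store = [
    ("Refund policy: 30 days", [0.9, 0.1, 0.0]),
    ("Office hours: 9-5",      [0.1, 0.9, 0.0]),
    ("Shipping takes 5 days",  [0.7, 0.2, 0.1]),
]

hits = retrieve([1.0, 0.0, 0.0], store, top_k=2)
# The two chunks most aligned with the query vector come back first.
```

In the real pipeline, ChromaDB performs this search internally; the retrieved chunks are then concatenated by the context builder and passed to the LLM as grounding text.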

## Quick Start

```bash
# Run the full demo (creates a local chroma_db/ directory)
python 03-rag-systems/full_rag_pipeline.py
```

## Key Parameters (via `.env`)

| Variable | Default | Description |
| --- | --- | --- |
| `EMBED_MODEL` | `all-MiniLM-L6-v2` | Embedding model |
| `CHROMA_PERSIST_DIR` | `./chroma_db` | Vector store location |
| `RAG_CHUNK_SIZE` | `500` | Words per chunk |
| `RAG_CHUNK_OVERLAP` | `50` | Words of overlap between consecutive chunks |
| `RAG_TOP_K` | `4` | Chunks retrieved per query |
| `OPENROUTER_RAG_MODEL` | `mistralai/mistral-small` | LLM used for answer generation |
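`RAG_CHUNK_SIZE` and `RAG_CHUNK_OVERLAP` together define a sliding window over the document's words. A sketch of that windowing, assuming a simple word-based splitter (the actual implementation lives in `full_rag_pipeline.py` and may differ in detail):

```python
def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail of the document
    return chunks

# A 1200-word document with the default settings yields 3 overlapping chunks.
doc = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_words(doc, chunk_size=500, overlap=50)
```

The overlap exists so that a sentence straddling a chunk boundary still appears whole in at least one chunk, which keeps retrieval from missing answers that happen to sit on a boundary.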

## Scaling to Production

- Swap ChromaDB for Pinecone or pgvector for multi-tenant scale
- Add reranking with `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Implement hybrid search (dense + BM25) for better recall
- Add metadata filters to scope retrieval by department or date
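One common way to implement the hybrid-search item above is Reciprocal Rank Fusion (RRF), which merges the ranked lists produced by dense and BM25 retrieval without needing to normalise their incompatible score scales. A self-contained sketch (the document IDs and rankings below are made up for illustration):

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked lists of doc ids into one ranking.

    Each doc accumulates 1 / (k + rank) for every list it appears in;
    k=60 is the constant suggested in the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["d3", "d1", "d2"]   # ranking from the vector store
bm25_hits  = ["d1", "d4", "d3"]   # ranking from keyword (BM25) search
fused = rrf_fuse([dense_hits, bm25_hits])
# Docs appearing high in both lists ("d1", "d3") rise to the top.
```

Because RRF only consumes ranks, it works unchanged whether the dense side comes from ChromaDB, Pinecone, or pgvector.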