A comprehensive Python implementation of a Retrieval-Augmented Generation (RAG) pipeline that combines document retrieval with Large Language Models to provide accurate, context-aware answers. This project demonstrates how to build an intelligent question-answering system that can reference your own documents.
- Overview
- Project Architecture
- Project Structure
- Installation & Setup
- Configuration
- Usage Guide
- Detailed Module Documentation
- Project Flow
- Dependencies
- Features
- Advanced Configuration
- Troubleshooting
RAG (Retrieval-Augmented Generation) is a powerful AI technique that combines two key components:
- Retrieval: Searching a knowledge base (vector database) to find relevant document chunks
- Generation: Using an LLM to generate answers based on retrieved context
Traditional language models have limitations:
- Knowledge cutoff: Can only know information from training data
- Hallucination: May generate plausible-sounding but incorrect information
- No context: Can't refer to your specific documents or data
RAG solves these problems by:
- ✅ Grounding answers in actual documents
- ✅ Reducing hallucinations through factual retrieval
- ✅ Enabling Q&A on custom documents without fine-tuning
- ✅ Keeping information current and verifiable
USER QUERY
↓
[INPUT PROCESSING]
↓
[EMBEDDING GENERATION]
↓
[SIMILARITY SEARCH in Vector Store]
↓
[RETRIEVE TOP K DOCUMENTS]
↓
[CONTEXT + QUERY to LLM]
↓
[LLM GENERATES ANSWER]
↓
RESPONSE TO USER
- Load PDF documents from disk
- Split documents into manageable chunks
- Generate vector embeddings for each chunk
- Store embeddings in FAISS vector database
- Receive user query
- Generate embedding for query
- Search vector store for similar documents
- Combine retrieved context with query
- Generate answer using LLM
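The chunking step in the ingestion phase can be illustrated with a minimal, dependency-free sketch. The actual pipeline uses LangChain's `RecursiveCharacterTextSplitter`, which additionally respects paragraph and sentence boundaries; this version only slides a fixed-size window with overlap:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks, each overlapping the previous one."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last window already reached the end of the text
    return chunks

chunks = chunk_text("x" * 2500, chunk_size=1000, overlap=200)
print(len(chunks))     # 3
print(len(chunks[0]))  # 1000
```

The 200-character overlap means the tail of each chunk is repeated at the head of the next, so a sentence cut by a chunk boundary still appears whole in at least one chunk.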
RAG_Pipeline/
├── README.md # This file - comprehensive documentation
├── pyproject.toml # Python project metadata and dependencies
├── main.py # Root entry point with CLI interface
├── data/ # Folder for input PDF documents
│ └── [Place your PDF files here] # PDFs to be ingested
├── logs/ # Application logs (auto-generated)
│ └── rag_pipeline.log # Rolling log file with DEBUG level detail
├── vector_store/ # FAISS vector database (auto-generated)
│ ├── index.faiss # FAISS index with embeddings
│ ├── docstore.pkl # Document metadata storage
│ └── index.pkl # Index metadata
├── src/ # Main source code package
│ ├── __init__.py # Package initialization and public API
│ ├── config.py # Configuration class with LLM/data settings
│ ├── rag_pipeline.py # RAGPipeline orchestrator class
│ ├── ingestion.py # Ingestion class (load PDFs → create embeddings)
│ ├── retrieval.py # Retrieval class (search vectors → generate answers)
│ ├── embeddings_utils.py # Shared embedding utilities
│ ├── logging_config.py # Centralized logging setup
│ └── main.py # Alternate entry point with logging
└── venv/ # Python virtual environment (created during setup)
| Directory | Purpose |
|---|---|
| `data/` | Where you place PDF files to be ingested into the RAG system |
| `logs/` | Application logs with rolling file handler (auto-created) |
| `vector_store/` | Persisted FAISS database with embeddings (created after ingestion) |
| `src/` | Core source code with class-based RAG pipeline implementation |
- Python 3.9 or higher
- pip (Python package manager)
- LM-Studio (for local model) OR OpenAI keys (for cloud models)
cd RAG_Demo_Python

# On Windows (PowerShell)
python -m venv venv
.\venv\Scripts\Activate.ps1

# On macOS/Linux
python3 -m venv venv
source venv/bin/activate

# Using pyproject.toml (recommended)
pip install -e .

- Download LM-Studio: https://lmstudio.ai/
- Download a model (e.g., Nomic Embed Text for embeddings)
- Start LM-Studio server: Click "Start Server" (default: http://127.0.0.1:1234)
- Verify connection: Check that `config.py` has the correct `LLM_BASE_URL`
- Get API key from https://platform.openai.com/api-keys
- Create a `.env` file in the project root:
  OPENAI_API_KEY=sk-...
  OPENAI_API_URL=https://api.openai.com/v1
- Place PDF files in the `data/` folder
- Supported formats: PDF (files with extractable text, not scanned images)
All configuration is managed through the src/config.py module using a Config class. Configuration values are accessed as read-only properties:
from src.config import Config
config = Config()
# LLM Configuration
print(config.llm_base_url) # "http://127.0.0.1:1234/v1"
print(config.llm_model) # "text-embedding-nomic-embed-text-v1.5"
print(config.llm_temperature) # 0.7
# Data Configuration
print(config.chunk_size) # 1000
print(config.chunk_overlap) # 200
# Vector Store Configuration
print(config.vector_store_path) # Path to vector_store directory
# Logging Configuration
print(config.console_logging_enabled) # False (console output toggle)

To modify configuration values, edit the src/config.py file and adjust the __init__ method:
class Config:
def __init__(self) -> None:
# ==================== LLM Configuration ====================
self._llm_base_url = "http://127.0.0.1:1234/v1"
self._llm_api_key = "not needed"
self._llm_model = "text-embedding-nomic-embed-text-v1.5"
self._llm_temperature = 0.7
# ==================== Data Configuration ====================
self._chunk_size = 1000 # Modify here
self._chunk_overlap = 200 # Or here
# ==================== Logging Configuration ====================
        self._console_logging_enabled = False  # Set to True for console output

| Property Name | Config Type | Default Value | Recommended Range | Impact | Notes |
|---|---|---|---|---|---|
| `chunk_size` | Data Configuration | 1000 | 500-1500 | Context per retrieval | Smaller = focused (technical docs), larger = more context (narrative) |
| `chunk_overlap` | Data Configuration | 200 | 100-300 | Context continuity | Higher overlap = better flow but more storage |
| `llm_temperature` | LLM Configuration | 0.7 | 0.1-1.0 | Answer creativity | 0.1 = factual, 0.7 = balanced, 1.0 = creative |
| `k` (retrieval count) | Retrieval Setting | 3 | 3-5 | Number of retrieved docs | More docs = broader context but slower queries |
# Activate virtual environment
.\venv\Scripts\Activate.ps1
# Run the main application
python main.py

This will:

- Load and embed all PDFs from the `data/` folder
- Create/update the FAISS vector store
- Enter interactive query mode
- Log all operations to `logs/rag_pipeline.log`
=== RAG Pipeline Demo ===
=== Query Mode ===
Ask a question (or 'exit' to quit): What is machine learning?
Searching...
Answer: Machine learning is a subset of artificial intelligence that enables
systems to learn and improve from experience without being explicitly programmed...
Ask a question (or 'exit' to quit): exit
Goodbye!
from src.rag_pipeline import RAGPipeline
# Create pipeline instance
pipeline = RAGPipeline()
# Run ingestion phase
pipeline.ingest()
# Query the system
answer = pipeline.query("What is cloud computing?")
print(answer)

Purpose: Centralizes all configuration parameters using a class-based approach
Key Components:
- `Config` class: Container for all configuration values with read-only properties
- LLM settings (base URL, API key, model name, temperature)
- Data processing parameters (chunk size, overlap)
- File paths (data folder, vector store location, logs directory)
- Console logging toggle
Configuration Properties:
llm_base_url # Local LM-Studio endpoint (default: http://127.0.0.1:1234/v1)
llm_api_key # API key for LLM service
llm_model # Embedding model name
llm_temperature # Temperature for generation (0.0-1.0)
data_folder # Path to input PDFs
chunk_size # Max characters per chunk (default: 1000)
chunk_overlap # Overlap between chunks (default: 200)
vector_store_path # Path to FAISS database
console_logging_enabled # Toggle console logging output

Usage:
from src.config import Config
config = Config()
print(config.chunk_size) # 1000
print(config.llm_model) # text-embedding-nomic-embed-text-v1.5

Purpose: Loads documents and creates embeddings through a class-based pipeline
Ingestion Class Methods:
- Initializes the ingestion pipeline with configuration
- Sets up internal state for embeddings and vector operations
- Scans data folder for PDF files
- Uses `PyPDFLoader` to extract text
- Returns list of Document objects
- Logging: Logs each PDF processed and total pages extracted
- Error Handling: Continues with next file if one fails
- Splits documents into smaller chunks using `RecursiveCharacterTextSplitter`
- Respects semantic boundaries (paragraphs, sentences, words)
- Maintains overlap between chunks for context continuity
- Returns chunked documents ready for embedding
- Logging: Logs chunk count and statistics
- Generates embeddings using LLM (via embeddings_utils)
- Creates FAISS index for similarity search
- Persists index to disk
- Returns initialized FAISS vector store
- Error Handling: Handles initialization and serialization errors
- Main orchestration method
- Calls load → chunk → embed in sequence
- Logs all phases and handles errors gracefully
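The load → chunk → embed orchestration described above can be sketched as a skeleton. Method names follow this documentation, but the bodies here are placeholders standing in for the real LangChain and FAISS calls:

```python
class IngestionSketch:
    """Illustrative skeleton of the load -> chunk -> embed sequence."""

    def __init__(self, config):
        self.config = config
        self.phases_run = []  # records completed phases, for illustration only

    def load_pdfs(self):
        # Real implementation: scan data/ and extract text via PyPDFLoader
        self.phases_run.append("load")
        return ["raw document"]

    def chunk_documents(self, docs):
        # Real implementation: RecursiveCharacterTextSplitter with overlap
        self.phases_run.append("chunk")
        return docs

    def create_vector_store(self, chunks):
        # Real implementation: embed chunks and persist a FAISS index
        self.phases_run.append("embed")

    def run(self):
        # Main orchestration: each phase feeds the next
        docs = self.load_pdfs()
        chunks = self.chunk_documents(docs)
        self.create_vector_store(chunks)

ingestion = IngestionSketch(config=None)
ingestion.run()
print(ingestion.phases_run)  # ['load', 'chunk', 'embed']
```

The real `Ingestion.run()` additionally logs each phase and catches per-file errors so one bad PDF does not abort the whole run.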
Example:
from src.config import Config
from src.ingestion import Ingestion
config = Config()
ingestion = Ingestion(config)
ingestion.run()
# Output: Vector store saved to vector_store/

Purpose: Retrieves relevant documents and generates answers through a class-based pipeline
Retrieval Class Methods:
- Initializes the retrieval pipeline with configuration
- Sets up internal state for embeddings, LLM, and QA chain
- Loads persisted FAISS database from disk
- Initializes embeddings for query encoding
- Must use same embedding model as ingestion
- Error Handling: Raises FileNotFoundError if vector store not found
- Logging: Logs each step of loading process
- Initializes ChatOpenAI LLM for answer generation
- Uses configuration for base URL, model, and temperature
- Error Handling: Handles connection and configuration errors
- Caching: Skips re-initialization if LLM already loaded
- Creates LangChain RetrievalQA chain
- Configures retriever (k=3 documents by default)
- Sets up prompt template for context-aware generation
- Chain type: "stuff" (concatenates all retrieved docs)
- Logging: Logs chain initialization
- Main query function
- Embeds question and searches vector store
- Retrieves top-k similar document chunks
- Generates answer using LLM with context
- Input: User question
- Output: Generated answer
- Logging: Logs search results and generation steps
Flow:
User Question
↓
embed(question) using same embeddings as ingestion
↓
FAISS.similarity_search(query_embedding, k=3)
↓
retrieved_documents (top 3 most similar chunks)
↓
prompt = question + context from retrieved_documents
↓
LLM.generate(prompt)
↓
answer
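The `"stuff"` chain type in this flow simply concatenates all retrieved chunks into one prompt. A minimal sketch of that assembly (the exact template LangChain uses internally differs):

```python
def build_stuff_prompt(question: str, retrieved_docs: list[str]) -> str:
    """Concatenate all retrieved chunks into a single context-aware prompt."""
    context = "\n\n".join(retrieved_docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Toy retrieved chunks standing in for real FAISS results
docs = ["Chunk about FAISS.", "Chunk about embeddings.", "Chunk about RAG."]
prompt = build_stuff_prompt("What is RAG?", docs)
print(prompt.count("Chunk"))  # 3 -- every retrieved chunk is stuffed into the prompt
```

Because everything is concatenated, `k * chunk_size` must fit inside the LLM's context window; that is why raising `k` or `chunk_size` trades off against prompt length.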
Example:
from src.config import Config
from src.retrieval import Retrieval
config = Config()
retrieval = Retrieval(config)
retrieval.load_vector_store()
retrieval.initialize_llm()
retrieval.create_qa_chain()
answer = retrieval.query("What is the main topic?")
print(answer)

Purpose: Main orchestrator class that coordinates the entire RAG workflow
RAGPipeline Class:
- Initializes Config, Ingestion, and Retrieval components
- Sets up the complete pipeline
- Orchestrates the data ingestion phase
- Loads PDFs, chunks them, and creates vector store
- Delegates to Ingestion class
- Main entry point for querying the system
- Delegates to Retrieval class
- Returns generated answer
Usage:
from src.rag_pipeline import RAGPipeline
pipeline = RAGPipeline()
pipeline.ingest() # Load and embed documents
answer = pipeline.query("Question here?")
print(answer)

Purpose: Centralized logging setup for consistent logging across all modules
Key Features:
- Dual handler setup: Console and File
- Console output controlled by `Config.console_logging_enabled`
- File logging always enabled with `RotatingFileHandler`
- Log rotation: 10MB per file with 5 backups
- Logs directory: `logs/rag_pipeline.log`
- Consistent format: `timestamp - logger_name - level - function_name:line_number - message`
Function:
- Configures and returns a logger with the specified name
- Creates logs directory if it doesn't exist
- Sets up both console and file handlers
- Handles circular imports by importing Config inside function
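A minimal stdlib sketch of what `setup_logging` does, per the features above (10 MB rotation, five backups). The real module also consults `Config.console_logging_enabled` to decide whether to attach a console handler:

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

def setup_logging_sketch(name: str, log_dir: str = "logs") -> logging.Logger:
    """Configure a logger with a rotating file handler (10 MB, 5 backups)."""
    Path(log_dir).mkdir(exist_ok=True)  # create logs directory if missing
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # file log captures DEBUG-level detail
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = RotatingFileHandler(
            Path(log_dir) / "rag_pipeline.log",
            maxBytes=10 * 1024 * 1024,  # rotate at 10 MB
            backupCount=5,              # keep five rotated backups
        )
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s"
        ))
        logger.addHandler(handler)
    return logger

logger = setup_logging_sketch("rag_demo")
logger.info("Application started")
```

The `if not logger.handlers` guard matters because `logging.getLogger(name)` returns the same object on every call; without it, each `setup_logging` call would add another handler and duplicate every log line.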
Usage:
from src.logging_config import setup_logging
logger = setup_logging(__name__)
logger.info("Application started")
logger.debug("Detailed debugging information")
logger.error("An error occurred", exc_info=True)

Purpose: Shared utilities for embedding initialization, used by both ingestion and retrieval
Key Function:
- Initializes OpenAI-compatible embedding model
- Uses settings from Config class
- Returns initialized OpenAIEmbeddings instance
- Disables embedding context length check for flexibility
Usage:
from src.config import Config
from src.embeddings_utils import initialize_embeddings
config = Config()
embeddings = initialize_embeddings(config)
# embeddings ready for use in vector store operations

Purpose: Provides CLI interface for the RAG system with comprehensive logging
Flow:
- Display welcome message
- Initialize RAGPipeline
- Call ingestion pipeline to load and embed documents
- Enter interactive loop for user queries
- Handle errors with user-friendly messages
- Log all activities to file and optionally console
Usage:
python main.py

Features:
- Logging of all pipeline phases
- Query counter for session statistics
- Proper error handling and reporting
- Graceful exit on Ctrl+C or 'exit' command
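The interactive loop with its query counter and graceful exit can be sketched with injected I/O so the logic is easy to test in isolation. The real `main.py` reads from stdin and adds logging; the function and parameter names here are illustrative, not the actual API:

```python
def run_query_loop(ask, input_fn=input, print_fn=print) -> int:
    """Prompt until 'exit'; return the number of questions answered."""
    count = 0
    while True:
        try:
            question = input_fn("Ask a question (or 'exit' to quit): ").strip()
        except (KeyboardInterrupt, EOFError):
            break  # graceful exit on Ctrl+C or end of input
        if question.lower() == "exit":
            break
        if not question:
            continue  # ignore empty input
        print_fn(f"Answer: {ask(question)}")
        count += 1
    print_fn("Goodbye!")
    return count

# Simulated session: two real questions, one blank line, then exit
queries = iter(["What is RAG?", "", "How does FAISS work?", "exit"])
answered = run_query_loop(
    ask=lambda q: f"(echo) {q}",          # stand-in for pipeline.query
    input_fn=lambda prompt: next(queries),
)
print(answered)  # 2
```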
┌─────────────────────────────────────────────────────────────┐
│ PHASE 1: DATA INGESTION (Offline) │
└─────────────────────────────────────────────────────────────┘
1. User runs: python main.py
│
├─ Setup logging (console + file with rotation)
│
└─ Create RAGPipeline() instance
├─ Initialize Config (load settings from environment)
├─ Initialize Ingestion(config)
└─ Initialize Retrieval(config)
│
└─ Call pipeline.ingest()
│
├─ Ingestion.run()
│ │
│ ├─ load_pdfs()
│ │ - Scan data/ for PDF files
│ │ - Use PyPDFLoader to extract text
│ │ - Log each file processed
│ │ - Return: Raw documents with metadata
│ │
│ ├─ chunk_documents()
│ │ - Split documents into chunks (1000 chars max)
│ │ - Add 200 char overlap for context continuity
│ │ - Use recursive splitting (paragraph/sentence/word)
│ │ - Return: Chunked documents
│ │
│ └─ create_vector_store()
│ - Initialize embeddings using LM-Studio
│ - Generate vector embedding for each chunk
│ - Create FAISS index
│ - Save to vector_store/ directory
│ - Log all operations
│
└─ Status: Knowledge base prepared ✓
├─ Logs written to: logs/rag_pipeline.log
└─ Vector store persisted to: vector_store/
┌─────────────────────────────────────────────────────────────┐
│ PHASE 2: QUERY & RETRIEVAL (Online) │
└─────────────────────────────────────────────────────────────┘
1. Enter interactive query mode
│
└─ Loop: while user != 'exit'
│
├─ Get user query: "What is...?"
│
└─ Call pipeline.query(question)
│
├─ Retrieval.load_vector_store()
│ │ (only on first query)
│ ├─ Initialize embeddings
│ └─ Load FAISS from disk
│
├─ Retrieval.initialize_llm()
│ │ (only on first query)
│ ├─ Initialize ChatOpenAI
│ └─ Set temperature and model
│
├─ Retrieval.create_qa_chain()
│ │ (only on first query)
│ ├─ Configure retriever (k=3 documents)
│ └─ Setup context + query template
│
├─ Retrieval.query(question)
│ │
│ ├─ Embed the user question
│ ├─ Search vector store for similar chunks
│ ├─ Retrieve top 3 most relevant documents
│ │
│ ├─ Create context prompt:
│ │ "Question: {question}\n\nContext: {retrieved_docs}"
│ │
│ ├─ Send prompt to LLM
│ └─ LLM generates answer based on context
│
├─ Return: Generated answer
│
├─ Display answer to user
│
├─ Log query and response
│
└─ Loop back for next question
┌─────────────┐
│ PDF Files │
│ in data/ │
└──────┬──────┘
│
↓
┌─────────────────────┐
│ PyPDFLoader │
│ (Extract text) │
└────────┬────────────┘
│
↓ Raw text chunks
┌─────────────────────────┐
│ RecursiveTextSplitter │
│ (1000 chars + 200 over) │
└────────┬────────────────┘
│
↓ Text chunks
┌─────────────────────────┐
│ OpenAI Embeddings │
│ (Via LM-Studio) │
└────────┬────────────────┘
│
↓ Vector embeddings
┌─────────────────────────┐
│ FAISS Vector Store │
│ (Similarity search) │
└────────┬────────────────┘
│
↓ Top k similar docs
┌─────────────────────────┐
│ LLM (GPT/Claude/etc) │
│ (Generate answer) │
└────────┬────────────────┘
│
↓ Generated Answer
┌─────────────────────────┐
│ Response to User │
└─────────────────────────┘
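The similarity-search stage above ranks chunk vectors by closeness to the query vector. FAISS does this efficiently at scale (typically via L2 distance or inner product); the underlying idea is plain cosine similarity, which is the same ranking on normalized vectors. A dependency-free sketch with toy 3-dimensional "embeddings" (real models output hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]
chunks = {
    "chunk_a": [0.9, 0.1, 0.8],  # points roughly the same way as the query
    "chunk_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}
# Rank chunks by similarity to the query, best first (the "top k" step)
ranked = sorted(chunks, key=lambda c: cosine_similarity(query, chunks[c]), reverse=True)
print(ranked)  # ['chunk_a', 'chunk_b']
```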
| Package | Version | Required | Category | Purpose | Installation |
|---|---|---|---|---|---|
| `langchain` | ≥0.1.0 | ✅ Yes | Framework | LLM orchestration and chain composition | `pip install langchain` |
| `langchain-community` | ≥0.0.10 | ✅ Yes | Integrations | Community integrations (FAISS, PDFLoader, etc.) | `pip install langchain-community` |
| `langchain-openai` | ≥0.3.0 | ✅ Yes | Integrations | OpenAI embeddings and LLM integration | `pip install langchain-openai` |
| `langchain-text-splitters` | ≥0.3.0 | ✅ Yes | Text Processing | Semantic document chunking utilities | `pip install langchain-text-splitters` |
| `faiss-cpu` | ≥1.13.2 | ✅ Yes | Vector DB | Facebook AI Similarity Search for embeddings | `pip install faiss-cpu` |
| `pypdf` | ≥3.17.1 | ✅ Yes | Text Extraction | PDF text extraction and parsing | `pip install pypdf` |
| `openai` | ≥1.3.0 | ✅ Yes | LLM API | OpenAI API client for models and embeddings | `pip install openai` |
| `python-dotenv` | ≥1.0.0 | Optional | Configuration | Load environment variables from .env files | `pip install python-dotenv` |
| `numpy` | ≥2.4.3 | ✅ Yes | Dependencies | Numerical computing (required by FAISS) | Auto-installed with faiss-cpu |
# From requirements.txt
pip install -r requirements.txt
# Or from pyproject.toml
pip install -e .

- ✅ PDF document ingestion
- ✅ Semantic text chunking with overlap
- ✅ Vector embedding generation
- ✅ FAISS vector store for fast similarity search
- ✅ LangChain integration for LLM orchestration
- ✅ Support for multiple LLM backends (local, OpenAI)
- ✅ Interactive CLI query interface
- ✅ Context-aware answer generation
- ✅ Configurable chunk size and overlap
- ✅ Temperature control for answer randomness
- 🔲 Support for multiple document formats (DOCX, TXT, HTML)
- 🔲 Swap FAISS for pgvector, an open-source PostgreSQL extension that stores, indexes, and searches AI-generated embeddings (vectors) directly inside the database
- 🔲 Web UI (React / Angular)
- 🔲 Hybrid search (BM25 + semantic)
- 🔲 Query expansion and refinement
- 🔲 Document metadata filtering
- 🔲 Answer citation tracking
- 🔲 Conversation history and context preservation
- 🔲 Fine-tuning on domain-specific data
- 🔲 Multi-language support
- 🔲 Performance metrics and logging
- 🔲 Caching for repeated queries
To modify chunk size and overlap, edit src/config.py:
class Config:
def __init__(self) -> None:
# Smaller chunks (500-800 chars) for dense technical documents
self._chunk_size = 500 # More focused retrieval
self._chunk_overlap = 100 # Less context per chunk
# OR Larger chunks (1200-2000 chars) for narrative documents
self._chunk_size = 1500 # More context retained
        self._chunk_overlap = 300 # Better continuity

Trade-offs:
- Smaller chunks (500-800): ✅ More focused retrieval, ✅ Faster, ❌ Less context
- Larger chunks (1200-2000): ✅ More context, ✅ Fewer chunks, ❌ Less precision
Modify the llm_temperature property in src/config.py:
class Config:
def __init__(self) -> None:
self._llm_temperature = 0.1 # Very deterministic (factual Q&A)
# OR
self._llm_temperature = 0.7 # Balanced (default)
# OR
        self._llm_temperature = 1.0 # Very creative (brainstorming)

To retrieve more documents for context, modify src/retrieval.py in the create_qa_chain() method:
def create_qa_chain(self) -> None:
# ...
retriever = self._vector_store.as_retriever(
search_kwargs={"k": 5} # Change from 3 to 5 documents
)
    # ...

To see logs in the console while running, edit src/config.py:
class Config:
def __init__(self) -> None:
self._console_logging_enabled = True # Change from FalseSolution:
- Ensure PDF files are in the `data/` folder at the project root
- Check file extensions are `.pdf` (lowercase)
- Verify PDFs contain extractable text (not scanned images)
Solution:
- Start LM-Studio application
- Click "Start Server" button
- Verify it shows "http://127.0.0.1:1234/v1"
- Check that `config.py` has the correct URL
Solution:
- Create a `.env` file in the project root
- Add: `OPENAI_API_KEY=sk-your-key-here`
- Or set the environment variable directly: `$env:OPENAI_API_KEY = "sk-..."`
Solution:
# Verify virtual environment is activated
.\venv\Scripts\Activate.ps1
# Reinstall dependencies
pip install -e .

Solution:
- Check that LM-Studio has a model loaded
- Reduce `chunk_size` in `src/config.py` (e.g., from 1000 to 500)
- Process PDFs in batches
Optimization tips:
- Increase `k` (number of retrieved documents) from 3 to 5 in `retrieval.py`
- Reduce `chunk_size` for more targeted retrieval in `config.py`
- Lower `llm_temperature` for factual answers (0.3-0.5) in `config.py`
- Ensure PDFs have good quality text
- Enable console logging with `config.console_logging_enabled = True` to review details
- "What are the main components?"
- "How does this process work?"
- "What are the key steps?"
- "Summarize the document"
- "What are the key points?"
- "Give me an overview"
- "What is mentioned about XYZ?"
- "When did this happen?"
- "Who is responsible for this?"
- "What's the difference between X and Y?"
- "Compare the two approaches"
- "Which is better for Z use case?"
- Check Troubleshooting section
- Review inline code comments
- Check LLM documentation:
- LM-Studio: https://lmstudio.ai/
- LangChain Documentation: https://python.langchain.com/
- FAISS Documentation: https://github.com/facebookresearch/faiss
- Vector Database Guide: https://www.pinecone.io/learn/vector-database/
This project is open-source and available under the MIT License. See LICENSE file for details.
- Vector databases: FAISS, Pinecone, Weaviate, Milvus
- Text embeddings: Sentence Transformers, OpenAI Embeddings, Cohere
- LLMs: GPT-3/4, Claude, LLaMA, Mistral
# 1. Setup
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -e .
# 2. Prepare data
# - Place PDFs in data/ folder
# - Start LM-Studio and load a model (or configure OpenAI API)
# 3. Run
python main.py
# 4. Query
Ask a question (or 'exit' to quit): What is the main topic?

Last Updated: March 2026
For the most current information, check inline code comments in each module.
Current Version: 0.2.0 (March 2026)
- ✅ Refactored to class-based architecture (Config, Ingestion, Retrieval, RAGPipeline)
- ✅ Added comprehensive logging with file rotation
- ✅ Enhanced error handling across all modules
- ✅ Improved import structure with relative imports
- ✅ Added logging_config.py for centralized logging setup
- ✅ Made configuration values read-only properties
Previous Version: 0.1.0
- Basic RAG pipeline with functional approach
- Simple in-memory logging