
🚀 SimpleRAGPDF-Demo

A simple yet powerful Retrieval-Augmented Generation (RAG) system for PDF documents.

Extract text from PDFs → chunk it intelligently → generate embeddings → answer questions using a local LLM. All without LangChain!


⭐ Key Features

  • ✨ No LangChain Required - Custom recursive text splitter with the same chunking behavior
  • 🚀 Lightweight - 82M-parameter model (distilgpt2), runs on CPU
  • 📄 PyMuPDF - Fast and reliable PDF text extraction
  • 🧠 Semantic Search - SentenceTransformers embeddings
  • 🔍 FAISS - Lightning-fast vector similarity (<50ms per query)
  • ⏱️ Performance Metrics - Built-in timing for all operations
  • 💻 Production Code - Modular, documented, extensible
  • 📊 Jupyter & CLI - Multiple ways to use it
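
The custom splitter mimics LangChain's recursive strategy: try coarse separators first, fall back to finer ones, and carry a short overlap between consecutive chunks. Here is a minimal sketch under the assumed 900/150 defaults; the function and parameter names are illustrative, and the real implementation lives in src/pdf_processor.py:

```python
def recursive_split(text, chunk_size=900, overlap=150,
                    separators=("\n\n", "\n", ". ", " ")):
    """Split text on the coarsest separator available, packing pieces into
    chunks of roughly chunk_size characters with overlap carry-over."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    # Pick the coarsest separator that actually occurs in the text.
    for sep in separators:
        if sep in text:
            break
    else:
        # No separator at all: fall back to a fixed-size sliding window.
        step = chunk_size - overlap
        return [text[i:i + chunk_size] for i in range(0, len(text), step)]
    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue
        if len(piece) > chunk_size:
            # A single piece is still too big: recurse with finer separators.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, overlap, separators))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            # Seed the next chunk with the tail of the finished one (overlap).
            current = current[-overlap:] + sep + piece
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph breaks before sentence breaks keeps chunks semantically coherent, which is what makes the later embedding retrieval work well.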

🎯 Quick Start (30 seconds)

# 1. Install dependencies
pip install -r requirements.txt

# 2. Run interactive demo
python main.py

# 3. Provide a PDF path and ask questions!

For Jupyter:

jupyter notebook SimpleRAGPDF_Notebook.ipynb

📚 Documentation

Document           Purpose
DEMO_GUIDE.md      🎤 How to present this project
EXAMPLES.md        💡 Code examples & usage patterns
ARCHITECTURE.md    🏗️ Deep dive into the design
ORGANIZATION.md    📦 Code structure overview

👀 See It In Action

First, understand what it does:

python demo.py

Shows system overview with ASCII diagrams and performance expectations.

Then, try it interactively:

python main.py

Output Example:

Extracting text from document.pdf...
Extracted 45823 characters
Splitting text into chunks...
Created 52 chunks
Building embedding index...
Index built in 3.42s

> Ask a question: What is this about?

Searching and generating answer...
Tokenize: 0.156s | Generate: 2.341s | Input tokens: 287

Answer:
This document discusses advanced machine learning techniques...

📖 How It Works

1. PDF EXTRACTION (PyMuPDF)
   Input: PDF file → Output: Raw text

2. TEXT CHUNKING (Custom, intelligent splitting)
   900 chars/chunk with 150 char overlap

3. EMBEDDINGS (SentenceTransformers)
   Semantic understanding of chunks

4. FAISS INDEX (Fast vector search)
   <50ms per query

5. RETRIEVAL (Top-K similarity search)
   Find 3 most relevant chunks by default

6. GENERATION (distilgpt2 LLM)
   Generate answer using context + question

See detailed architecture: ARCHITECTURE.md
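
Steps 3-5 above reduce to an inner-product search over L2-normalized vectors, which is what a FAISS IndexFlatIP computes. A plain-NumPy illustration with toy vectors (the real pipeline embeds chunks with SentenceTransformers via src/embeddings.py; `top_k_chunks` is a name invented for this sketch):

```python
import numpy as np

def top_k_chunks(chunk_vecs, query_vec, k=3):
    """Rank chunks by cosine similarity (inner product of L2-normalized
    vectors), mirroring what a FAISS IndexFlatIP search returns."""
    chunk_vecs = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = chunk_vecs @ query_vec       # one inner product per chunk
    idx = np.argsort(-scores)[:k]         # indices of the k best chunks
    return idx, scores[idx]

# Toy 4-dimensional "embeddings" for three chunks and one query.
chunks = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0]])
idx, scores = top_k_chunks(chunks, np.array([1.0, 0.05, 0.0, 0.0]), k=2)
# idx → [0, 2]: the two chunks most aligned with the query
```

Real embedding vectors have 384 dimensions (for all-MiniLM-L6-v2) rather than 4, but the ranking logic is identical.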


πŸ—οΈ Project Structure

SimpleRAGPDF-Demo/
├── 🚀 main.py                          Interactive CLI entry point
├── 🎬 demo.py                          System overview & guide
├── 📓 SimpleRAGPDF_Notebook.ipynb      Jupyter notebook (Colab compatible)
├── 📋 requirements.txt                 Dependencies
├── 📖 README.md                        This file
├── 📚 DEMO_GUIDE.md                    How to present this
├── 💡 EXAMPLES.md                      Code examples
├── 🏗️ ARCHITECTURE.md                  Deep technical dive
├── 📦 ORGANIZATION.md                  Code structure
└── src/
    ├── config.py                     ⚙️ Configuration
    ├── pdf_processor.py              📄 PDF extraction & chunking
    ├── embeddings.py                 🧠 Embeddings & FAISS
    ├── llm.py                        🤖 Model inference
    └── rag.py                        🔗 Main orchestrator

💻 Code Example

Basic Usage

from src.rag import SimpleRAG

# Initialize
rag = SimpleRAG()

# Load PDF
rag.load_pdf("document.pdf")

# Ask questions
result = rag.ask("What is the main topic?")
print(result['answer'])

Advanced Usage

from src.rag import SimpleRAG

rag = SimpleRAG(device="cuda")  # Use GPU
rag.load_pdf("document.pdf", verbose=True)

result = rag.ask(
    question="Summarize key findings",
    top_k=5,              # Retrieve 5 chunks
    max_new_tokens=200,   # Longer answer
    show_context=True     # Show retrieved text
)

print(f"Answer: {result['answer']}")
print(f"Timing: {result['timing']}")
print(f"Sources: {result['context']}")
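
Under the hood, ask() presumably stitches the retrieved chunks and the question into a single prompt before generation. The template below is a minimal illustration; build_prompt and its exact wording are assumptions, so see src/rag.py for the real version:

```python
def build_prompt(chunks, question):
    """Join retrieved chunks into a numbered context block, then append
    the question. Illustrative only; the real template is in src/rag.py."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(["RAG retrieves relevant text.", "Then an LLM answers."],
                      "What does RAG do?")
```

Keeping the prompt under MAX_INPUT_TOKENS matters with a small model like distilgpt2, which is why the project truncates tokens before generation.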

More examples: EXAMPLES.md


🎯 For Demo/Presentation

Perfect for showing RAG principles in action!

Recommended flow:

  1. Run python demo.py (2 min) - Overview
  2. Show code structure (2 min)
  3. Run python main.py (15 min) - Live demo
  4. Q&A (5 min)

Total: ~25 minutes

See DEMO_GUIDE.md for detailed presentation tips, talking points, and Q&A prep.


📊 Performance

Metric                         CPU        GPU
PDF extraction (10 pages)      ~0.5s      ~0.5s
Text chunking (50 chunks)      ~0.1s      ~0.1s
Embeddings (50 chunks)         ~2-5s      ~0.2-0.5s
Query retrieval                <50ms      <50ms
Text generation (120 tokens)   ~1-3s      ~0.2-0.5s
Total per query                ~1.5-3s    ~0.4-0.7s
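
The per-stage timings above can be captured with a small wall-clock helper. This decorator is a sketch of such instrumentation, not the project's actual code:

```python
import time
from functools import wraps

def timed(fn):
    """Wrap a function so it returns (result, elapsed_seconds)."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        return result, time.perf_counter() - start
    return wrapper

@timed
def embed_chunks(chunks):
    # Stand-in for real embedding work.
    return [len(c) for c in chunks]

vectors, seconds = embed_chunks(["alpha", "beta"])
```

time.perf_counter() is the right clock here: it is monotonic and high-resolution, unlike time.time().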

βš™οΈ Configuration

Edit src/config.py to customize:

# Text splitting
CHUNK_SIZE = 900              # Characters per chunk
CHUNK_OVERLAP = 150           # Overlap between chunks
MIN_CHUNK_CHARS = 200         # Minimum chunk size

# Models
EMBEDDING_MODEL = "all-MiniLM-L6-v2"   # Fast & semantic
LLM_MODEL = "distilgpt2"                # Lightweight generation

# Generation
MAX_INPUT_TOKENS = 512        # Input prompt size
MAX_NEW_TOKENS = 120          # Max output length
TOP_K = 3                     # Chunks to retrieve

📦 Dependencies

  • pymupdf - PDF text extraction
  • sentence-transformers - Semantic embeddings
  • faiss-cpu - Vector similarity search
  • transformers - HuggingFace models
  • torch - Deep learning framework
  • numpy - Numerical operations

All of these are listed in requirements.txt.


💡 Tips & Tricks

  • For GPU acceleration: install faiss-gpu instead of faiss-cpu
  • For better quality: Try different embedding models (trade-off with speed)
  • For larger answers: Increase max_new_tokens
  • For more context: Increase top_k retrieval
  • For faster results: Reduce chunk_size and top_k

⚠️ Limitations

  • Works best with English documents (the default models are English-focused)
  • The document text and its index must fit in system memory
  • Requires internet for first-time model downloads
  • Answers depend on document quality and question clarity

🔄 Model Alternatives

Easily swap models in src/config.py:

# Larger embedding (better quality, slower)
"all-mpnet-base-v2"           # 109M params

# Faster embedding
"all-MiniLM-L6-v2"            # 22M params (default)

# Larger LLM (better answers, slower, needs GPU)
"gpt2"                        # 124M params
"distilgpt2"                  # 82M params (default)

πŸ›£οΈ Future Enhancements

  • Multi-format support (Word, PowerPoint, HTML)
  • Multi-language support
  • Persistent vector database integration
  • Web interface
  • Streaming response generation
  • Document Q&A with citations
  • Batch processing

🤝 How to Extend

Swap components:

# Use different embedding model
from src.embeddings import EmbeddingIndex
index = EmbeddingIndex("all-mpnet-base-v2")

# Use different LLM
from src.llm import LLMGenerator
llm = LLMGenerator("gpt2", device="cuda")

Modify pipeline:

# More context chunks
result = rag.ask(query, top_k=10)

# Longer answers
result = rag.ask(query, max_new_tokens=300)

See: ARCHITECTURE.md


🧠 Learning Resources

This project is great for learning:

  • ✅ RAG (Retrieval-Augmented Generation) principles
  • ✅ FAISS vector indexes
  • ✅ SentenceTransformers embeddings
  • ✅ Text chunking strategies
  • ✅ LLM inference optimization
  • ✅ Production Python code structure

❓ FAQ

Q: Why not use LangChain?
A: This teaches the core concepts clearly without abstraction layers. LangChain is great for production but adds complexity for learning.

Q: Can I use my custom LLM?
A: Yes! Modify src/llm.py and src/config.py to load any HuggingFace model.

Q: How do I use a GPU?
A: Pass device="cuda" to SimpleRAG: rag = SimpleRAG(device="cuda")

Q: Works with non-English PDFs?
A: Current models are English-trained. Future enhancement: multilingual models.

Q: How much memory needed?
A: ~700MB for models + ~100MB per 50 chunks of document.


πŸ“ License

MIT


🎬 Ready to Demo?

  1. pip install -r requirements.txt
  2. python demo.py (understand the system)
  3. python main.py (interactive demo)
  4. Read DEMO_GUIDE.md for presentation tips

Show this to others! It's educational, practical, and impressive. 🚀


Questions? See EXAMPLES.md for more usage patterns or ARCHITECTURE.md for technical details.

About

Simple PDF RAG demo using PyMuPDF for text extraction, SentenceTransformers for embeddings, FAISS for vector retrieval, and Hugging Face Transformers for answering. Includes a custom recursive text splitter without LangChain, token truncation for stable prompts, timing logs for performance visibility, and a lightweight default model that runs on CPU.
