A comprehensive RAG (Retrieval-Augmented Generation) application suite for interacting with PDF documents using Large Language Models. This repository contains two implementations: a text-only PDF chat application and an advanced multimodal PDF chat that can understand text, images, and tables.
- Text-only PDF processing: Extract and process text content from PDFs
- RAG pipeline: Retrieve relevant context and generate answers
- Streamlit interface: User-friendly web interface for chatting with PDFs
- Vector search: Semantic search using ChromaDB and embeddings
- Multimodal processing: Handles text, images, and tables from PDFs
- Vision model support: Uses vision LLMs (llava, bakllava, moondream) to understand images
- Table extraction: Automatically extracts and processes tables using pdfplumber
- Image analysis: Visual understanding of charts, diagrams, and figures
- Conversation history: Maintains chat history with visual context
- Comprehensive RAG: Retrieves and uses text, images, and tables for answering questions
```
ChatWithPDFs/
├── TextPDFRag/
│   ├── app.py                        # Text-only PDF chat application
│   └── chroma_db/                    # Vector database storage
├── MultiModalPDFChat/
│   ├── app.py                        # Multimodal PDF chat application
│   ├── generate_multi_modal_pdf.py   # Utility to generate test PDFs
│   ├── chroma_db/                    # Vector database storage
│   └── pdfs/                         # Sample PDF files
└── README.md
```
- Python 3.8+
- Ollama installed and running
- Download from ollama.ai
- Install required models:
  ```bash
  # For TextPDFRag
  ollama pull llama3
  ollama pull nomic-embed-text

  # For MultiModalPDFChat (vision model)
  ollama pull llava
  ollama pull nomic-embed-text
  ```
1. Clone the repository (or navigate to the project directory)

2. Install Python dependencies:

   ```bash
   pip install streamlit pypdf langchain-ollama langchain-community langchain-core chromadb pymupdf pillow pdfplumber pandas
   ```

   Or create a `requirements.txt` and install:

   ```bash
   pip install -r requirements.txt
   ```
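For reference, a `requirements.txt` listing the same packages (versions left unpinned, mirroring the pip command above):

```text
streamlit
pypdf
langchain-ollama
langchain-community
langchain-core
chromadb
pymupdf
pillow
pdfplumber
pandas
```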
1. Start the application:

   ```bash
   cd TextPDFRag
   streamlit run app.py
   ```

2. Use the application:

   - Upload a PDF file through the web interface
   - Wait for the PDF to be processed (text extraction and vectorization)
   - Ask questions about the PDF content
   - Type 'quit' or 'exit' to end the conversation
1. Start the application:

   ```bash
   cd MultiModalPDFChat
   streamlit run app.py
   ```

2. Use the application:

   - Upload a PDF file (it can contain text, images, and tables)
   - Wait for processing (extracts text, images, and tables)
   - Ask questions about the PDF
   - The system will retrieve relevant text, images, and tables
   - View the conversation history with images and tables displayed
   - Type 'quit' or 'exit' to end the conversation
3. Generate test PDFs (optional):

   ```bash
   cd MultiModalPDFChat
   python generate_multi_modal_pdf.py
   ```

   This creates a sample PDF with text, images, and tables for testing.
- Streamlit: Web application framework
- LangChain: LLM application framework
  - `langchain-ollama`: Ollama integration
  - `langchain-community`: Community integrations (ChromaDB)
  - `langchain-core`: Core LangChain functionality
- Ollama: Local LLM inference server
  - Models: `llama3`, `llava`, `nomic-embed-text`
- ChromaDB: Vector database for embeddings storage
- PyPDF: PDF text extraction
- PyMuPDF (fitz): PDF image extraction
- pdfplumber: PDF table extraction
- Pillow (PIL): Image processing
- Pandas: Table data manipulation
A simple RAG implementation that:
- Extracts text from PDFs
- Splits text into chunks
- Creates embeddings and stores in ChromaDB
- Retrieves relevant chunks based on queries
- Generates answers using the retrieved context
Best for: Text-heavy PDFs like research papers, articles, documentation
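The chunking step at the heart of this pipeline can be sketched without any dependencies. The function below is a simplified stand-in for LangChain's `RecursiveCharacterTextSplitter`; the chunk size and overlap values are illustrative, not the app's actual settings:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split text into overlapping chunks so that context spanning a
    chunk boundary is not lost between neighbouring chunks."""
    chunks = []
    start = 0
    step = chunk_size - overlap  # advance less than a full chunk to overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Each chunk would then be embedded (e.g. with nomic-embed-text) and
# stored in ChromaDB along with metadata such as the source page number.
doc = "x" * 1000
print(len(split_text(doc)))  # 3 chunks: [0:500], [450:950], [900:1000]
```

The overlap matters: a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, so retrieval can find it.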
An advanced RAG implementation that:
- Extracts text, images, and tables from PDFs
- Creates separate document chunks for each modality
- Uses vision models to understand images
- Retrieves relevant text, images, and tables
- Generates comprehensive answers using all modalities
Best for: PDFs with charts, diagrams, tables, technical reports, presentations
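Internally, each modality becomes its own tagged chunk before embedding. A minimal sketch of that bookkeeping, using plain dicts as stand-ins for LangChain `Document` objects (the field names are illustrative, not the app's exact schema):

```python
def make_chunks(page_num, text="", image_desc="", table_rows=None):
    """Wrap extracted page content as typed chunks. An image is stored via
    the vision model's textual description of it, so it can be embedded
    and searched like any other text."""
    chunks = []
    if text:
        chunks.append({"content": text,
                       "metadata": {"page": page_num, "type": "text"}})
    if image_desc:
        chunks.append({"content": image_desc,
                       "metadata": {"page": page_num, "type": "image"}})
    if table_rows:
        # Serialise the table so its cells are visible to the embedder.
        body = "\n".join(" | ".join(str(c) for c in row) for row in table_rows)
        chunks.append({"content": body,
                       "metadata": {"page": page_num, "type": "table"}})
    return chunks

chunks = make_chunks(1, text="Revenue grew 12%.",
                     image_desc="Bar chart of quarterly revenue.",
                     table_rows=[["Q1", 100], ["Q2", 112]])
print([c["metadata"]["type"] for c in chunks])  # ['text', 'image', 'table']
```

The `type` metadata is what lets the UI later render a retrieved table as a table and a retrieved image description alongside its image, rather than as undifferentiated text.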
1. Document Processing:
   - Extract content (text/images/tables) from PDF
   - Split into manageable chunks
   - Create embeddings for each chunk
2. Storage:
   - Store embeddings in ChromaDB vector database
   - Maintain metadata (page numbers, content type, etc.)
3. Retrieval:
   - Convert user query to embedding
   - Find similar chunks using vector similarity search
   - Retrieve top-k most relevant chunks
4. Generation:
   - Augment LLM prompt with retrieved context
   - Generate answer using the context
   - For multimodal: Include images in vision model input
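The retrieval step reduces to nearest-neighbour search over embedding vectors. A toy, dependency-free version of what ChromaDB does under the hood (the 2-D vectors below are stand-ins for real, high-dimensional embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose embeddings are most similar to the
    query embedding -- the core of vector similarity search."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    return [text for _, text in ranked[:k]]

store = [
    ([0.9, 0.1], "Table 2: quarterly revenue"),
    ([0.1, 0.9], "Figure 1: system architecture"),
    ([0.8, 0.2], "Revenue grew 12% year over year"),
]
# A query embedding close to the 'revenue' chunks:
print(top_k([1.0, 0.0], store))
```

The texts returned here are what gets spliced into the LLM prompt in the generation step; in the multimodal app, retrieved image chunks additionally carry the image itself for the vision model.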
- The vector databases are stored locally in `chroma_db/` directories
- Each application maintains its own separate vector database
- Vision models require more computational resources than text-only models
- Processing time depends on PDF size and complexity
Feel free to submit issues, fork the repository, and create pull requests for any improvements.
This project is for educational purposes.