ChatWithPDFs

A comprehensive RAG (Retrieval-Augmented Generation) application suite for interacting with PDF documents using Large Language Models. This repository contains two implementations: a text-only PDF chat application and an advanced multimodal PDF chat that can understand text, images, and tables.

✨ Features

TextPDFRag

  • Text-only PDF processing: Extract and process text content from PDFs
  • RAG pipeline: Retrieve relevant context and generate answers
  • Streamlit interface: User-friendly web interface for chatting with PDFs
  • Vector search: Semantic search using ChromaDB and embeddings
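The vector-search feature above boils down to comparing embeddings by cosine similarity. The sketch below illustrates the idea with toy three-dimensional vectors standing in for real `nomic-embed-text` embeddings; in the app itself, ChromaDB performs this search internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]

# Toy "embeddings" -- real ones are produced by the embedding model.
chunks = ["PDF page 1 text", "PDF page 2 text", "PDF page 3 text"]
vectors = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.9, 0.2, 0.1]]
query = [1.0, 0.0, 0.0]
results = top_k(query, vectors, chunks, k=2)
```

Here `results` contains the two chunks most similar to the query vector, which is exactly what the retriever hands to the LLM as context.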

MultiModalPDFChat

  • Multimodal processing: Handles text, images, and tables from PDFs
  • Vision model support: Uses vision LLMs (llava, bakllava, moondream) to understand images
  • Table extraction: Automatically extracts and processes tables using pdfplumber
  • Image analysis: Visual understanding of charts, diagrams, and figures
  • Conversation history: Maintains chat history with visual context
  • Comprehensive RAG: Retrieves and uses text, images, and tables for answering questions
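Ollama's `/api/generate` endpoint accepts images as base64 strings in an `images` field, which is the usual way image bytes reach a vision model like llava. The sketch below builds such a payload from raw bytes; the placeholder bytes and the exact request flow are illustrative (the app may go through LangChain's Ollama bindings instead of raw JSON).

```python
import base64
import json

def image_to_base64(image_bytes: bytes) -> str:
    """Vision models served by Ollama accept images as base64 strings."""
    return base64.b64encode(image_bytes).decode("ascii")

def build_vision_request(prompt: str, image_bytes: bytes) -> str:
    """Build a JSON payload shaped like Ollama's /api/generate request."""
    payload = {
        "model": "llava",
        "prompt": prompt,
        "images": [image_to_base64(image_bytes)],
    }
    return json.dumps(payload)

# Placeholder bytes stand in for an image extracted with PyMuPDF.
fake_png = b"\x89PNG_fake_bytes"
request = build_vision_request("Describe this chart.", fake_png)
```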

📁 Project Structure

ChatWithPDFs/
├── TextPDFRag/
│   ├── app.py                 # Text-only PDF chat application
│   └── chroma_db/             # Vector database storage
├── MultiModalPDFChat/
│   ├── app.py                 # Multimodal PDF chat application
│   ├── generate_multi_modal_pdf.py  # Utility to generate test PDFs
│   ├── chroma_db/             # Vector database storage
│   └── pdfs/                  # Sample PDF files
└── README.md

🔧 Prerequisites

  1. Python 3.8+
  2. Ollama installed and running
    • Download from ollama.ai
    • Install required models:
      # For TextPDFRag
      ollama pull llama3
      ollama pull nomic-embed-text
      
      # For MultiModalPDFChat (vision model)
      ollama pull llava
      ollama pull nomic-embed-text

📦 Installation

  1. Clone the repository (or navigate to the project directory)

  2. Install Python dependencies:

    pip install streamlit pypdf langchain-ollama langchain-community langchain-core chromadb pymupdf pillow pdfplumber pandas

    Or list the packages above in a requirements.txt and install:

    pip install -r requirements.txt
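    A requirements.txt matching the pip command above would look like this (versions unpinned; pin them as needed for your environment):

    ```
    streamlit
    pypdf
    langchain-ollama
    langchain-community
    langchain-core
    chromadb
    pymupdf
    pillow
    pdfplumber
    pandas
    ```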

🚀 Usage

TextPDFRag - Text-Only PDF Chat

  1. Start the application:

    cd TextPDFRag
    streamlit run app.py
  2. Use the application:

    • Upload a PDF file through the web interface
    • Wait for the PDF to be processed (text extraction and vectorization)
    • Ask questions about the PDF content
    • Type 'quit' or 'exit' to end the conversation

MultiModalPDFChat - Multimodal PDF Chat

  1. Start the application:

    cd MultiModalPDFChat
    streamlit run app.py
  2. Use the application:

    • Upload a PDF file (can contain text, images, and tables)
    • Wait for processing (extracts text, images, and tables)
    • Ask questions about the PDF
    • The system will retrieve relevant text, images, and tables
    • View the conversation history with images and tables displayed
    • Type 'quit' or 'exit' to end the conversation
  3. Generate test PDFs (optional):

    cd MultiModalPDFChat
    python generate_multi_modal_pdf.py

    This creates a sample PDF with text, images, and tables for testing.

🛠️ Technologies Used

  • Streamlit: Web application framework
  • LangChain: LLM application framework
    • langchain-ollama: Ollama integration
    • langchain-community: Community integrations (ChromaDB)
    • langchain-core: Core LangChain functionality
  • Ollama: Local LLM inference server
    • Models: llama3, llava, nomic-embed-text
  • ChromaDB: Vector database for embeddings storage
  • PyPDF: PDF text extraction
  • PyMuPDF (fitz): PDF image extraction
  • pdfplumber: PDF table extraction
  • Pillow (PIL): Image processing
  • Pandas: Table data manipulation

📱 Applications

TextPDFRag

A simple RAG implementation that:

  • Extracts text from PDFs
  • Splits text into chunks
  • Creates embeddings and stores in ChromaDB
  • Retrieves relevant chunks based on queries
  • Generates answers using the retrieved context

Best for: Text-heavy PDFs like research papers, articles, documentation
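The chunking step above is typically done with a LangChain text splitter; a minimal fixed-size splitter with overlap illustrates the idea (the `chunk_size` and `overlap` values here are illustrative, not the app's actual settings):

```python
def split_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping fixed-size chunks.

    Overlap keeps sentences that straddle a chunk boundary
    retrievable from at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "word " * 60  # stands in for text extracted by pypdf
pieces = split_text(doc, chunk_size=100, overlap=20)
```

Each chunk in `pieces` is then embedded and stored in ChromaDB individually, so retrieval can return just the passages relevant to a question.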

MultiModalPDFChat

An advanced RAG implementation that:

  • Extracts text, images, and tables from PDFs
  • Creates separate document chunks for each modality
  • Uses vision models to understand images
  • Retrieves relevant text, images, and tables
  • Generates comprehensive answers using all modalities

Best for: PDFs with charts, diagrams, tables, technical reports, presentations
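Tables extracted by pdfplumber arrive as lists of rows (with `None` for empty cells), and before embedding they are usually serialized to plain text. A hedged sketch of that serialization step, assuming pipe-delimited output:

```python
def table_to_text(table: list[list]) -> str:
    """Serialize a table (list of rows, as pdfplumber's extract_table
    returns) into pipe-delimited text so it can be embedded and
    retrieved like any other chunk."""
    header, *rows = table
    lines = [" | ".join(header)]
    lines.append("-" * len(lines[0]))
    for row in rows:
        # pdfplumber may yield None for empty cells; normalize to "".
        lines.append(" | ".join(cell or "" for cell in row))
    return "\n".join(lines)

# Example in the shape pdfplumber produces.
sample = [["Quarter", "Revenue"], ["Q1", "10"], ["Q2", None]]
text = table_to_text(sample)
```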

🔍 How It Works

RAG Pipeline

  1. Document Processing:

    • Extract content (text/images/tables) from PDF
    • Split into manageable chunks
    • Create embeddings for each chunk
  2. Storage:

    • Store embeddings in ChromaDB vector database
    • Maintain metadata (page numbers, content type, etc.)
  3. Retrieval:

    • Convert user query to embedding
    • Find similar chunks using vector similarity search
    • Retrieve top-k most relevant chunks
  4. Generation:

    • Augment LLM prompt with retrieved context
    • Generate answer using the context
    • For multimodal: Include images in vision model input
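The generation step above augments the prompt with whatever the retriever returned. A minimal sketch of that prompt assembly (the wording of the instruction template is an assumption, not the app's exact prompt):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Augment the user's question with retrieved context before
    sending it to the LLM (step 4 of the pipeline)."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What was Q1 revenue?",
    ["Table: Q1 revenue was $10M.", "Page 3: revenue grew in Q1."],
)
```

In the multimodal app, the same idea applies, except retrieved images are attached to the vision model's input alongside this text prompt.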

📝 Notes

  • The vector databases are stored locally in chroma_db/ directories
  • Each application maintains its own separate vector database
  • Vision models require more computational resources than text-only models
  • Processing time depends on PDF size and complexity

🤝 Contributing

Feel free to submit issues, fork the repository, and create pull requests for any improvements.

📄 License

This project is for educational purposes.
