wish-team/hp-cag

PDF Question-Answering System with CAG

A Python application that uses a Groq-hosted LLM, LangChain, and Cache-Augmented Generation (CAG) to process PDFs and answer questions about their content.

Features

  • PDF Processing: Extract and process text from PDF documents
  • Vector Embeddings: Create semantic embeddings for efficient document retrieval
  • Cache-Augmented Generation: Intelligent caching system for improved performance
  • Groq Integration: Fast LLM inference using Groq's API
  • Interactive Q&A: Ask natural language questions about your PDF content

Setup

  1. Install dependencies:

    pip install -r requirements.txt

  2. Set up environment variables by creating a .env file that contains your Groq API key:

    GROQ_API_KEY=your_groq_api_key_here

  3. Place your PDF file in the project directory and name it mypdf.pdf
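
To illustrate how the key from step 2 might be read at startup, here is a minimal standard-library sketch. The helper names `load_env_file` and `get_groq_api_key` are hypothetical and are not taken from this repository. The project may well use a package such as python-dotenv instead, which is an assumption that requirements.txt would confirm.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_groq_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(path).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error raised later by the API client.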

Usage

Run the main application:

    python main.py

Then ask questions about your PDF content interactively.
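
As a rough picture of what an interactive session could look like internally, the sketch below shows a simple question loop. It is not the code in main.py: the function name `run_qa_loop`, the prompt text, and the `exit`/`quit` commands are all assumptions made for illustration, and `answer_fn` stands in for whatever produces answers in the real application.

```python
def run_qa_loop(answer_fn, read=input, write=print):
    """Read questions until the user types 'exit' or 'quit', or input ends."""
    while True:
        try:
            question = read("Question: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D or Ctrl-C ends the session quietly.
            break
        if question.lower() in {"exit", "quit"}:
            break
        if not question:
            # Ignore empty input and prompt again.
            continue
        write(answer_fn(question))


if __name__ == "__main__":
    # Placeholder answer function; a real one would query the CAG system.
    run_qa_loop(lambda q: f"(no model attached) You asked: {q}")
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal.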

Project Structure

  • main.py - Main application entry point
  • pdf_processor.py - PDF text extraction and processing
  • cag_system.py - Cache-Augmented Generation implementation
  • vector_store.py - Vector database management
  • config.py - Configuration settings
  • requirements.txt - Python dependencies

How it Works

  1. Document Ingestion: The PDF is processed and split into chunks
  2. Embedding Creation: Text chunks are converted to vector embeddings
  3. Vector Storage: Embeddings are stored in ChromaDB for fast retrieval
  4. Caching Layer: The CAG system caches frequently accessed information
  5. Question Processing: User questions are embedded and matched against the stored document chunks
  6. Answer Generation: The relevant context is sent to the Groq LLM to generate an answer
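
The steps above can be sketched end to end with the standard library alone. This is a toy illustration under several loudly stated assumptions, not the repository's implementation: a bag-of-words `Counter` stands in for real vector embeddings, an in-memory list stands in for ChromaDB, the `generate` callable stands in for the Groq API call, and the cache simply stores answers keyed by the normalized question, which is only one possible reading of the caching layer. All names (`split_into_chunks`, `embed`, `cosine`, `TinyCagPipeline`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Step 1: split text into overlapping character windows."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Step 2 (toy): bag-of-words counts, a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyCagPipeline:
    def __init__(self, text, generate):
        # generate(question, context) -> answer; stands in for the Groq LLM call.
        self.generate = generate
        # Step 3 (toy): an in-memory list instead of ChromaDB.
        self.index = [(chunk, embed(chunk)) for chunk in split_into_chunks(text)]
        # Step 4 (assumed design): cache answers by normalized question text.
        self.cache = {}

    def retrieve(self, question, k=2):
        """Step 5: rank stored chunks by similarity to the question."""
        query = embed(question)
        ranked = sorted(self.index, key=lambda item: cosine(query, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

    def ask(self, question):
        """Step 6: answer from the cache, or retrieve context and call the model."""
        key = " ".join(question.lower().split())
        if key not in self.cache:
            context = "\n".join(self.retrieve(question))
            self.cache[key] = self.generate(question, context)
        return self.cache[key]
```

With this design, a repeated question (ignoring case and extra whitespace) is served from the cache and skips both retrieval and the model call. A real system would likely also need a cache eviction policy and a way to invalidate entries when the PDF changes.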
