A Python application that uses Groq LLM, LangChain, and Cache-Augmented Generation (CAG) to process PDFs and answer questions about their content.
- PDF Processing: Extract and process text from PDF documents
- Vector Embeddings: Create semantic embeddings for efficient document retrieval
- Cache-Augmented Generation: Intelligent caching system for improved performance
- Groq Integration: Fast LLM inference using Groq's API
- Interactive Q&A: Ask natural language questions about your PDF content
- Install dependencies: pip install -r requirements.txt
- Set up environment variables: Create a .env file with your Groq API key: GROQ_API_KEY=your_groq_api_key_here
- Place your PDF file in the project directory and name it mypdf.pdf
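As a rough illustration of how the API key from the .env file could reach the application, here is a minimal standard-library sketch. The helper names `load_env_file` and `get_groq_api_key` are hypothetical and not taken from this project, which may well use a dedicated package for this instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_groq_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GROQ_API_KEY") or load_env_file(path).get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message when the key is missing tends to be easier to debug than an authentication error raised later by the API client.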
Run the main application:
python main.py
Then ask questions about your PDF content interactively.
- main.py - Main application entry point
- pdf_processor.py - PDF text extraction and processing
- cag_system.py - Cache-Augmented Generation implementation
- vector_store.py - Vector database management
- config.py - Configuration settings
- requirements.txt - Python dependencies
- Document Ingestion: The PDF is processed and split into chunks
- Embedding Creation: Text chunks are converted to vector embeddings
- Vector Storage: Embeddings are stored in ChromaDB for fast retrieval
- Caching Layer: CAG system caches frequently accessed information
- Question Processing: User questions are embedded and matched against the stored document embeddings
- Answer Generation: Relevant context is sent to Groq LLM for answer generation
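The steps above can be sketched end to end with the standard library alone. This is a toy model under loud assumptions, not the project's implementation: a bag-of-words counter stands in for real embeddings, an in-memory list stands in for ChromaDB, a plain callable stands in for the Groq LLM, and the caching layer is read as a simple answer cache keyed by the normalized question. All names (`split_into_chunks`, `embed`, `cosine`, `TinyCagPipeline`) and the chunk sizes are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyCagPipeline:
    """In-memory stand-in for the ingest, store, cache, and answer steps."""

    def __init__(self, text, llm):
        self.llm = llm  # assumed: any callable (question, context) -> answer
        self.index = [(chunk, embed(chunk)) for chunk in split_into_chunks(text)]
        self.cache = {}  # assumed caching strategy: normalized question -> answer

    def retrieve(self, question, k=2):
        """Return the k chunks most similar to the question."""
        query = embed(question)
        ranked = sorted(self.index, key=lambda item: cosine(query, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

    def ask(self, question):
        """Answer from the cache when possible, otherwise retrieve and call the LLM."""
        key = " ".join(question.lower().split())
        if key in self.cache:
            return self.cache[key]
        context = "\n".join(self.retrieve(question))
        answer = self.llm(question, context)
        self.cache[key] = answer
        return answer
```

Passing the LLM in as a callable keeps the sketch testable without network access; in a real setup that callable would wrap a request to Groq's API, and `retrieve` would query the vector store instead of sorting a Python list.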