A multimodal Retrieval-Augmented Generation (RAG) pipeline for analyzing veterinary medical documents, with a specific focus on dog health. The script combines Google Gemini language models, the `unstructured` document parser, and a FAISS vector store to answer questions about a document and summarize its contents.
- PDF Parsing: Extracts text, tables, and images from PDF files using the `unstructured` library.
- Summarization: Uses Google Gemini LLMs to summarize each extracted element (text, table, and image).
- Image Analysis: Processes and encodes images with Gemini's vision model to generate content descriptions.
- Vector Store: Stores all summaries and metadata in a FAISS vector database for efficient similarity search and retrieval.
- Question Answering: Answers user queries about dog health by retrieving relevant context from the vector database and providing multimodal responses (text, tables, and images).
- Colab Integration: Designed for seamless use in Google Colab, including API key management via Colab’s `userdata`.
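
The parsing feature above might be sketched as follows. This is a minimal illustration, not the script's actual code: `parse_pdf` and `bucket_elements` are hypothetical helper names, the `partition_pdf` options shown are one plausible configuration, and the bucketing assumes each parsed element exposes a `category` attribute, as `unstructured` elements do.

```python
from typing import Any, Dict, Iterable, List


def parse_pdf(path: str) -> List[Any]:
    """Partition a PDF into elements (needs unstructured, Tesseract, and Poppler)."""
    # Imported lazily so bucket_elements below works without the library installed.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(
        filename=path,
        strategy="hi_res",                     # layout-aware parsing
        infer_table_structure=True,            # keep table structure in metadata
        extract_image_block_types=["Image"],   # pull out embedded images
        extract_image_block_to_payload=True,   # store images as base64 in metadata
    )


def bucket_elements(elements: Iterable[Any]) -> Dict[str, List[Any]]:
    """Split parsed elements into text, table, and image buckets."""
    buckets: Dict[str, List[Any]] = {"text": [], "table": [], "image": []}
    for element in elements:
        category = getattr(element, "category", "")
        if category == "Table":
            buckets["table"].append(element)
        elif category == "Image":
            buckets["image"].append(element)
        else:
            # Titles, narrative text, list items, and so on are treated as text.
            buckets["text"].append(element)
    return buckets
```

Keeping the three buckets separate lets each one be summarized with a prompt suited to its type before everything is indexed together.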
This project relies on the following key libraries and tools:
- Python Libraries:
  - LangChain
  - Google Generative AI
  - FAISS
  - unstructured
- System Dependencies:
  - Tesseract OCR
  - Poppler Utils
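
A quick way to confirm these dependencies are present before running the pipeline is sketched below. The binary and module names are assumptions: `tesseract` and `pdftoppm` are the executables usually shipped with Tesseract OCR and Poppler Utils, and the import names (in particular `langchain_google_genai` for the Gemini integration) may differ from what the script actually imports. `check_dependencies` is a hypothetical helper.

```python
import importlib.util
import shutil
from typing import Dict

# Assumed executable names for the system dependencies.
SYSTEM_BINARIES = {"Tesseract OCR": "tesseract", "Poppler Utils": "pdftoppm"}

# Assumed top-level import names for the Python libraries.
PYTHON_MODULES = {
    "LangChain": "langchain",
    "Google Generative AI": "langchain_google_genai",
    "FAISS": "faiss",
    "unstructured": "unstructured",
}


def check_dependencies() -> Dict[str, bool]:
    """Return a mapping of dependency label to whether it appears to be installed."""
    status: Dict[str, bool] = {}
    for label, binary in SYSTEM_BINARIES.items():
        # shutil.which returns None when the executable is not on PATH.
        status[label] = shutil.which(binary) is not None
    for label, module in PYTHON_MODULES.items():
        try:
            status[label] = importlib.util.find_spec(module) is not None
        except (ImportError, ValueError):
            status[label] = False
    return status


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```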
This script is designed to be used in a Google Colab environment.
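
In Colab, the API key is typically read from the notebook's secrets panel. The sketch below shows one way to do that with a fallback for running outside Colab. The secret name `GOOGLE_API_KEY` and the helper `load_api_key` are assumptions for illustration; the script may use a different name.

```python
import os
from typing import Optional


def load_api_key(secret_name: str = "GOOGLE_API_KEY") -> Optional[str]:
    """Read an API key from Colab secrets, or from the environment elsewhere."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        # Outside Colab, fall back to an environment variable of the same name.
        return os.environ.get(secret_name)
    return userdata.get(secret_name)
```

Reading the key this way keeps it out of the notebook source, which matters when the notebook is shared.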
- Install Dependencies: Ensure all required Python libraries and system dependencies (Tesseract, Poppler) are installed in your environment.
- Extract Elements: The script will automatically extract text, tables, and images from a specified veterinary PDF.
- Summarize and Store: Each extracted element is summarized using Gemini models and stored in a FAISS vector store.
- Ask Questions: You can then ask questions about the document. The script retrieves the most relevant context and generates a detailed, evidence-based answer.
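
The summarize-and-store and retrieval steps follow a common pattern: index a short summary of each element, keep the original element alongside it as metadata, and at query time return the originals whose summaries are most similar to the question. The sketch below illustrates that data flow using only the standard library. A bag-of-words counter stands in for a real embedding model and a plain list stands in for FAISS, so this is not the script's implementation. `SummaryIndex` and its methods are hypothetical names, and the sample summaries are invented.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SummaryIndex:
    """In-memory stand-in for a vector store keyed on element summaries."""

    def __init__(self) -> None:
        self._entries: List[Tuple[Counter, Dict[str, Any]]] = []

    def add(self, summary: str, original: Any, kind: str) -> None:
        # The summary is what gets searched; the original is what gets returned.
        meta = {"summary": summary, "original": original, "kind": kind}
        self._entries.append((embed(summary), meta))

    def search(self, query: str, k: int = 2) -> List[Dict[str, Any]]:
        query_vec = embed(query)
        ranked = sorted(
            self._entries, key=lambda e: cosine(query_vec, e[0]), reverse=True
        )
        return [meta for _, meta in ranked[:k]]


# Invented sample data, for illustration only.
index = SummaryIndex()
index.add("Vaccination schedule for puppies by age in weeks", "<table html>", "table")
index.add("X-ray image showing hip dysplasia in an adult dog", "<base64 image>", "image")
index.add("Text describing common symptoms of canine parvovirus", "Parvovirus ...", "text")

hits = index.search("When should puppies get vaccination?", k=1)
print(hits[0]["kind"])  # table
```

Searching over summaries rather than raw content is what lets tables and images be retrieved by a text query at all.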
Given a veterinary PDF, you can ask a natural-language question about dog health. The script retrieves the relevant text, tables, and images and returns a detailed, model-generated answer. It can also display annotated images to support its response.
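
To make the answer step concrete, here is one way the retrieved context could be assembled into a single multimodal prompt. It is a sketch under several assumptions: `build_prompt_parts` is a hypothetical helper, retrieved items are assumed to be dicts with `kind` and `original` keys, images are assumed to be base64-encoded JPEGs, and the content-part layout follows the text and `image_url` convention used for LangChain chat messages, which may differ from what the script does.

```python
from typing import Dict, List


def build_prompt_parts(question: str, retrieved: List[Dict[str, str]]) -> List[dict]:
    """Combine retrieved text, tables, and images into multimodal message parts."""
    # Text and table content goes into the written context.
    text_context = [
        item["original"] for item in retrieved if item["kind"] in ("text", "table")
    ]
    # Images are attached separately as base64 data URLs.
    images = [item["original"] for item in retrieved if item["kind"] == "image"]

    prompt = (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + "\n\n".join(text_context) + "\n\n"
        "Question: " + question
    )

    parts: List[dict] = [{"type": "text", "text": prompt}]
    for b64 in images:
        parts.append(
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
            }
        )
    return parts
```

A list built this way could then be passed as the `content` of a LangChain `HumanMessage` and sent to a vision-capable Gemini chat model.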
For any suggestions or issues, please open an issue in the repository.

