📑 FDA RAG Document Intelligence

An AI-powered FDA document intelligence system that uses Retrieval-Augmented Generation (RAG) to analyze FDA Form 483 inspection reports and answer compliance questions, grounded in the report text, through a Streamlit interface.

🔍 Project Overview

This project demonstrates an end-to-end RAG (Retrieval-Augmented Generation) pipeline designed to work with FDA inspection and quality documents (FDA 483 reports).

Instead of relying on the model's general knowledge, the system:

Retrieves relevant content directly from the indexed FDA PDFs
Uses a strict Quality & Compliance expert prompt
Generates fact-based answers grounded only in the retrieved text
Supports querying across all documents or a single selected PDF

The application is built as a professional Streamlit web app, suitable for demos, learning, and portfolio use.

✨ Key Features

📄 Query across all FDA PDFs or a single selected document
🔍 Accurate document-grounded answers using RAG
🧠 Strict FDA Quality, Compliance & R&D expert behavior
🚫 Minimal hallucination: the prompt restricts answers to the provided PDFs
📊 Displays the total number of PDFs and indexed knowledge chunks
🎨 Clean, enterprise-style Streamlit UI
⚡ Powered by OpenAI embeddings + Pinecone vector DB

🧠 What is RAG (Retrieval-Augmented Generation)?

RAG combines:

Retrieval – finding relevant document chunks from a vector database
Generation – using an LLM to answer based only on retrieved content

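This helps provide:

Answers grounded in the retrieved document text
Less guessing, since the model answers from retrieved text rather than from memory
Behavior better suited to compliance-sensitive, enterprise use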
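To make the two steps concrete, here is a minimal, self-contained sketch of retrieve-then-generate. It is not the project's code: a toy word-count similarity stands in for OpenAI embeddings, an in-memory list of made-up sample chunks stands in for Pinecone, and the hypothetical `build_prompt` only assembles the grounded prompt that would be sent to the LLM (the API call itself is omitted, and the prompt wording is illustrative).

```python
import math
import re
from collections import Counter

# Hypothetical sample chunks standing in for the Pinecone index: (source PDF, chunk text).
CHUNKS = [
    ("site_a_483.pdf", "Observation 1: Procedures for cleaning equipment were not followed."),
    ("site_a_483.pdf", "Observation 2: CAPA investigations were not documented."),
    ("site_b_483.pdf", "Observation 1: Laboratory controls lacked scientifically sound specifications."),
]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for OpenAI embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Retrieval: rank every chunk by similarity to the question and keep the top k."""
    q = embed(question)
    return sorted(CHUNKS, key=lambda chunk: cosine(q, embed(chunk[1])), reverse=True)[:k]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Generation input: the LLM is shown only the retrieved chunks, labelled by source."""
    context = "\n".join(f"[{source}] {text}" for source, text in hits)
    return (
        "You are an FDA Quality & Compliance expert. Answer ONLY from the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


question = "Which CAPA observations were documented?"
print(build_prompt(question, retrieve(question)))
```

Because the model only ever sees the retrieved chunks, the quality of the answer depends heavily on the retrieval step; a real embedding model matches meaning rather than exact words, which this toy version does not.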

🏗️ Architecture (High Level)

FDA PDFs
   ↓
Text Extraction + Chunking
   ↓
Embeddings (OpenAI)
   ↓
Vector Database (Pinecone)
   ↓
Retriever
   ↓
LLM (Strict Quality Expert Prompt)
   ↓
Streamlit UI
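The chunking stage can be sketched as fixed-size character windows with overlap, so that text near a window boundary appears in both neighboring chunks. This is an assumption for illustration: the notebook's actual splitter, chunk size, and overlap are not documented here, and `chunk_text`, `to_records`, the record `id` format, and the `source` metadata field are hypothetical. Text extraction (PyMuPDF/OCR) is assumed to have already produced a plain string.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of `chunk_size` characters; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def to_records(source: str, text: str, **kwargs) -> list[dict]:
    """Attach hypothetical metadata so each chunk can later be traced (and filtered) by its PDF."""
    return [
        {"id": f"{source}-{i}", "text": chunk, "source": source}
        for i, chunk in enumerate(chunk_text(text, **kwargs))
    ]


sample = "Observation 1: Written cleaning procedures were not followed. " * 40
records = to_records("site_a_483.pdf", sample, chunk_size=500, overlap=100)
print(len(records), records[0]["id"])
```

Each record's `text` would then be embedded and upserted into the vector database together with its metadata, which is what makes per-PDF filtering possible later.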

🧰 Tech Stack

Language: Python
Frontend: Streamlit
LLM & Embeddings: OpenAI
Vector Database: Pinecone
Architecture: RAG (Retrieval-Augmented Generation)
PDF Handling: PyMuPDF + OCR (in notebook)

📁 Project Structure

fda-rag-document-intelligence/
│
├── Streamlit_app.py          # Streamlit web application
├── requirements.txt          # Python dependencies
├── README.md                 # Project documentation
├── .gitignore                # Ignored files
│
├── notebooks/
│   └── FDA_BOT_20_01.ipynb   # Original RAG development notebook
│
├── visuals/
│   └── app_screenshots.png   # UI screenshots (optional)
│
└── .streamlit/
    └── secrets.example.toml  # Example secrets file

⚙️ Setup Instructions (Run Locally)

1️⃣ Clone the Repository

git clone https://github.com/MK1404/fda-rag-document-intelligence.git
cd fda-rag-document-intelligence

2️⃣ Install Dependencies

pip install -r requirements.txt

3️⃣ Add API Keys (Secrets)

Create the .streamlit folder (skip this if it already exists):

mkdir .streamlit

Then create the file .streamlit/secrets.toml with your keys:

OPENAI_API_KEY = "your-openai-api-key"
PINECONE_API_KEY = "your-pinecone-api-key"

⚠️ Do NOT commit this file to GitHub

4️⃣ Run the Streamlit App

streamlit run Streamlit_app.py

🖥️ How to Use the App

By default, the app searches across all PDFs
Use the sidebar dropdown to select a specific PDF
Ask questions such as:
- “List all observations for this site”
- “Return common FDA observations across all PDFs”
- “What quality issues were identified?”
The system responds using only document content
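Switching between all PDFs and a single PDF maps naturally onto a metadata filter in the vector search. This is a minimal sketch, assuming each chunk was stored in Pinecone with a `source` metadata field holding its PDF file name and that the dropdown's default option is labelled "All PDFs"; both assumptions, and the helper names `sidebar_options` and `pinecone_filter`, are hypothetical rather than taken from `Streamlit_app.py`.

```python
from typing import Optional

ALL_PDFS = "All PDFs"  # hypothetical label for the sidebar's default option


def sidebar_options(pdf_names: list[str]) -> list[str]:
    """Dropdown choices: the 'all documents' default first, then each PDF alphabetically."""
    return [ALL_PDFS] + sorted(pdf_names)


def pinecone_filter(selected: str) -> Optional[dict]:
    """Map the sidebar choice to a Pinecone metadata filter; None means search every PDF."""
    if selected == ALL_PDFS:
        return None
    # Assumes each vector was upserted with metadata {"source": "<pdf file name>"}.
    return {"source": {"$eq": selected}}


print(sidebar_options(["site_b_483.pdf", "site_a_483.pdf"]))
print(pinecone_filter(ALL_PDFS))
print(pinecone_filter("site_a_483.pdf"))
```

The returned dict (or None) would be passed as the `filter` argument of the Pinecone index query alongside the question's embedding; `$eq` is Pinecone's equality operator for metadata filters, so only chunks from the selected PDF are retrieved.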

📌 Example Questions

Return all FDA observations from the selected report
What repeated quality issues appear across inspections?
List CAPA-related observations
Identify compliance gaps mentioned in the document

🚀 Future Enhancements

PDF upload directly from the UI
Source citation highlighting
Observation severity tagging
Compliance dashboards
Export responses as reports

⚠️ Important Notes

This project is for learning and demonstration purposes
No confidential or proprietary FDA data is included
Users must provide their own API keys

👤 Author

Mohit, Data Analytics & AI (Learning Project)

⭐ If You Find This Useful

Consider starring ⭐ the repo to support the project.
