AI-Powered Document Query Engine

A domain-agnostic document analysis system built with FastAPI, LangGraph, and Qdrant.
Ingest a document, ask multiple natural-language questions, and receive structured answers with supporting clauses, all in one API call.

✨ Key Highlights

📍 One API to Rule Them All – A single /process endpoint handles document ingestion + batch querying in one go.
🌎 Works Across Domains – Insurance, Legal, HR, Finance — the system detects the domain and extracts relevant entities automatically.
🔒 Isolated Vector Collections – Each job gets its own Qdrant collection, named after its jobId, so data from different jobs stays separated.
⚡ Parallel Question Processing – Handles multiple queries simultaneously for high throughput.
🧠 Advanced RAG Pipeline:
1. Retrieval – Get broad, relevant document chunks.
2. Reranking – Refine with a CrossEncoder for precise context.
3. Generation – Multi-persona LLM (Analyst + Auditor) produces structured answers with self-critique + confidence score.
📂 Multi-Format Support – Works with PDF, DOCX, and EML files.
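
The parallel question processing listed above could look roughly like the following. This is a minimal sketch using asyncio, not the project's actual code: `answer_question` is a hypothetical stand-in for the per-question retrieve, rerank, and generate work, and `max_concurrency` is an assumed parameter.

```python
import asyncio


async def answer_question(question: str) -> dict:
    # Hypothetical stand-in for the per-question retrieve/rerank/generate work.
    await asyncio.sleep(0.01)  # simulates I/O such as a vector search or an LLM call
    return {"question": question, "decision": f"answer to: {question}"}


async def answer_all(questions: list[str], max_concurrency: int = 4) -> list[dict]:
    # A semaphore caps how many questions are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(question: str) -> dict:
        async with semaphore:
            return await answer_question(question)

    # gather() returns results in the same order as the input questions.
    return await asyncio.gather(*(bounded(q) for q in questions))


if __name__ == "__main__":
    results = asyncio.run(answer_all(["What is the waiting period?", "Is surgery covered?"]))
    for item in results:
        print(item["question"], "->", item["decision"])
```

Capping concurrency is one way to stay within rate limits of an LLM provider while still overlapping the slow network calls.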

🏗 Architecture

The system is built around a LangGraph workflow and consists of these components:

FastAPI Server – Receives API requests and orchestrates the workflow.
LangGraph Workflow – A sequence of stateful nodes ensuring debuggable and maintainable execution.
Qdrant Vector DB – Stores vectorized document chunks per jobId.
LangChain + OpenAI – Handle question analysis, structured answer generation, and self-critique.
Sentence Transformers – Reranks retrieved chunks with a CrossEncoder so each question gets the most relevant context.
Document Parsers – PyMuPDF, python-docx, and mailparser extract clean text from various formats.
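
To illustrate what "a sequence of stateful nodes" means, here is a plain-Python stand-in. It does not use LangGraph's API, and the state fields and node bodies are hypothetical; the point is only that each node reads a shared state and returns a partial update that is merged back in.

```python
from typing import Callable, TypedDict


class JobState(TypedDict, total=False):
    # Hypothetical state shape; the project's real state fields are not shown in this README.
    job_id: str
    text: str
    chunks: list[str]
    questions: list[str]
    answers: list[str]


def preprocess(state: JobState) -> JobState:
    # Split the text into fixed-size chunks (a deliberately naive chunker).
    text = state["text"]
    size = 40
    return {"chunks": [text[i:i + size] for i in range(0, len(text), size)]}


def generate_answers(state: JobState) -> JobState:
    # Placeholder for retrieval + generation: report how many chunks were available.
    count = len(state["chunks"])
    return {"answers": [f"{q} (searched {count} chunks)" for q in state["questions"]]}


def run_workflow(state: JobState, nodes: list[Callable[[JobState], JobState]]) -> JobState:
    # Each node returns a partial update that is merged into the shared state.
    for node in nodes:
        state = {**state, **node(state)}
    return state


if __name__ == "__main__":
    initial = {"job_id": "job-1", "text": "x" * 100, "questions": ["q1"]}
    print(run_workflow(initial, [preprocess, generate_answers])["answers"])
```

Because every node only sees and updates the state, a single step can be run and inspected in isolation, which is what makes this style easy to debug.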

🔍 Workflow Overview

The /process endpoint triggers this sequence:

preprocess – Download, parse, and chunk the document.
batch_analyze_queries – Identify document domain + generate search queries for each question.
load_to_db – Store chunks in a job-specific Qdrant collection.
batch_retrieve_docs – Retrieve broad context for all questions.
batch_rerank_docs – Use CrossEncoder to create tailored context per question.
batch_generate_answers – Generate structured answers + self-critique.
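
The batch_rerank_docs step can be pictured as scoring every (question, chunk) pair and keeping the best few chunks per question. The sketch below assumes a pluggable scorer and uses a toy word-overlap function so it runs without any model; `overlap_scorer`, `rerank`, and `top_k` are illustrative names, not the project's.

```python
from typing import Callable, Sequence

# A scorer takes (question, chunk) pairs and returns one relevance score per pair.
Scorer = Callable[[Sequence[tuple[str, str]]], Sequence[float]]


def overlap_scorer(pairs: Sequence[tuple[str, str]]) -> list[float]:
    # Toy scorer: fraction of question words that also appear in the chunk.
    scores = []
    for question, chunk in pairs:
        q_words = set(question.lower().split())
        c_words = set(chunk.lower().split())
        scores.append(len(q_words & c_words) / max(len(q_words), 1))
    return scores


def rerank(question: str, chunks: Sequence[str],
           scorer: Scorer = overlap_scorer, top_k: int = 2) -> list[str]:
    # Score every (question, chunk) pair, then keep the top_k highest-scoring chunks.
    pairs = [(question, chunk) for chunk in chunks]
    scores = scorer(pairs)
    ranked = sorted(zip(scores, chunks), key=lambda item: item[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


if __name__ == "__main__":
    retrieved = [
        "the waiting period is 30 days",
        "premium is paid yearly",
        "waiting period applies to surgery",
    ]
    print(rerank("what is the waiting period", retrieved))
```

With sentence-transformers installed, the `predict` method of a `CrossEncoder` instance also accepts a list of text pairs and returns one score per pair, so it could be passed as `scorer`; which model the project loads is not stated in this README.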

📡 API Usage

POST `/process`

Ingest a document, process questions, and receive structured answers.

Request Body:

{
  "jobId": "string",
  "documents": "string (URL)",
  "questions": ["string"]
}

jobId – Unique identifier for the job; also used as the Qdrant collection name.
documents – Public URL of a PDF, DOCX, or EML file.
questions – List of natural-language questions.
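
A request with this body could be built and sent with the standard library alone. This is a sketch under assumptions: the server is assumed to run locally on port 8000 without authentication, the helper names and the 120-second timeout are made up for illustration, and the document URL in the usage example is a placeholder.

```python
import json
import urllib.request

# Assumption: the server runs locally on uvicorn's default port.
BASE_URL = "http://127.0.0.1:8000"


def build_process_request(job_id: str, document_url: str,
                          questions: list[str]) -> urllib.request.Request:
    # Field names follow the request body documented above.
    payload = {"jobId": job_id, "documents": document_url, "questions": questions}
    return urllib.request.Request(
        url=f"{BASE_URL}/process",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    # Sends the request and decodes the JSON response body.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_process_request(
        "job-1",
        "https://example.com/policy.pdf",  # placeholder URL
        ["What is the waiting period?"],
    )
    print(req.get_method(), req.full_url)
    # send(req) would perform the call once the server is running.
```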

Response:

{
  "answers": [
    {
      "decision": "string",
      "details": {},
      "justification": "string",
      "clauses": ["string"]
    }
  ]
}

decision – Short, direct outcome (e.g., "Approved").
details – Extracted facts (e.g., amount, waiting period).
justification – Step-by-step reasoning based on the document.
clauses – Supporting excerpts or clause IDs.
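
On the client side, the response can be mapped onto typed objects. The sketch below assumes the response matches the shape shown above exactly; the `Answer` class is a hypothetical helper, and the sample values are invented for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Answer:
    decision: str
    justification: str
    details: dict = field(default_factory=dict)
    clauses: list = field(default_factory=list)


def parse_answers(body: str) -> list[Answer]:
    # Expects the documented shape: {"answers": [{decision, details, justification, clauses}]}.
    data = json.loads(body)
    return [
        Answer(
            decision=item["decision"],
            justification=item["justification"],
            details=item.get("details", {}),
            clauses=item.get("clauses", []),
        )
        for item in data["answers"]
    ]


if __name__ == "__main__":
    # Invented sample payload, not real output from the service.
    sample = json.dumps({
        "answers": [{
            "decision": "Approved",
            "details": {"waiting_period_days": 30},
            "justification": "The procedure falls under the covered clause.",
            "clauses": ["Clause 4.2"],
        }]
    })
    for answer in parse_answers(sample):
        print(answer.decision, answer.clauses)
```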

⚙️ Setup

Prerequisites

Python 3.9+
Docker (for Qdrant)
OpenAI API key

1. Clone the Repository

git clone <your-repo-url>
cd <your-repo-name>

2. Install Dependencies

pip install -r requirements.txt

3. Configure Environment

Create a .env file:

OPENAI_API_KEY="sk-..."
QDRANT_URL="http://localhost:6333"
QDRANT_API_KEY=null
EMBEDDING_MODEL="text-embedding-3-small"
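
Values read from the environment are always strings, so a setting such as QDRANT_API_KEY=null arrives as the text "null" rather than an empty value. The following sketch shows one way an application could read these four variables and normalise that case. It assumes the variables have already been loaded into the process environment (for example by a dotenv loader), and the `Settings` class and `load_settings` function are hypothetical, not the project's code.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    qdrant_url: str
    qdrant_api_key: Optional[str]
    embedding_model: str


def load_settings(env=os.environ) -> Settings:
    # Map "", "null", and "none" to None so a local Qdrant instance gets no API key.
    raw_key = env.get("QDRANT_API_KEY", "").strip().strip('"')
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],  # required; raises KeyError if missing
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        qdrant_api_key=None if raw_key.lower() in ("", "null", "none") else raw_key,
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
    )


if __name__ == "__main__":
    demo = {"OPENAI_API_KEY": "sk-...", "QDRANT_API_KEY": "null"}
    print(load_settings(demo))
```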

4. Run Qdrant (via Docker)

docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant

▶️ Run the Application

uvicorn main:app --reload

Access:

API – http://127.0.0.1:8000
Docs – http://127.0.0.1:8000/docs

💻 Tech Stack

| Component | Technology |
|---|---|
| Backend | FastAPI |
| Workflow Orchestration | LangGraph |
| RAG/LLM Framework | LangChain |
| Vector Database | Qdrant |
| LLM Provider | OpenAI |
| Reranker | Sentence Transformers (CrossEncoder) |
| Document Parsing | PyMuPDF, python-docx, mailparser |

🏆 Why This Matters

This engine goes beyond a simple Q&A bot: every answer carries a decision, a justification, and the supporting clauses, so results can be audited, and the same pipeline applies across domains such as insurance, legal, HR, and finance.

Repository: rajeev-sr/Hack-RX

Folders and files

Latest commit

History

Repository files navigation

AI-Powered Document Query Engine

✨ Key Highlights

🏗 Architecture

🔍 Workflow Overview

📡 API Usage

POST /process

⚙️ Setup

Prerequisites

1. Clone the Repository

2. Install Dependencies

3. Configure Environment

4. Run Qdrant (via Docker)

▶️ Run the Application

💻 Tech Stack

🏆 Why This Matters

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

POST `/process`

Packages