A production-ready Agentic RAG system that autonomously improves retrieval through multi-iteration query rewriting, validates generated answers with hallucination detection, and orchestrates complex decision flows via LangGraph's state-based architecture, going beyond traditional linear RAG pipelines. The system supports streaming responses, includes comprehensive evaluation metrics, and is fully containerized with Docker for scalable deployment.
- Intelligent Query Routing: Classifies queries and routes to appropriate processing paths
- Self-Correcting Retrieval: Grades document relevance and rewrites queries when needed (up to 3 attempts)
- Hallucination Detection: Validates generated answers against retrieved context
- Streaming Responses: Real-time response streaming via Server-Sent Events (SSE)
- RAG Evaluation: Built-in metrics for faithfulness, relevance, precision, and recall
- Production Ready: Comprehensive testing, Docker support, CI/CD pipeline, structured logging
- Cloud Deployed: Fully containerized and deployed on Google Cloud Run with authentication
Deployment Details:
- Deployed on Google Cloud Run (Authenticated)
- Docker containerized for scalable, serverless execution
- Auto-scaling: 0-2 instances based on traffic
- Production-ready with structured logging and monitoring
Option 1: Deploy Your Own Instance
```bash
# Clone and deploy to your own Cloud Run (requires gcloud CLI)
git clone <repository-url>
cd agentic-rag
gcloud run deploy agentic-rag --source .
```

Option 2: Run Locally

See the Quick Start section below for local setup instructions.
┌──────────────────────────────────┐
│ LangGraph (Flow Control) │ ← StateGraph, conditional edges, nodes
├──────────────────────────────────┤
│ LangChain (Logic Layer) │ ← Prompts, LLM chains, retrievers
├──────────────────────────────────┤
│ Infrastructure (DB, LLM, APIs) │ ← ChromaDB, Gemini, FastAPI
└──────────────────────────────────┘
START → Router → Retriever → Grader → [Decision]
│
┌────────────────────┴────────────────────┐
▼ ▼
(docs relevant) (docs not relevant)
│ │
▼ ▼
Generator Query Rewriter
│ │
▼ └──→ back to Retriever
Hallucination Check (max 3 times)
│
▼
END
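The decision flow above can be sketched as a plain-Python simulation. The stubbed `retrieve`, `grade`, `rewrite`, `generate`, and `check_grounded` callables are hypothetical stand-ins for the real LangGraph nodes (the actual workflow lives in `app/agents/graph.py`):

```python
MAX_REWRITE_ITERATIONS = 3  # mirrors the MAX_REWRITE_ITERATIONS setting

def run_workflow(query, retrieve, grade, rewrite, generate, check_grounded):
    """Simulate the graph: retrieve -> grade -> (generate | rewrite) loop."""
    for attempt in range(MAX_REWRITE_ITERATIONS + 1):
        docs = retrieve(query)
        if grade(query, docs):                # docs relevant -> generate
            answer = generate(query, docs)
            if check_grounded(answer, docs):  # hallucination check passed
                return answer
        if attempt < MAX_REWRITE_ITERATIONS:
            query = rewrite(query)            # retry with a rewritten query
    return None  # gave up after the maximum number of rewrites

# Toy stubs: only the rewritten query "better q" retrieves relevant docs.
stub = dict(
    retrieve=lambda q: ["doc"] if q == "better q" else [],
    grade=lambda q, d: bool(d),
    rewrite=lambda q: "better q",
    generate=lambda q, d: "grounded answer",
    check_grounded=lambda a, d: True,
)
print(run_workflow("vague q", **stub))  # -> grounded answer
```

In the real graph the grading and hallucination checks are LLM calls wired up as conditional edges, but the control flow reduces to this loop.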
Prerequisites:
- Python 3.11+
- Google API Key (for Gemini)
Installation:
- Clone the repository:

```bash
git clone <repository-url>
cd agentic-rag
```

- Create virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Configure environment:

```bash
cp .env.example .env
# Edit .env and add your GOOGLE_API_KEY
```

Running the Application:

Development mode:

```bash
uvicorn app.api.main:app --reload
```

Using Docker:

```bash
docker-compose up --build
```

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check |
| `/ingest/file` | POST | Ingest single document (PDF, TXT, MD) |
| `/ingest/directory` | POST | Ingest all documents from directory |
| `/query` | POST | Query the system |
| `/query/stream` | POST | Query with streaming response (SSE) |
| `/collection/stats` | GET | Get collection statistics |
| `/collection/documents` | GET | Get list of documents in collection |
| `/collection/document/{document_name}` | DELETE | Delete specific document |
| `/collection` | DELETE | Clear all documents |
| `/graph/visualization` | GET | Get Mermaid diagram of workflow |
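As a minimal sketch, a `/query` request can be assembled with nothing but the standard library; the payload shape matches the curl examples below, and the host/port are assumptions for a local run:

```python
import json
import urllib.request

def build_query_request(base_url: str, query: str) -> urllib.request.Request:
    """Build (but don't send) a POST request for the /query endpoint."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("http://localhost:8000", "How is AI used in healthcare?")
# Once the server is running, send it with: urllib.request.urlopen(req)
print(req.full_url, req.get_method())  # http://localhost:8000/query POST
```

In practice a client like `requests` or `httpx` would be more ergonomic; this just shows the wire format.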
Query the system:

```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "How is AI used in healthcare?"}'
```

Upload a document:

```bash
curl -X POST "http://localhost:8000/ingest/file" \
  -F "file=@document.pdf"
```

Stream response:

```bash
curl -X POST "http://localhost:8000/query/stream" \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain the revenue trends"}'
```

| Marker | Purpose | Run Command |
|---|---|---|
| `unit` | Component-level tests | `pytest -m unit` |
| `integration` | Chain/graph flow tests | `pytest -m integration` |
| `e2e` | Full pipeline tests | `pytest -m e2e` |
| `evaluation` | RAG metrics tests | `pytest -m evaluation` |
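Markers like these are typically registered so pytest does not warn about unknown marks; a sketch of the corresponding `pytest.ini` entry (assuming the markers are declared there rather than in `pyproject.toml`):

```ini
[pytest]
markers =
    unit: component-level tests
    integration: chain/graph flow tests
    e2e: full pipeline tests
    evaluation: RAG metrics tests
```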
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=app --cov-report=html
```

agentic-rag/
├── app/
│ ├── agents/ # LangGraph workflow
│ │ ├── graph.py # Workflow definition
│ │ ├── state.py # State schema
│ │ └── nodes.py # Node functions
│ ├── chains/ # LangChain components
│ │ ├── router.py # Query routing
│ │ ├── grader.py # Document grading
│ │ ├── generator.py # Answer generation
│ │ └── rewriter.py # Query rewriting
│ ├── retrieval/ # Vector store
│ ├── api/ # FastAPI endpoints
│ └── config.py # Configuration
├── tests/
│ ├── unit/ # Unit tests
│ ├── integration/ # Integration tests
│ ├── e2e/ # End-to-end tests
│ └── evaluation/ # RAG evaluation tests
├── evaluation/ # Evaluation datasets
├── docker-compose.yml
├── Dockerfile
└── requirements.txt
| Variable | Default | Description |
|---|---|---|
| `GOOGLE_API_KEY` | Required | Google API key for Gemini |
| `LLM_MODEL` | `gemini-1.5-flash` | LLM model to use |
| `RETRIEVAL_K` | `4` | Number of documents to retrieve |
| `MAX_REWRITE_ITERATIONS` | `3` | Max query rewrite attempts |
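A minimal sketch of how these variables might be read, assuming plain `os.environ` lookups with the defaults from the table (the real project centralizes this in `app/config.py`, whose exact implementation may differ):

```python
import os

class Settings:
    """Load configuration from environment variables with table defaults."""
    def __init__(self, env=None):
        env = os.environ if env is None else env
        self.google_api_key = env["GOOGLE_API_KEY"]  # required, no default
        self.llm_model = env.get("LLM_MODEL", "gemini-1.5-flash")
        self.retrieval_k = int(env.get("RETRIEVAL_K", "4"))
        self.max_rewrite_iterations = int(env.get("MAX_REWRITE_ITERATIONS", "3"))

settings = Settings(env={"GOOGLE_API_KEY": "dummy-key"})
print(settings.llm_model, settings.retrieval_k)  # gemini-1.5-flash 4
```

A missing `GOOGLE_API_KEY` raises `KeyError` immediately, which is usually preferable to a late failure on the first LLM call.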
The system includes built-in evaluation metrics:
| Metric | Description |
|---|---|
| Faithfulness | Is the answer grounded in the retrieved context? |
| Answer Relevance | Does the answer address the user's query? |
| Context Precision | Are the retrieved documents relevant? |
| Context Recall | Did we retrieve all important documents? |
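Context precision and recall reduce to simple set arithmetic once relevance labels exist; a toy illustration (the production tests use ragas, whose LLM-based metric definitions are more involved):

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved documents that are actually relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(retrieved))

def context_recall(retrieved, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

retrieved = ["doc1", "doc2", "doc3", "doc4"]   # RETRIEVAL_K = 4
relevant = ["doc1", "doc2", "doc5"]
print(context_precision(retrieved, relevant))  # 0.5 (2 of 4 retrieved are relevant)
print(context_recall(retrieved, relevant))     # 0.666... (2 of 3 relevant retrieved)
```

Faithfulness and answer relevance have no such closed form; they require an LLM (or human) judge, which is why the evaluation suite is a separate pytest marker.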
The project includes a comprehensive GitHub Actions pipeline:
- ✅ Linting (Ruff, Black, isort)
- ✅ Unit Tests
- ✅ Integration Tests
- ✅ E2E Tests
- ✅ RAG Evaluation Tests
- ✅ Code Coverage
- ✅ Docker Build
- ✅ Security Scan
| Component | Technology |
|---|---|
| Orchestration | LangGraph |
| LLM Framework | LangChain |
| Vector Store | ChromaDB |
| LLM | Google Gemini |
| API | FastAPI |
| Testing | pytest, ragas |
| CI/CD | GitHub Actions |
| Containerization | Docker |
MIT