mohamed-elkholy95/rag-document-qa

📄 RAG Document Q&A System

Retrieval-Augmented Generation for intelligent document question answering

Overview

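A production-ready RAG (Retrieval-Augmented Generation) system that answers questions over document collections. It combines embedding-based semantic search with configurable top-k retrieval, three chunking strategies (fixed-size, sentence-based, and paragraph-level), and relevance-scored context assembly.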

Architecture

Documents → Chunking → Embedding → Vector Store → Retrieval → Context Assembly → Answer Generation
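The Context Assembly stage of the pipeline can be illustrated with a small sketch. This is not the code in this repository: `ScoredChunk`, `assemble_context`, and `build_prompt` are hypothetical names, and the budget is counted in characters as a rough stand-in for a token limit. It assumes the retriever has already attached a relevance score to each chunk.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    text: str
    score: float  # relevance score from the retriever (higher is better)


def assemble_context(chunks, max_chars=200, min_score=0.0):
    """Greedily pack the highest-scoring chunks into a character budget."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if chunk.score < min_score:
            break  # the list is sorted, so the rest is even less relevant
        # Every chunk after the first also pays for the "\n\n" separator.
        cost = len(chunk.text) + (2 if selected else 0)
        if used + cost > max_chars:
            continue  # too large for the remaining budget; try smaller ones
        selected.append(chunk.text)
        used += cost
    return "\n\n".join(selected)


def build_prompt(question, context):
    """Hypothetical prompt template handed to the answer generator."""
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Skipping an oversized chunk instead of stopping lets shorter, lower-ranked chunks fill the remaining budget; a real system might truncate the oversized chunk instead.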

Features

  • 📑 Multiple Chunking Strategies — Fixed-size, sentence-based, and paragraph-level chunking
  • 🔍 Semantic Search — Embedding-based retrieval with configurable top-k results
  • 📊 Streamlit Dashboard — 4-page interactive UI for document management and Q&A
  • 🚀 REST API — Full FastAPI backend with health checks and query endpoints
  • Comprehensive Tests — 24 tests covering retrieval, ranking, and API endpoints
  • 🧠 Context Window Management — Intelligent context assembly with relevance scoring
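The three chunking strategies listed above could look roughly like the following. This is a minimal sketch using only the standard library; the function names, default sizes, and the regex-based sentence splitter are assumptions for illustration and may differ from what `src/chunker.py` actually does.

```python
import re


def chunk_fixed(text, size=200, overlap=20):
    """Fixed-size character windows; neighbours share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is not just leftover overlap.
    stop = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, stop, step)]


def chunk_sentences(text, max_sentences=3):
    """Group consecutive sentences; splits after '.', '!' or '?'."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def chunk_paragraphs(text):
    """One chunk per paragraph, where blank lines separate paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
```

Fixed-size chunks give predictable lengths, while sentence and paragraph chunks keep natural boundaries at the cost of uneven sizes. The naive sentence splitter will mis-split abbreviations such as "e.g."; a tokenizer from an NLP library would be more robust.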

Tech Stack

| Component | Technology |
| --- | --- |
| Backend | Python 3.12, FastAPI, Uvicorn |
| NLP | Embeddings, TF-IDF, Sentence Transformers |
| Frontend | Streamlit with Plotly charts |
| Testing | pytest, httpx |
| Data | In-memory vector store |
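To make the TF-IDF and in-memory vector store entries concrete, here is a small standard-library sketch of top-k retrieval by cosine similarity. The class `InMemoryTfidfStore` and its methods are hypothetical and are not taken from `src/retriever.py`; the Sentence Transformers path is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InMemoryTfidfStore:
    """Holds documents in a list and scores them with TF-IDF cosine similarity."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        self.docs.append(text)

    def _idf(self):
        n = len(self.docs)
        df = Counter(t for doc in self.docs for t in set(tokenize(doc)))
        # Smoothed IDF: a term found in every document keeps a small weight.
        return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    @staticmethod
    def _vector(text, idf):
        counts = Counter(tokenize(text))
        # Terms never seen in the corpus are dropped from the vector.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    def search(self, query, top_k=3):
        """Return up to `top_k` (score, document) pairs, best first."""
        idf = self._idf()
        q = self._vector(query, idf)
        scored = [(self._cosine(q, self._vector(d, idf)), d) for d in self.docs]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]
```

The sketch recomputes IDF and document vectors on every search to stay short; a real store would cache them and update incrementally when documents are added.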

Quick Start

# Clone and setup
git clone https://github.com/mohamed-elkholy95/rag-document-qa.git
cd rag-document-qa

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/ -v

# Start API server
python -m src.api.main

# Start Streamlit dashboard
streamlit run streamlit_app/app.py

Project Structure

├── src/                    # Source code
│   ├── api/main.py         # FastAPI endpoints
│   ├── config.py           # Configuration management
│   ├── chunker.py          # Document chunking strategies
│   ├── retriever.py        # Semantic search & ranking
│   └── generator.py        # Answer generation
├── streamlit_app/          # Interactive dashboard
│   ├── app.py              # Main app with navigation
│   └── pages/              # Multi-page dashboard (4 pages)
├── tests/                  # Test suite (24 tests)
│   ├── conftest.py
│   └── test_*.py
├── requirements.txt
└── README.md

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| GET | /health | Health check |
| POST | /query | Submit a question and get an answer |
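A client for these two endpoints might look like the sketch below. Only the paths `/health` and `/query` come from the table; the base URL (Uvicorn's default port 8000), the JSON field names `question` and `top_k`, and the shape of the response are assumptions, so check the request model in `src/api/main.py` before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: Uvicorn's default host and port


def build_query_request(question, top_k=3, base_url=BASE_URL):
    """Build the POST /query request; the field names are assumed, not confirmed."""
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_health(base_url=BASE_URL, timeout=5.0):
    """Return True if GET /health answers with HTTP 200."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status == 200


def ask(question, top_k=3, base_url=BASE_URL, timeout=30.0):
    """Send the question and return the decoded JSON response body."""
    request = build_query_request(question, top_k, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API server running, `check_health()` followed by `ask("What does the report conclude?")` would exercise both endpoints. Building the request separately from sending it makes the payload easy to inspect without a live server.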

Dashboard Pages

  1. 📊 Overview — System status and document statistics
  2. 📄 Upload — Upload and manage documents
  3. 💬 Ask — Interactive Q&A interface
  4. 📈 Analytics — Retrieval metrics and performance

Testing

# Run all tests
python -m pytest tests/ -v

# Run with coverage
python -m pytest tests/ --cov=src --cov-report=term-missing

Author

Mohamed Elkholy · GitHub · melkholy@techmatrix.com


Built with Python, FastAPI, and Streamlit
