A cloud-native, LLM-powered query-retrieval system designed to perform contextual analysis on large, unstructured documents. Built for scalable, production-ready deployments.
- Retrieval-Augmented Generation (RAG) pipeline using `faiss-cpu` for efficient vector indexing and a `sentence-transformers` CrossEncoder for high-precision reranking (see the sketches after this list)
- Llama 3 LLM integration via the Groq API for final answer synthesis
- Fine-tuned, role-based system prompt for domain-specific query responses
- Lazy-loading pattern for large AI models to optimize memory and performance
- FastAPI backend with REST endpoints for document ingestion and query answering
- Docker-based deployment for reproducibility and scalability
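The retrieve-then-rerank flow could look like the minimal sketch below. The model names (`all-MiniLM-L6-v2`, `cross-encoder/ms-marco-MiniLM-L-6-v2`) are assumptions for illustration; the repository's actual choices live in `main.py`.

```python
import faiss
from sentence_transformers import SentenceTransformer, CrossEncoder

# Assumed model names, for illustration only.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

chunks = ["clause one ...", "clause two ...", "clause three ..."]

# In-memory FAISS index over L2-normalized embeddings
# (inner product on unit vectors equals cosine similarity).
emb = embedder.encode(chunks, convert_to_numpy=True)
faiss.normalize_L2(emb)
index = faiss.IndexFlatIP(emb.shape[1])
index.add(emb)

def retrieve(query: str, k: int = 10, top_n: int = 3) -> list[str]:
    q = embedder.encode([query], convert_to_numpy=True)
    faiss.normalize_L2(q)
    _, ids = index.search(q, min(k, len(chunks)))
    candidates = [chunks[i] for i in ids[0]]
    # Cross-encoder scores every (query, chunk) pair for a precise final order.
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_n]]
```

Normalizing embeddings and using an inner-product index is one common way to get cosine-similarity search out of FAISS; a flat index keeps everything in memory, which suits per-document, request-scoped indexes.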
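Answer synthesis through Groq might then follow this sketch. The chat-completions call shape matches the Groq Python SDK, but the model id and prompt text are placeholders:

```python
import os
from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def synthesize(query: str, context_chunks: list[str]) -> str:
    context = "\n\n".join(context_chunks)
    completion = client.chat.completions.create(
        model="llama3-70b-8192",  # assumed model id; check Groq's current model list
        messages=[
            # Role-based system prompt: keeps the model on-domain without
            # re-sending long instructions in every user turn.
            {"role": "system",
             "content": "You are a document analyst. Answer strictly from the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return completion.choices[0].message.content
```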
The application is deployed on Railway, with interactive API documentation available via Swagger UI.
```
├── main.py              # FastAPI entry point
├── requirements.txt     # Dependencies
├── Dockerfile           # Container build setup
├── start.py / start.sh  # Application startup scripts
├── test_*.py            # API test scripts
└── deploy.sh / deploy.bat  # Deployment scripts
```
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/intelligent-document-query-engine.git
  cd intelligent-document-query-engine
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Set environment variables: create a `.env` file containing:

  ```
  GROQ_API_KEY=your_groq_api_key
  ```

- Run locally

  ```bash
  uvicorn main:app --reload
  ```
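Once the server is up, you can exercise the query API from Python. The endpoint path and payload below are hypothetical; the real contract is documented in the Swagger UI at `/docs`:

```python
import requests

# Hypothetical endpoint and payload shape: consult /docs for the real schema.
resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What is the notice period in this contract?"},
)
resp.raise_for_status()
print(resp.json())
```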
This project supports Railway deployment (Docker-based):

```bash
railway up
```

or locally via Docker:

```bash
docker build -t doc-query-engine .
docker run -p 8000:8000 doc-query-engine
```

- Lazy loading of embedding and reranker models (see the sketch after this list)
- FAISS in-memory index creation for efficient retrieval
- Batched query processing for speed
- Role-based prompts to reduce token usage
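A minimal sketch of the lazy-loading pattern, using `functools.lru_cache` as a load-once guard (model names are assumptions, as above):

```python
from functools import lru_cache

from sentence_transformers import SentenceTransformer, CrossEncoder

# Load-once guards: nothing is downloaded or held in memory until the
# first request actually needs a model, which keeps container startup fast.
@lru_cache(maxsize=1)
def get_embedder() -> SentenceTransformer:
    return SentenceTransformer("all-MiniLM-L6-v2")  # assumed model name

@lru_cache(maxsize=1)
def get_reranker() -> CrossEncoder:
    return CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed
```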
- Backend: Python, FastAPI
- Vector Indexing: FAISS
- LLM API: Groq (Llama 3)
- Reranking: SentenceTransformers CrossEncoder
- Deployment: Docker, Railway