End-to-end Retrieval-Augmented Generation (RAG) system for working with large collections of unstructured documents, reports, and long-form content.
This project focuses on production concerns rather than demos: retrieval quality, re-indexing workflows, and system-level design.
- Ingests and normalizes unstructured documents
- Applies configurable chunking strategies based on content type (sketched below)
- Generates embeddings and indexes them in a vector database
- Performs semantic (and hybrid) retrieval with metadata filtering
- Assembles context within model limits while preserving traceability
- Streams LLM responses with document-level citations
- Supports re-indexing and tuning without re-ingesting source data
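As a rough illustration of the content-type-aware chunking above, the sketch below picks a splitting strategy per document type. The `Chunk` shape, strategy names, and size/overlap values are illustrative assumptions, not this project's actual interface.

```python
# Illustrative sketch only: the Chunk shape, strategies, and sizes below are
# assumptions, not this project's actual interface.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_id: str  # originating document, kept for citations and re-indexing
    position: int   # chunk order within the document


def chunk_document(text: str, source_id: str, content_type: str) -> list[Chunk]:
    """Choose a chunking strategy based on the document's content type."""
    if content_type == "markdown":
        # Split on second-level headings so chunks stay topically coherent.
        parts = [p for p in text.split("\n## ") if p.strip()]
    else:
        # Fallback: fixed-size character windows with overlap.
        size, overlap = 1200, 200
        parts = [text[i:i + size] for i in range(0, len(text), size - overlap)]
    return [Chunk(text=p, source_id=source_id, position=i) for i, p in enumerate(parts)]
```

Keeping `source_id` and `position` on every chunk is what later makes document-level citations and selective re-indexing possible.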
Most RAG examples stop at simple prompt + vector search demos.
This project treats RAG as a system:
- data pipelines
- retrieval strategies
- observability
- iteration based on real usage
The goal is to build something that can evolve as models, data, and product requirements change.
- Ingestion
  - File upload & normalization
  - Document parsing and cleaning
  - Chunking strategies tuned per document type
- Embedding & Indexing
  - Embedding generation
  - Vector database storage
  - Rich metadata for filtering and re-indexing
- Retrieval
  - Semantic search (optionally hybrid)
  - Context window management
  - Source-aware chunk selection
- Generation
  - LLM response streaming
  - Citation attachment
  - Token-aware context assembly
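To make the context-window and token-budget points above concrete, here is a minimal sketch of token-aware context assembly: chunks arrive ordered by retrieval score and are packed until a budget runs out, keeping their source ids so citations can be attached to the answer. The tokenizer choice, budget, and chunk dict shape are assumptions.

```python
# Minimal sketch of token-budgeted context assembly; the tokenizer choice,
# budget, and chunk dict shape are assumptions for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")


def assemble_context(chunks: list[dict], budget: int = 3000) -> tuple[str, list[str]]:
    """Pack chunks (ordered by relevance) into a token budget and return the
    prompt context plus the source ids needed for citations."""
    parts, sources, used = [], [], 0
    for chunk in chunks:
        cost = len(enc.encode(chunk["text"]))
        if used + cost > budget:
            break
        parts.append(f"[{chunk['source_id']}]\n{chunk['text']}")
        if chunk["source_id"] not in sources:
            sources.append(chunk["source_id"])
        used += cost
    return "\n\n".join(parts), sources
```

The stack below is what these stages run on: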
- FastAPI (API layer)
- PostgreSQL (source of truth, metadata)
- Vector database (Qdrant / FAISS)
- Background workers for ingestion & embedding
- LLMs (OpenAI-compatible APIs or local models)
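As one example of how the vector layer is used, the sketch below runs metadata-filtered semantic search against Qdrant with `qdrant-client`; the collection name, payload key, and the precomputed query vector are assumptions.

```python
# Sketch of metadata-filtered semantic search with qdrant-client; the
# collection name and payload key are assumptions.
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")


def retrieve(query_vector: list[float], doc_type: str, top_k: int = 8):
    """Return the top_k chunks of the given doc_type closest to the query vector."""
    return client.search(
        collection_name="chunks",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="doc_type", match=MatchValue(value=doc_type))]
        ),
        limit=top_k,
    )
```

Because the filter is evaluated inside the vector store, retrieval can be narrowed to a document type or tenant without a second pass over the results. A few principles guide these choices: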
- RAG is a system, not a prompt
- Data quality > prompt engineering
- Retrieval quality is iterative
- Re-indexing should be cheap
- Observability matters
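To illustrate "re-indexing should be cheap": assuming normalized chunk text and metadata persist alongside the source-of-truth records in PostgreSQL, swapping embedding models only means re-embedding stored chunks into a fresh collection, never re-parsing source files. The collection name, vector size, and the `embed()` stub below are assumptions.

```python
# Sketch of re-indexing without re-ingestion: already-stored chunks are
# re-embedded into a fresh collection when the embedding model changes.
# Collection name, vector size, and the embed() stub are assumptions.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams


def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in the new embedding model here")


def reindex(chunks: list[dict], client: QdrantClient, collection: str = "chunks_v2") -> None:
    """Re-embed previously ingested chunks into a new collection."""
    client.recreate_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    )
    client.upsert(
        collection_name=collection,
        points=[
            PointStruct(id=c["id"], vector=embed(c["text"]), payload=c["metadata"])
            for c in chunks
        ],
    )
```

Once the new collection is validated, queries can switch over to it; the original source documents are never touched.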
This is an actively evolving project used to explore and validate production-grade RAG patterns.
Built and maintained by Andrey Keske, an Applied AI Engineer focused on RAG, embeddings, and semantic search.