
RAG Platform — Production-Grade LLM Retrieval System

End-to-end Retrieval-Augmented Generation (RAG) system for working with large collections of unstructured documents, reports, and long-form content.

This project focuses on production concerns rather than demos: retrieval quality, re-indexing workflows, and system-level design.


What this system does

  • Ingests and normalizes unstructured documents
  • Applies configurable chunking strategies based on content type
  • Generates embeddings and indexes them in a vector database
  • Performs semantic (and hybrid) retrieval with metadata filtering
  • Assembles context within model limits while preserving traceability
  • Streams LLM responses with document-level citations
  • Supports re-indexing and tuning without re-ingesting source data
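
As a sketch of the "configurable chunking strategies" idea, a dispatch table can map content types to chunking functions. The names here (`Chunk`, `chunk_by_paragraph`, `CHUNKERS`) are illustrative, not this project's actual API:

```python
# Illustrative per-content-type chunking dispatch (not the project's real API).
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_id: str   # ties the chunk back to its source document
    position: int    # ordinal within the document, for traceability


def chunk_by_paragraph(text: str, source_id: str, max_chars: int = 1200) -> list[Chunk]:
    """Greedy paragraph packing: merge paragraphs until max_chars is exceeded."""
    chunks, buf = [], ""
    for para in text.split("\n\n"):
        if buf and len(buf) + len(para) > max_chars:
            chunks.append(Chunk(buf.strip(), source_id, len(chunks)))
            buf = ""
        buf += para + "\n\n"
    if buf.strip():
        chunks.append(Chunk(buf.strip(), source_id, len(chunks)))
    return chunks


# One strategy per content type; unknown types fall back to paragraph packing.
CHUNKERS = {
    "report": chunk_by_paragraph,
    # "transcript": chunk_by_speaker_turn, etc.
}


def chunk_document(text: str, source_id: str, content_type: str) -> list[Chunk]:
    chunker = CHUNKERS.get(content_type, chunk_by_paragraph)
    return chunker(text, source_id)
```

Keeping the strategy behind a single dispatch point is what lets chunking be tuned per document type without touching the rest of the pipeline.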

Why this exists

Most RAG examples stop at a simple prompt-plus-vector-search demo.

This project treats RAG as a system:

  • data pipelines
  • retrieval strategies
  • observability
  • iteration based on real usage

The goal is to build something that can evolve as models, data, and product requirements change.


High-level architecture

  1. Ingestion

    • File upload & normalization
    • Document parsing and cleaning
    • Chunking strategies tuned per document type
  2. Embedding & Indexing

    • Embedding generation
    • Vector database storage
    • Rich metadata for filtering and re-indexing
  3. Retrieval

    • Semantic search (optionally hybrid)
    • Context window management
    • Source-aware chunk selection
  4. Generation

    • LLM response streaming
    • Citation attachment
    • Token-aware context assembly
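
The retrieval and generation stages meet in token-aware context assembly: pack the highest-scoring chunks into the model's budget while keeping source identifiers for citations. A minimal sketch, assuming chunks arrive as dicts sorted best-first and using a naive whitespace token counter as a stand-in for a real tokenizer:

```python
# Token-aware context assembly sketch (illustrative, not the project's real API):
# greedily pack ranked chunks into a token budget, preserving source ids so the
# generated answer can carry document-level citations.
def assemble_context(ranked_chunks, token_budget, count_tokens=lambda t: len(t.split())):
    selected, used = [], 0
    for chunk in ranked_chunks:          # sorted by retrieval score, best first
        cost = count_tokens(chunk["text"])
        if used + cost > token_budget:
            continue                     # over budget: skip, a smaller chunk may still fit
        selected.append(chunk)
        used += cost
    # Prefix each chunk with its source id so citations survive into the prompt.
    context = "\n\n".join(f"[{c['source_id']}] {c['text']}" for c in selected)
    citations = [c["source_id"] for c in selected]
    return context, citations
```

In practice `count_tokens` would be the tokenizer of the target model rather than a word count, but the packing logic is the same.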

Tech stack

  • FastAPI (API layer)
  • PostgreSQL (source of truth, metadata)
  • Vector database (Qdrant / FAISS)
  • Background workers for ingestion & embedding
  • LLMs (OpenAI-compatible APIs or local models)

Design principles

  • RAG is a system, not a prompt
  • Data quality > prompt engineering
  • Retrieval quality is iterative
  • Re-indexing should be cheap
  • Observability matters
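
"Re-indexing should be cheap" follows from keeping normalized chunks and metadata in the relational store: switching embedding models only re-reads persisted chunks, so parsing, cleaning, and chunking never rerun. A hedged sketch with illustrative names (`chunk_store`, `embed`, `vector_index` are stand-ins, not this project's actual interfaces):

```python
# Re-indexing sketch: re-embed already-persisted chunks into a fresh index.
# All names are illustrative; the point is that no source document is re-parsed.
def reindex(chunk_store, embed, vector_index, model_version):
    for chunk in chunk_store:            # iterable of dicts already in Postgres
        vec = embed(chunk["text"])       # only the embedding step runs again
        vector_index[chunk["id"]] = {
            "vector": vec,
            # Tagging points with the model version lets old and new indexes
            # coexist during a cutover.
            "payload": {**chunk["metadata"], "model_version": model_version},
        }
    return len(vector_index)
```

Recording the embedding model version in each point's payload is also what makes metadata filtering useful during migrations.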

Status

This is an actively evolving project used to explore and validate production-grade RAG patterns.


Author

Built and maintained by Andrey Keske, an Applied AI Engineer focused on RAG, embeddings, and semantic search.
