
# ShreyashAnarase/RAG-app


A Retrieval-Augmented Generation (RAG) system built with FastAPI, Kafka, background workers, and a vector database (Chroma).

The system ingests documents, chunks them asynchronously, generates embeddings, and serves context-aware answers via an API.

Most common commands:

```shell
# Build and run locally
dkr_pvt build
dkr_pvt up -d
```

The EC2 server is linux/amd64 while my machine is Apple Silicon (arm64), so I use Docker Buildx bake to build linux/amd64 images on my machine and push them to Docker Hub, then pull and run them on EC2. EC2 only runs the containers; no builds happen there.
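A minimal `docker-bake.hcl` consistent with the bake targets and image tags used below might look like this (the context and Dockerfile paths are assumptions; the `chunker` and `embedder` targets would follow the same shape):

```hcl
group "default" {
  targets = ["fastapi", "chunker", "embedder"]
}

target "fastapi" {
  context    = "."
  dockerfile = "Dockerfile"                     # assumption: actual path may differ
  platforms  = ["linux/amd64"]                  # EC2 architecture
  tags       = ["discreteflow/rag-agent:fastapi"]
}
```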

### Prerequisites
- Docker ≥ 24
- Docker Buildx enabled
- Docker Hub account

```shell
# Clean up Docker junk
docker system prune -af
```

📦 Build & push images to Docker Hub for EC2 (linux/amd64):

```shell
# Build and push all images
docker buildx bake --push

# Build and push individually
docker buildx bake fastapi --push
docker buildx bake chunker --push
docker buildx bake embedder --push
```

Pull images from Docker Hub and run on EC2:

```shell
dkr_pvt pull

# Pull individually to avoid space limitations
docker pull discreteflow/rag-agent:fastapi
docker pull discreteflow/rag-agent:chunker
docker pull discreteflow/rag-agent:embedder
```




`dkr_pvt` is an alias that overrides `docker-compose.yml` with `docker-compose.private.yml`.
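A plausible definition for the alias, matching the private-mode compose commands used later (the exact definition is an assumption):

```shell
# In ~/.zshrc or ~/.bashrc
alias dkr_pvt='docker compose -f docker-compose.yml -f docker-compose.private.yml'
```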

```shell
# Follow logs per service
docker compose logs -f fastapi
docker compose logs -f chunker
docker compose logs -f embedder
docker compose logs -f kafka
docker compose logs -f zookeeper
```

Architecture overview

```mermaid
flowchart TD
    Client --> FastAPI["FastAPI API Layer"]
    FastAPI --> K1["Kafka topic: documents-uploaded"]
    K1 --> Chunker["Chunking Worker"]
    Chunker --> K2["Kafka topic: embedding-requests"]
    K2 --> Embedder["Embedding Worker"]
    Embedder --> Chroma["Vector Store (Chroma)"]
```

Key idea: document ingestion, chunking, and embedding are fully decoupled using Kafka so each stage can scale independently.


Components

FastAPI Service

- Handles document uploads and chat requests
- Publishes document events to Kafka
- Performs retrieval + final LLM response generation

Chunking Worker

- Consumes uploaded documents
- Splits content into overlapping chunks
- Publishes chunk events to Kafka
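The overlapping split can be sketched as a sliding window; here is a minimal character-based version (the real worker's chunk size, overlap, and tokenization strategy are assumptions):

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Slide a window of chunk_size chars, stepping by chunk_size - overlap,
    so consecutive chunks share `overlap` characters of context."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```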

Embedding Worker

- Consumes chunk events
- Generates embeddings
- Stores them in a persistent vector database
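The worker's event loop reduces to "embed each chunk, then persist it". A toy sketch with a stand-in embedding function and an in-memory store (the real worker uses a proper embedding model and Chroma):

```python
def embed(chunk: str) -> list[float]:
    # Stand-in for a real embedding model: a hashed character histogram.
    vec = [0.0] * 8
    for ch in chunk:
        vec[ord(ch) % 8] += 1.0
    return vec

store: dict[str, list[float]] = {}  # stand-in for the Chroma collection

def handle_chunk_event(event: dict) -> None:
    """Consume one chunk event and persist its embedding."""
    store[event["chunk_id"]] = embed(event["text"])
```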

Kafka

- Event backbone between services
- Enables asynchronous, fault-tolerant processing

Vector Store

- Chroma-based persistent embedding storage
- Supports semantic similarity search for RAG
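Semantic similarity search boils down to ranking stored vectors by cosine similarity to the query embedding; a minimal sketch (Chroma does this internally, with an index rather than a linear scan):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query: list[float], store: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    return sorted(store, key=lambda cid: cosine(query, store[cid]), reverse=True)[:k]
```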

Commands

Build images locally, push them to Docker Hub, then pull and run on EC2.

```shell
# Build images locally (these are not compatible with linux/amd64 for EC2)
dkr_pvt build

# Run the containers
dkr_pvt down
dkr_pvt up -d

# Tag images for Docker Hub (one time)
docker tag ragagent-fastapi:latest discreteflow/rag-agent:fastapi
docker tag ragagent-chunker:latest discreteflow/rag-agent:chunker
docker tag ragagent-embedder:latest discreteflow/rag-agent:embedder
```


Running Locally (Public / Demo Mode)

This mode runs without external credentials and uses a demo LLM backend.

```shell
docker compose up --build
```

Running the App

> [!NOTE]
> This app can be run in two ways:
> 1. Docker Compose
> 2. Makefile

Private mode overrides the Docker config (with my private LLM config):

```shell
# Build the images
docker compose -f docker-compose.yml -f docker-compose.private.yml build

# Run the containers
docker compose -f docker-compose.yml -f docker-compose.private.yml up -d
```

```shell
# Single command to start everything with Docker
docker compose up

# If the Dockerfile changed
docker compose up --build

# To run in the background, add -d
docker compose up -d

# Check everything is running
docker compose ps

# Stop
docker compose down

# See logs
docker compose logs -f

# Specific service logs
docker compose logs -f fastapi
docker compose logs -f chunker
docker compose logs -f embedder
```

Running with Makefile

```shell
make start    # start Kafka + API + workers
make private  # start everything in PRIVATE mode (OCI)
make stop     # stop everything cleanly
```
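A Makefile consistent with those targets might look like this (recipe details are assumptions, inferred from the manual steps below):

```makefile
start:        ## start Kafka + API + workers
	docker compose -f quick-kafka.yml up -d
	python3 -m workers.chunking_worker &
	python3 -m workers.embedding_worker &
	uvicorn app.main:app --reload

private:      ## start everything in PRIVATE mode
	docker compose -f docker-compose.yml -f docker-compose.private.yml up -d --build

stop:         ## stop everything cleanly
	docker compose -f quick-kafka.yml down
	docker compose down
```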

Four terminals: Kafka, app, chunking, embedding.

1. Start Kafka (from the directory containing `quick-kafka.yml`):

   ```shell
   docker compose -f quick-kafka.yml up -d
   # Verify status
   docker compose -f quick-kafka.yml ps
   ```

2. Activate the venv: `source .venv/bin/activate`

3. Start the app + workers:

   ```shell
   # Chunking worker
   python3 -m workers.chunking_worker
   # Embedding worker
   python3 -m workers.embedding_worker
   # FastAPI app
   uvicorn app.main:app --reload
   ```

All config parameters live in two modules: `config_public.py` (public vars) and `config_private.py` (private vars).
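For example, `config_public.py` might hold the non-secret settings (the names and values here are assumptions; the topic names follow the architecture diagram):

```python
# config_public.py -- public, non-secret settings (sketch)
KAFKA_BOOTSTRAP_SERVERS = "localhost:9092"
UPLOAD_TOPIC = "documents-uploaded"
EMBEDDING_TOPIC = "embedding-requests"
CHUNK_SIZE = 500
CHUNK_OVERLAP = 50

# config_private.py would hold secrets (LLM API keys, private endpoints)
# and is kept out of the public image.
```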


Files used in the app


main.py calls Lchain.py → kafka_producer.py → kafka_consumer.py → chunking_worker.py → embedding_worker.py

- `kafka_producer.py` → a wrapper class to save messages to a Kafka topic
- `kafka_consumer.py` → the consumer-side wrapper for reading messages from a Kafka topic
- `chunking_worker.py` → consumes from the uploaded-docs topic → splits into chunks → saves to the embedding topic
- `embedding_worker.py` → consumes from the embed topic → embeds chunks → saves to the Chroma vector DB
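The producer wrapper described above can be sketched as follows (the class and method names are assumptions; only the kafka-python calls are real, and the client is created lazily so no broker is needed at import time):

```python
import json

class KafkaMessageProducer:
    """Thin wrapper that serializes dicts and publishes them to one topic."""

    def __init__(self, topic: str, bootstrap_servers: str = "localhost:9092"):
        self.topic = topic
        self.bootstrap_servers = bootstrap_servers
        self._producer = None  # created on first send, so imports need no broker

    @staticmethod
    def serialize(message: dict) -> bytes:
        return json.dumps(message).encode("utf-8")

    def send(self, message: dict) -> None:
        if self._producer is None:
            from kafka import KafkaProducer  # requires kafka-python
            self._producer = KafkaProducer(bootstrap_servers=self.bootstrap_servers)
        self._producer.send(self.topic, self.serialize(message))
```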

Frontend files: `templates/`, `static/`

templates/chatbox.html -> serves the UI for interacting with the RAG tool


Pipeline: upload → chunk → embed → store.

Documents can be loaded manually (.md files) or uploaded by the user through the UI via /upload-doc.

| Old design | New design |
| --- | --- |
| One big process | Multiple lightweight Kafka workers |
| Blocking | Async, streaming, real-time |
| No separation of concerns | Clearly defined responsibilities |
| No retry/failure management | Kafka lets you retry + log errors |

Challenges Faced

Old design (doesn't scale well): monolithic and synchronous:

1. Manually drop .md files into a `templates/` folder.
2. Run `main()` to:
   - Load all markdown files.
   - Split them into chunks.
   - Embed and store them in ChromaDB using `Chroma.from_documents(...)`.

When using local Kafka, I was unable to read from a Kafka topic without providing partition number 0 (even on the command line), so I had to pass a manual partition to the `get_kafka_consumer` method.

On running the FastAPI app with `uvicorn app.main:app --reload`:

```
ERROR: [Errno 48] Address already in use
```

Why this keeps happening:
- `--reload` spawns a watcher process
- If you Ctrl+C at a bad moment, the child survives
- macOS is especially good at leaving zombies

Local fix: kill the process running on port 8000:

```shell
kill -9 $(lsof -t -i :8000)
```

Prod fix: use gunicorn with uvicorn workers:

```shell
gunicorn -w 4 -k uvicorn.workers.UvicornWorker app.main:app
```

Dependency conflicts: solved via trial and error. Had to downgrade some packages so they matched: langchain-core, langchain-chroma, transformers, torch, sentence-transformers.

Using a heavy embedding model like Cohere's was bloating the images and choking space on EC2. Switched to a practical choice while not compromising on the quality of results.
