A specialized Retrieval-Augmented Generation (RAG) system designed for scholarly Ayurvedic texts. This project uses a standard chunk-based retrieval approach with multi-stage filtering to provide accurate answers tailored to complex medical and botanical queries.
Dataset Used:
- Kaggle: Ayurveda Texts (English)
The easiest way to run the entire system is using the provided scripts.
- Docker Desktop installed and running.
- NVIDIA GPU with CUDA drivers (strongly recommended for performance).
`nvidia-smi` should work on your host.
Windows: `.\run.ps1`

Linux / macOS (Bash): `chmod +x run.sh`, then `./run.sh`

These automated scripts handle the following:
- Stops any existing containers.
- Rebuilds the Docker images.
- Starts the services (`ollama`, `rag-app`).
- Runs data ingestion to process `.txt` files.
- Launches the interactive chat automatically.
To use this RAG system with your own data (`.txt` files only):
- Replace data: delete the files in `data/source/` and add your own `.txt` files.
- Update prompt: edit `app/system_prompt.txt` to change the AI's persona and logic.
- Run: execute `run.ps1` or `run.sh` to re-ingest and chat.
To use Google's Gemini Flash (1M-token context) instead of the local model:
- Get an API key from Google AI Studio.
- Create a file named `.env.local` in the project root.
- Add your key: `GOOGLE_API_KEY=your_key_here`.
- Restart the application using `.\run.ps1`.
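The key-loading step can be sketched in a few lines. This is a stand-alone illustration of reading `.env.local` into the environment (the real app may use a library such as python-dotenv; `load_env_file` and `use_gemini` are hypothetical names, not the project's API):

```python
import os

def load_env_file(path=".env.local"):
    """Minimal .env loader: copy KEY=value lines into os.environ.

    Illustrative only — avoids a python-dotenv dependency.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env.local -> the app falls back to the local Ollama model

def use_gemini():
    """The app can branch on whether GOOGLE_API_KEY is present."""
    return bool(os.environ.get("GOOGLE_API_KEY"))
```

If the key is absent, `use_gemini()` returns `False` and the local model path is taken.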
If you prefer to run the system manually step-by-step instead of using the automated scripts:
Launch the Ollama and RAG containers in the background: `docker-compose up -d`

A. If using the local LLM (Ollama): ensure the model is downloaded inside the Ollama container: `docker exec -it ollama ollama pull lfm2.5-thinking:1.2b`

B. If using Google Gemini: skip the step above and ensure your `.env.local` file contains your `GOOGLE_API_KEY`.

Process your `.txt` files from `data/source/` into the vector database: `docker exec -it ayurveda-rag python app/ingestion.py`

Start the RAG application to begin asking questions: `docker exec -it ayurveda-rag python app/main.py`

| Action | Command |
|---|---|
| Stop Services | `docker-compose down` |
| View App Logs | `docker logs -f ayurveda-rag` |
| Check GPU Status | `docker exec -it ayurveda-rag nvidia-smi` |
| Container Shell | `docker exec -it ayurveda-rag /bin/bash` |
- LLM: `lfm2.5-thinking:1.2b` (served via Ollama), a specialized model capable of deep reasoning.
- Embeddings: `BGE-M3` (Hugging Face), optimized for dense retrieval and multilingual use.
  - Runs in the `rag-app` Python container.
- Vector Database: ChromaDB (persistent)
  - Stores document chunks and vectors locally in `data/chroma_db`.
- Retrieval Pipeline:
  - Multi-Query Generation: rewrites each query into 3 variants to catch different phrasings.
  - Vector Search: retrieves the top-k most relevant chunks.
  - Contextual Compression: uses an LLM/filter to remove irrelevant context before generating the answer.
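The multi-query stage can be illustrated with a small sketch. The real pipeline asks the LLM to rewrite the query; here fixed templates stand in for the LLM, and both function names are illustrative rather than the project's actual API:

```python
def generate_query_variants(query, n=3):
    """Stand-in for the LLM rewriter: produce n phrasings of one query.

    (Illustrative templates; the real system generates rewrites with an LLM.)
    """
    templates = [
        "{q}",
        "In Ayurvedic terms, {q}",
        "Which texts discuss: {q}",
    ]
    return [t.format(q=query) for t in templates[:n]]

def retrieve_union(variants, search_fn, k=4):
    """Run vector search for every variant and de-duplicate chunks by id,
    preserving first-seen order, so later stages see each chunk once."""
    seen, merged = set(), []
    for v in variants:
        for chunk_id, text in search_fn(v, k):
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append((chunk_id, text))
    return merged
```

The union of results is what makes differently phrased variants useful: each one can surface chunks the others miss.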
To add new knowledge to the system:
- Place text files (`.txt`, clean UTF-8 text) into `data/source/`.
- Run the ingestion script manually (or use `run.ps1`/`run.sh`): `docker exec -it ayurveda-rag python app/ingestion.py`

Note: The system tracks processed files in `data/processed/processed_files.json` and skips files that have already been ingested.
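The skip-already-ingested behaviour can be sketched as follows. This is a simplified stand-in for what `app/ingestion.py` does with the tracker file; the function names are illustrative:

```python
import json
from pathlib import Path

def load_processed(tracker: Path) -> set:
    """Read the set of already-ingested filenames (empty on first run)."""
    if tracker.exists():
        return set(json.loads(tracker.read_text(encoding="utf-8")))
    return set()

def select_new_files(source_dir: Path, tracker: Path) -> list:
    """Return only the .txt files that have not been ingested yet."""
    done = load_processed(tracker)
    return sorted(p for p in source_dir.glob("*.txt") if p.name not in done)

def mark_processed(tracker: Path, files) -> None:
    """Record the newly ingested files so the next run skips them."""
    done = load_processed(tracker) | {p.name for p in files}
    tracker.parent.mkdir(parents=True, exist_ok=True)
    tracker.write_text(json.dumps(sorted(done)), encoding="utf-8")
```

Deleting `data/processed/processed_files.json` therefore forces a full re-ingestion on the next run.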
To start asking questions to the RAG system: `docker exec -it ayurveda-rag python app/main.py`

If the build fails with image download errors:
`docker builder prune -f`

`docker pull pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime`

If the LLM fails to download automatically: `docker exec -it ollama ollama pull lfm2.5-thinking:1.2b`

The system automatically detects whether a CUDA-capable GPU is available (via `torch.cuda.is_available()`).
- GPU detected: embeddings run on the GPU (`cuda`).
- No GPU: it gracefully falls back to `cpu`.
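The fallback logic amounts to a device check like the one below. This is a minimal sketch (the helper name is illustrative); importing torch lazily keeps it runnable even on machines without torch installed:

```python
def pick_device() -> str:
    """Choose "cuda" when a CUDA-capable GPU is visible, else "cpu"."""
    try:
        import torch  # deferred so the sketch works without torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

# The embedding model would then be loaded with the chosen device,
# e.g. passing {"device": pick_device()} to the embedding wrapper.
```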
To verify the active device: `docker logs ayurveda-rag`

Key settings can be modified in `app/config.py` and `app/system_prompt.txt`:
| File | Variable | Default | Description |
|---|---|---|---|
| `app/config.py` | `CHUNK_SIZE` | `1000` | Size of text chunks for indexing. |
| `app/config.py` | `CHUNK_OVERLAP` | `200` | Overlap between chunks to preserve context. |
| `app/config.py` | `OLLAMA_MODEL` | `lfm2.5-thinking:1.2b` | The LLM used for reasoning. |
| `app/config.py` | `EMBEDDING_MODEL_DOCS` | `BAAI/bge-m3` | The embedding model for vector search. |
| `app/system_prompt.txt` | System Prompt | Ayurvedic Expert | The AI's persona and instructions. Edit this file to change the bot's behavior. |
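To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a simplified sliding-window chunker. The actual ingestion likely uses a library splitter; this sketch only illustrates the overlap idea:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into windows of chunk_size characters, each starting
    (chunk_size - overlap) after the previous one, so neighbouring
    chunks share `overlap` characters of context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, each chunk repeats the last 200 characters of its predecessor, so a sentence cut at a chunk boundary still appears whole in one of the two chunks.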
The system uses an Advanced RAG Pipeline (defined in app/utils.py):
- Multi-Query Retriever: Breaks down complex queries into sub-questions.
- Ensemble Retrieval: Combines vector search with keyword search (MMR).
- Contextual Compression: Filters irrelevant documents out of the retrieved context before sending it to the LLM.
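One common way to merge rankings from several retrievers is reciprocal rank fusion (RRF). The sketch below is illustrative only — the actual combination strategy in `app/utils.py` may differ (e.g. MMR-based reranking or weighted score fusion):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A chunk that appears in both the vector and keyword rankings beats one that tops only a single list, which is the point of ensembling.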
- The app comes bundled with the `lfm2.5-thinking:1.2b` model. You can change the model by editing `app/config.py`.
- If you change the LLM, the context window size may also need to be changed in `app/config.py`. The current app is optimised for use with Google Gemini 2.5 Flash (1M-token context window).
- The Gemini model can be changed in the `app/utils.py` file.
- Debug output has deliberately been left in the code so you can see how things work; it can be removed by editing `app/utils.py`.