A specialized Retrieval-Augmented Generation (RAG) system designed for scholarly Ayurvedic texts. This project uses a standard chunk-based retrieval approach with multi-stage filtering to provide accurate answers tailored to complex medical and botanical queries.
Dataset Used:
- Kaggle: Ayurveda Texts (English)
The easiest way to run the entire system is using the provided scripts.
- Docker Desktop installed and running.
- NVIDIA GPU with CUDA drivers (strongly recommended for performance).
`nvidia-smi` should work on your host.
Windows: `.\run.ps1`

Linux / macOS (Bash): `chmod +x run.sh`, then `./run.sh`

These automated scripts handle the following:
- Stops any existing containers.
- Rebuilds the Docker images.
- Starts the services (`ollama`, `rag-app`).
- Runs data ingestion to process `.txt` files.
- Launches the interactive chat automatically.
To use this RAG system with your own data (`.txt` files only):
- Replace data: delete the files in `data/source/` and add your own `.txt` files.
- Update prompt: edit `app/system_prompt.txt` to change the AI's persona and logic.
- Run: execute `run.ps1` or `run.sh` to re-ingest and chat.
To use Google's Gemini Flash (1M-token context) instead of the local model:
- Get an API key from Google AI Studio.
- Create a file named `.env.local` in the project root.
- Add your key: `GOOGLE_API_KEY=your_key_here`.
- Restart the application using `.\run.ps1`.
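The key-loading step can be sketched in a few lines. This is a stand-alone illustration of reading `.env.local` into the environment (the real app may use a library such as python-dotenv; `load_env_file` and `use_gemini` are hypothetical names, not the project's API):

```python
import os

def load_env_file(path=".env.local"):
    """Minimal .env loader: copy KEY=value lines into os.environ.

    Illustrative only — avoids a python-dotenv dependency.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env.local -> the app falls back to the local Ollama model

def use_gemini():
    """The app can branch on whether GOOGLE_API_KEY is present."""
    return bool(os.environ.get("GOOGLE_API_KEY"))
```

If the key is absent, `use_gemini()` returns `False` and the local model path is taken.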
If you prefer to run the system manually step-by-step instead of using the automated scripts:
Launch the Ollama and RAG containers in the background: `docker-compose up -d`

A. If using the local LLM (Ollama): ensure the model is downloaded inside the Ollama container: `docker exec -it ollama ollama pull lfm2.5-thinking:1.2b`

B. If using Google Gemini: skip the step above and ensure your `.env.local` file contains your `GOOGLE_API_KEY`.

Process your `.txt` files from `data/source/` into the vector database: `docker exec -it ayurveda-rag python app/ingestion.py`

Start the RAG application to begin asking questions: `docker exec -it ayurveda-rag python app/main.py`

| Action | Command |
|---|---|
| Stop Services | `docker-compose down` |
| View App Logs | `docker logs -f ayurveda-rag` |
| Check GPU Status | `docker exec -it ayurveda-rag nvidia-smi` |
| Container Shell | `docker exec -it ayurveda-rag /bin/bash` |
- LLM: `lfm2.5-thinking:1.2b` (served via Ollama), a specialized model capable of deep reasoning.
- Embeddings: `BGE-M3` (Hugging Face), optimized for dense retrieval and multilingual use.
  - Runs in the `rag-app` Python container.
- Vector Database: ChromaDB (persistent)
  - Stores document chunks and vectors locally in `data/chroma_db`.
- Retrieval Pipeline:
  - Multi-Query Generation: rewrites each query into 3 variants to catch different phrasings.
  - Vector Search: retrieves the top-k most relevant chunks.
  - Contextual Compression: uses an LLM/filter to remove irrelevant context before generating the answer.
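The multi-query stage can be illustrated with a small sketch. The real pipeline asks the LLM to rewrite the query; here fixed templates stand in for the LLM, and both function names are illustrative rather than the project's actual API:

```python
def generate_query_variants(query, n=3):
    """Stand-in for the LLM rewriter: produce n phrasings of one query.

    (Illustrative templates; the real system generates rewrites with an LLM.)
    """
    templates = [
        "{q}",
        "In Ayurvedic terms, {q}",
        "Which texts discuss: {q}",
    ]
    return [t.format(q=query) for t in templates[:n]]

def retrieve_union(variants, search_fn, k=4):
    """Run vector search for every variant and de-duplicate chunks by id,
    preserving first-seen order, so later stages see each chunk once."""
    seen, merged = set(), []
    for v in variants:
        for chunk_id, text in search_fn(v, k):
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append((chunk_id, text))
    return merged
```

The union of results is what makes differently phrased variants useful: each one can surface chunks the others miss.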
To add new knowledge to the system:
- Place text files (`.txt`, clean UTF-8 text) into `data/source/`.
- Run the ingestion script manually (or use `run.ps1`/`run.sh`): `docker exec -it ayurveda-rag python app/ingestion.py`

Note: The system tracks processed files in `data/processed/processed_files.json` and skips files that have already been ingested.
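The skip-already-ingested behaviour can be sketched as follows. This is a simplified stand-in for what `app/ingestion.py` does with the tracker file; the function names are illustrative:

```python
import json
from pathlib import Path

def load_processed(tracker: Path) -> set:
    """Read the set of already-ingested filenames (empty on first run)."""
    if tracker.exists():
        return set(json.loads(tracker.read_text(encoding="utf-8")))
    return set()

def select_new_files(source_dir: Path, tracker: Path) -> list:
    """Return only the .txt files that have not been ingested yet."""
    done = load_processed(tracker)
    return sorted(p for p in source_dir.glob("*.txt") if p.name not in done)

def mark_processed(tracker: Path, files) -> None:
    """Record the newly ingested files so the next run skips them."""
    done = load_processed(tracker) | {p.name for p in files}
    tracker.parent.mkdir(parents=True, exist_ok=True)
    tracker.write_text(json.dumps(sorted(done)), encoding="utf-8")
```

Deleting `data/processed/processed_files.json` therefore forces a full re-ingestion on the next run.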
To start asking questions to the RAG system: `docker exec -it ayurveda-rag python app/main.py`

If the build fails with image download errors:
`docker builder prune -f`

`docker pull pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime`

If the LLM fails to download automatically: `docker exec -it ollama ollama pull lfm2.5-thinking:1.2b`

The system automatically detects whether a CUDA-capable GPU is available (via `torch.cuda.is_available()`).
- GPU detected: embeddings run on the GPU (`cuda`).
- No GPU: it gracefully falls back to `cpu`.
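The fallback logic amounts to a device check like the one below. This is a minimal sketch (the helper name is illustrative); importing torch lazily keeps it runnable even on machines without torch installed:

```python
def pick_device() -> str:
    """Choose "cuda" when a CUDA-capable GPU is visible, else "cpu"."""
    try:
        import torch  # deferred so the sketch works without torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

# The embedding model would then be loaded with the chosen device,
# e.g. passing {"device": pick_device()} to the embedding wrapper.
```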
To verify the active device: `docker logs ayurveda-rag`

Key settings can be modified in `app/config.py` and `app/system_prompt.txt`:
| File | Variable | Default | Description |
|---|---|---|---|
| `app/config.py` | `CHUNK_SIZE` | `1000` | Size of text chunks for indexing. |
| `app/config.py` | `CHUNK_OVERLAP` | `200` | Overlap between chunks to preserve context. |
| `app/config.py` | `OLLAMA_MODEL` | `lfm2.5-thinking:1.2b` | The LLM used for reasoning. |
| `app/config.py` | `EMBEDDING_MODEL_DOCS` | `BAAI/bge-m3` | The embedding model for vector search. |
| `app/system_prompt.txt` | System Prompt | Ayurvedic Expert | The AI's persona and instructions. Edit this file to change the bot's behavior. |
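To see how `CHUNK_SIZE` and `CHUNK_OVERLAP` interact, here is a simplified sliding-window chunker. The actual ingestion likely uses a library splitter; this sketch only illustrates the overlap idea:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into windows of chunk_size characters, each starting
    (chunk_size - overlap) after the previous one, so neighbouring
    chunks share `overlap` characters of context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, each chunk repeats the last 200 characters of its predecessor, so a sentence cut at a chunk boundary still appears whole in one of the two chunks.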
The system uses an Advanced RAG Pipeline (defined in app/utils.py):
- Multi-Query Retriever: Breaks down complex queries into sub-questions.
- Ensemble Retrieval: Combines vector search with keyword search (MMR).
- Contextual Compression: Filters irrelevant documents out of the retrieved context before sending it to the LLM.
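One common way to merge rankings from several retrievers is reciprocal rank fusion (RRF). The sketch below is illustrative only — the actual combination strategy in `app/utils.py` may differ (e.g. MMR-based reranking or weighted score fusion):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of chunk ids into one.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A chunk that appears in both the vector and keyword rankings beats one that tops only a single list, which is the point of ensembling.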
- The app comes bundled with the `lfm2.5-thinking:1.2b` model. You can change the model by editing `app/config.py`.
- If you change the LLM, the context window size may also need to be changed in `app/config.py`. The current app is optimised for use with Google Gemini 2.5 Flash (1M-token context window).
- The Gemini model can be changed in the `app/utils.py` file.
- Debug output has deliberately been left in the code so you can see how things work; it can be removed by editing `app/utils.py`.