BIJJUDAMA/RAG-app-with-Langchain
Ayurveda RAG System

A specialized Retrieval-Augmented Generation (RAG) system designed for scholarly Ayurvedic texts. This project uses a chunk-based semantic retrieval pipeline with multi-stage filtering and reranking to deliver accurate, context-aware answers to complex medical and botanical queries.


Dataset

Dataset Used:


Quick Start

The easiest way to run the entire system is using the provided scripts.

1. Prerequisites

  • Docker Desktop installed and running.
  • NVIDIA GPU with CUDA drivers (strongly recommended for performance). nvidia-smi should work on your host.

2. Run the Full Pipeline

Windows:

.\run.ps1

Linux / macOS (Bash):

chmod +x run.sh
./run.sh

Each script performs the following steps:

  1. Stops any existing containers.
  2. Rebuilds the Docker images.
  3. Starts the services (ollama, rag-app).
  4. Runs Data Ingestion to process .txt files.
  5. Launches the Interactive Chat automatically.

Adapting for Custom Datasets

To use this RAG system with your own data (.txt files only):

  1. Replace Data: Delete files in data/source/ and add your own .txt files.
  2. Update Prompt: Edit app/system_prompt.txt to change the AI's persona and logic.
  3. Run: Execute run.ps1 or run.sh to re-ingest and chat.

(Optional) Using Google Gemini API

To use Google's Gemini Flash (1M context) instead of the local model:

  1. Get an API Key from Google AI Studio.
  2. Create a file named .env.local in the project root.
  3. Add your key: GOOGLE_API_KEY=your_key_here.
  4. Restart the application using .\run.ps1.
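The .env.local file is a plain KEY=VALUE file. The app may well load it with a library such as python-dotenv; the sketch below only illustrates the equivalent parsing logic (the function name load_env_file is illustrative, not taken from the repo):

```python
import os

def load_env_file(path: str = ".env.local") -> dict:
    """Parse simple KEY=VALUE lines (blank lines and # comments are
    ignored) and export the results into os.environ."""
    values = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    os.environ.update(values)
    return values
```

With a GOOGLE_API_KEY=your_key_here line in the file, the key becomes visible to the app via os.environ["GOOGLE_API_KEY"].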

Manual Execution Guide

If you prefer to run the system manually step-by-step instead of using the automated scripts:

1. Start the Environment

Launch the Ollama and RAG containers in the background:

docker-compose up -d

2. Prepare the LLM

A. If using Local LLM (Ollama): Ensure the model is downloaded inside the Ollama container:

docker exec -it ollama ollama pull lfm2.5-thinking:1.2b

B. If using Google Gemini: Skip the step above and ensure your .env.local file contains your GOOGLE_API_KEY.

3. Run Data Ingestion

Process your .txt files from data/source/ into the vector database:

docker exec -it ayurveda-rag python app/ingestion.py

4. Launch the Chat

Start the RAG application to begin asking questions:

docker exec -it ayurveda-rag python app/main.py

Utility Commands

| Action | Command |
| --- | --- |
| Stop Services | docker-compose down |
| View App Logs | docker logs -f ayurveda-rag |
| Check GPU Status | docker exec -it ayurveda-rag nvidia-smi |
| Container Shell | docker exec -it ayurveda-rag /bin/bash |

System Architecture

  • LLM: lfm2.5-thinking:1.2b (served via Ollama)
    • A specialized model capable of deep reasoning.
  • Embeddings: BGE-M3 (HuggingFace)
    • Optimized for dense retrieval and multi-lingual capabilities.
    • Runs in the rag-app Python container.
  • Vector Database: ChromaDB (Persistent)
    • Stores document chunks and vectors locally in data/chroma_db.
  • Retrieval Pipeline:
    1. Multi-Query Generation: Rewrites queries into 3 variants to catch different phrasings.

    2. Vector Search: Retrieves top-k relevant chunks.

    3. Contextual Compression: Uses an LLM/Filter to remove irrelevant context before generating the answer.
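The three stages above can be sketched in plain Python. The real pipeline uses LangChain components with BGE-M3 embeddings and ChromaDB; here, trivial query rephrasings and word-overlap scoring stand in for the LLM and the vector search, and every name is illustrative rather than taken from app/utils.py:

```python
def answer_context(question: str, index: dict[str, str], k: int = 2) -> list[str]:
    """Return the chunks that would be passed to the LLM as context."""
    # Stage 1: multi-query generation (the real app asks the LLM for
    # three rephrasings; fixed variants stand in here).
    queries = [question,
               f"ayurveda {question}",
               f"classical texts {question}"]

    # Stage 2: vector search (word-overlap scoring stands in for
    # embedding similarity); collect top-k chunk ids per query.
    hits = set()
    for q in queries:
        words = set(q.lower().split())
        ranked = sorted(index,
                        key=lambda cid: -len(words & set(index[cid].lower().split())))
        hits.update(ranked[:k])

    # Stage 3: contextual compression - keep only chunks that share
    # at least one term with the original question.
    qwords = set(question.lower().split())
    return [index[cid] for cid in sorted(hits)
            if qwords & set(index[cid].lower().split())]
```

The point of the structure is that stages 1 and 2 widen the net (more phrasings, more candidates) while stage 3 narrows it again before the context reaches the LLM.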


Data Ingestion

To add new knowledge to the system:

  1. Place text files (.txt, clean UTF-8 text) into:

    data/source/
    
  2. Run the ingestion script manually (or use run.ps1 / run.sh):

    docker exec -it ayurveda-rag python app/ingestion.py

    Note: The system tracks processed files in data/processed/processed_files.json and skips files that have already been ingested.
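The skip-already-ingested behaviour boils down to a small JSON ledger. The sketch below is a hypothetical reconstruction of that logic (function names and the exact ledger format are assumptions, not copied from app/ingestion.py):

```python
import json
from pathlib import Path

def files_to_ingest(source_dir: str, ledger_path: str) -> list[Path]:
    """Return the .txt files in source_dir that are not yet listed
    in the processed-files ledger."""
    ledger = Path(ledger_path)
    processed = set(json.loads(ledger.read_text())) if ledger.exists() else set()
    return [p for p in sorted(Path(source_dir).glob("*.txt"))
            if p.name not in processed]

def mark_processed(paths: list[Path], ledger_path: str) -> None:
    """Record the given file names in the ledger so later runs skip them."""
    ledger = Path(ledger_path)
    processed = set(json.loads(ledger.read_text())) if ledger.exists() else set()
    processed.update(p.name for p in paths)
    ledger.parent.mkdir(parents=True, exist_ok=True)
    ledger.write_text(json.dumps(sorted(processed)))
```

Deleting data/processed/processed_files.json would therefore force a full re-ingestion on the next run.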


Interactive Chat

To start asking questions to the RAG system:

docker exec -it ayurveda-rag python app/main.py

Troubleshooting

1. Docker Build Errors

If the build fails with image download errors:

docker builder prune -f
docker pull pytorch/pytorch:2.6.0-cuda12.4-cudnn9-runtime

2. "Model not found" Error

If the LLM fails to download automatically:

docker exec -it ollama ollama pull lfm2.5-thinking:1.2b

3. GPU Acceleration

The system automatically detects if a CUDA-capable GPU is available (via torch.cuda.is_available()).

  • GPU detected: It will run embeddings on the GPU (cuda).
  • No GPU: It gracefully falls back to cpu.
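The selection logic amounts to a one-liner around torch.cuda.is_available(). The sketch below adds an import guard so it also runs on machines without PyTorch; that guard is illustrative, since the rag-app container ships with torch installed:

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable GPU is visible, else "cpu"."""
    try:
        import torch  # present in the rag-app container
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"
```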

To verify the active device:

docker logs ayurveda-rag

Configuration

Key settings can be modified in app/config.py and app/system_prompt.txt:

| File | Variable | Default | Description |
| --- | --- | --- | --- |
| app/config.py | CHUNK_SIZE | 1000 | Size of text chunks for indexing. |
| app/config.py | CHUNK_OVERLAP | 200 | Overlap between chunks to preserve context. |
| app/config.py | OLLAMA_MODEL | lfm2.5-thinking:1.2b | The LLM used for reasoning. |
| app/config.py | EMBEDDING_MODEL_DOCS | BAAI/bge-m3 | The embedding model for vector search. |
| app/system_prompt.txt | System Prompt | Ayurvedic Expert | The AI's persona and instructions; edit this file to change the bot's behavior. |
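To see how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a simple character-level sliding-window chunker. The app itself most likely uses a LangChain text splitter; this sketch only demonstrates what the two parameters mean:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters, where each
    window starts chunk_overlap characters before the previous one ends."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # 800 with the defaults
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

With the defaults, consecutive chunks share their last/first 200 characters, so a sentence cut at a chunk boundary is still seen whole in the neighbouring chunk.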

Retrieval Details

The system uses an Advanced RAG Pipeline (defined in app/utils.py):

  1. Multi-Query Retriever: Breaks down complex queries into sub-questions.
  2. Ensemble Retrieval: Combines vector search with keyword search (MMR).
  3. Contextual Compression: Filters out irrelevant passages before they are sent to the LLM.

Note

  • The app comes bundled with the lfm2.5-thinking:1.2b model. You can change the model by editing app/config.py.

  • If you switch to a different LLM, the context window size may also need to be adjusted in app/config.py. The current app is optimised for use with Google Gemini 2.5 Flash (1M context window).

  • The Gemini model can be changed in app/utils.py.

  • Debug output has been left in the code to show how things work; it can be removed by editing app/utils.py.
