This project is a complete, 100% local Retrieval-Augmented Generation (RAG) assistant.
It allows you to chat with your own documents without sending any data to external APIs. The assistant loads your text files, indexes them in a local vector database, and uses a local Large Language Model (LLM) to answer questions based only on the information found in your documents.
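To make the flow concrete, here is a minimal sketch of what a single query looks like, using the chromadb and sentence-transformers libraries directly (the collection name "docs" is an assumption for illustration; the actual app wires these steps together through LangChain):

```python
import chromadb
from sentence_transformers import SentenceTransformer

# Local embedding model and local vector store -- no external API calls.
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_collection("docs")  # "docs" is an illustrative name

question = "What is AI?"
query_embedding = embedder.encode(question).tolist()

# Fetch the chunks most similar to the question.
results = collection.query(query_embeddings=[query_embedding], n_results=3)
context = "\n\n".join(results["documents"][0])

# The retrieved context is placed into the LLM prompt, so the model
# answers only from the information found in your documents.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```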
- LLM (Text Generation): `Qwen/Qwen2.5-3B-Instruct` (running locally via `transformers`)
- Embedding Model: `sentence-transformers/all-MiniLM-L6-v2` (running locally)
- Vector Database: ChromaDB (stores embeddings locally in a `chroma_db` folder)
- Orchestration: LangChain (manages the RAG pipeline)
- Hardware: Runs on commodity hardware (NVIDIA GPU or Apple Silicon) using 4-bit quantization.
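For reference, loading the LLM with 4-bit quantization typically looks like the sketch below (this assumes the bitsandbytes package on an NVIDIA GPU; the exact loading code in src/app.py may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization shrinks the model's memory footprint to fit commodity GPUs.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU automatically
)
```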
This guide assumes you have an NVIDIA GPU and are using Ubuntu.
```bash
git clone https://github.com/pranav-wakode/local-rag-project.git
cd local-rag-project
```

This project requires Python 3.11 to work correctly with all AI packages.
```bash
# Ensure you have Python 3.11 and its 'venv' module
sudo apt update
sudo apt install python3.11 python3.11-venv

# Create the virtual environment
python3.11 -m venv venv

# Activate the environment
source venv/bin/activate
```
Your terminal prompt should now start with `(venv)`.

```bash
# Install all required packages
pip install -r requirements.txt
```

(This will take a few minutes, as it installs PyTorch and other large libraries.)
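As a rough guide, the requirements file covers the stack listed above; an illustrative, unpinned version might look like this, though the exact packages and versions are whatever ships in the repository's requirements.txt:

```
torch
transformers
bitsandbytes
sentence-transformers
chromadb
langchain
```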
Place all your knowledge files (as .txt files) into the data/ folder.
With your virtual environment active, start the assistant:
```bash
python3 src/app.py
```

The first time you run the app, it will:
- Download the models (Qwen and all-MiniLM-L6-v2). This is a one-time download and may take several gigabytes of space.
- Process your documents: It will read all files in data/, split them into chunks, and create embeddings (see the sketch after this list).
- Create the chroma_db folder: This is where it saves the embeddings.
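Conceptually, the document-processing step corresponds to the following LangChain sketch (module paths assume a recent langchain-community release, and the chunk sizes are illustrative; the repository's actual code may differ):

```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# Read every .txt file in data/.
docs = DirectoryLoader("data/", glob="*.txt", loader_cls=TextLoader).load()

# Split the documents into overlapping chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# Embed each chunk locally and persist the vectors to the chroma_db folder.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")
```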
Once the setup is complete, you will see a prompt. You can now ask questions about your documents.
```
...
Successfully added 21 chunks to the vector database.

Enter a question or 'quit' to exit: What is AI?
```

Subsequent runs will be much faster, as the app re-uses the downloaded models and the existing chroma_db database.
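Under the same LangChain assumptions as the indexing sketch above, re-using the persisted store on a later run is just a matter of reopening it rather than rebuilding it:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Reopen the existing local store; nothing is re-downloaded or re-embedded.
vectordb = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
retriever = vectordb.as_retriever(search_kwargs={"k": 3})  # top-3 chunks per query
```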