KIRA (Knowledge Interface Retrieval Agent) is a local RAG system that lets you chat with your documents using open-source, locally run LLMs. Think of it as a small, self-hosted, and private alternative to services like Google NotebookLM.
- Private & Local - No data leaves your machine, no need for API keys
- Multi-format Support - Supports .pdf and .txt files
- Open Source - Runs Mistral or Llama 3.2 locally via Ollama
- Interactive Chat - Simple web-based UI built with Gradio
- Semantic Search - Find relevant information in your documents
- Python 3.8 or higher
- Ollama installed and running
- 8GB RAM at minimum, 16GB RAM recommended
- Clone this repository

```bash
git clone https://github.com/BVoermann/kira.git
cd kira
```

- Create a virtual environment and install requirements
Linux

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Windows

```bash
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

- Install and set up Ollama
Download Ollama from https://ollama.com, then pull either model:

```bash
ollama pull llama3.2
# or
ollama pull mistral
```

- Start the application

Linux

```bash
python3 app.py
```

Windows

```bash
python app.py
```

- Open your browser
Navigate to http://127.0.0.1:7860
- Upload Documents
- Select one or more PDF or TXT files
- Click "Process Documents" and wait for them to be processed
- Ask questions in the chat
- The AI will answer based on the content of the documents
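Under the hood, answering works roughly like this: each chunk of your documents is embedded as a vector, the question is embedded the same way, the closest chunks are retrieved, and the LLM answers from that context. A minimal sketch with made-up vectors and illustrative function names (not KIRA's actual code, which uses a sentence-transformers model and Chroma):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings for three document chunks (real ones have 384 dimensions).
chunks = {
    "KIRA runs entirely on your machine via Ollama.": [0.9, 0.1, 0.0],
    "Gradio provides the web chat interface.":        [0.1, 0.9, 0.1],
    "Chroma stores the document vectors on disk.":    [0.2, 0.1, 0.9],
}

def retrieve(question_vec, k=2):
    """Return the k chunks whose embeddings are closest to the question."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, chunks[c]), reverse=True)
    return ranked[:k]

question_vec = [0.8, 0.2, 0.1]  # pretend embedding of "Where does KIRA run?"
context = "\n".join(retrieve(question_vec))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: Where does KIRA run?"
# `prompt` is what gets sent to the local LLM via Ollama
```

Because only the top-k chunks make it into the prompt, the LLM answers from your documents rather than from its training data.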
```
kira/
├── app.py                  # Main Gradio interface
├── document_processor.py   # Document loading and vectorization
├── rag_engine.py           # RAG query engine with LLM
└── chroma_db/              # Vector database storage (created on first run)
```
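The vectorization step in `document_processor.py` splits each document into overlapping chunks before embedding them. A simplified illustration of fixed-size chunking with overlap (the real splitter, LangChain's `RecursiveCharacterTextSplitter`, additionally prefers to break on paragraph and sentence boundaries):

```python
# Simplified character-window chunker; adjacent chunks overlap so that
# facts straddling a boundary still appear whole in at least one chunk.
def split_text(text, chunk_size=10, chunk_overlap=3):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnop")
# chunks == ['abcdefghij', 'hijklmnop', 'op']
```

With the defaults configured below (`chunk_size=1000`, `chunk_overlap=200`), consecutive chunks share about 200 characters.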
Edit `app.py` line 19:

```python
rag_engine = RAGEngine(doc_processor.vectorstore, model_name="mistral")
```

Edit `document_processor.py` lines 34-36:
```python
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # Adjust this
    chunk_overlap=200,   # And this
    length_function=len
)
```

Edit `document_processor.py` line 12:
```python
self.embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"   # Adjust this
)
```

Make sure Ollama is running:
```bash
ollama list   # Should show installed models
```

- Reduce `chunk_size` in `document_processor.py`
- Use a smaller model like `mistral` instead of `llama3.2`
- Process fewer documents at once
- Use a smaller embedding model like `all-MiniLM-L6-v2`
- Reduce the number of retrieved chunks (change `k=4` in `rag_engine.py`)
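The last tip refers to the retriever's `k` parameter: how many of the best-matching chunks are packed into the prompt. A hypothetical sketch (scores are made up) of what reducing `k` changes:

```python
# Each chunk gets a similarity score against the question; the retriever
# keeps only the k best. Smaller k means a shorter prompt and lower memory
# use, at the risk of dropping relevant context.
scores = {"chunk_a": 0.91, "chunk_b": 0.85, "chunk_c": 0.40,
          "chunk_d": 0.10, "chunk_e": 0.05}

def top_k(scores, k=4):
    """Return chunk names sorted by score, best first, truncated to k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# default k=4 keeps four chunks; k=2 halves the context sent to the LLM
assert top_k(scores, k=2) == ["chunk_a", "chunk_b"]
```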