KIRA (Knowledge Interface Retrieval Agent) is a local RAG system that lets you chat with your documents using open-source, locally run LLMs. Think of it as a small, self-hosted, and private alternative to services like Google NotebookLM.
- Private & Local - No data leaves your machine, no need for API keys
- Multi-format Support - Supports .pdf and .txt files
- Open Source - Runs Mistral or Llama 3.2 locally via Ollama
- Interactive Chat - Simple web-based UI built with Gradio
- Semantic Search - Find relevant information in your documents
- Python 3.8 or higher
- Ollama installed and running
- 8GB RAM at minimum, 16GB RAM recommended
- Clone this repository

```bash
git clone https://github.com/BVoermann/kira.git
cd kira
```

- Create a virtual environment and install requirements
Linux

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Windows

```bash
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
```

- Install and set up Ollama
Download Ollama from https://ollama.com, then pull either model:

```bash
ollama pull llama3.2
# or
ollama pull mistral
```

- Start the application

Linux

```bash
python3 app.py
```

Windows

```bash
python app.py
```

- Open your browser
Navigate to http://127.0.0.1:7860
- Upload Documents
- Select one or more PDF or TXT files
- Click "Process Documents" and wait for them to be processed
- Ask questions in the chat
- The AI will answer based on the content of the documents
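Under the hood, answering works roughly like this: each chunk of your documents is embedded as a vector, the question is embedded the same way, the closest chunks are retrieved, and the LLM answers from that context. A minimal sketch with made-up vectors and illustrative function names (not KIRA's actual code, which uses a sentence-transformers model and Chroma):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Pretend embeddings for three document chunks (real ones have 384 dimensions).
chunks = {
    "KIRA runs entirely on your machine via Ollama.": [0.9, 0.1, 0.0],
    "Gradio provides the web chat interface.":        [0.1, 0.9, 0.1],
    "Chroma stores the document vectors on disk.":    [0.2, 0.1, 0.9],
}

def retrieve(question_vec, k=2):
    """Return the k chunks whose embeddings are closest to the question."""
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, chunks[c]), reverse=True)
    return ranked[:k]

question_vec = [0.8, 0.2, 0.1]  # pretend embedding of "Where does KIRA run?"
context = "\n".join(retrieve(question_vec))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: Where does KIRA run?"
# `prompt` is what gets sent to the local LLM via Ollama
```

Because only the top-k chunks make it into the prompt, the LLM answers from your documents rather than from its training data.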
```
kira/
├── app.py                  # Main Gradio interface
├── document_processor.py   # Document loading and vectorization
├── rag_engine.py           # RAG query engine with LLM
└── chroma_db/              # Vector database storage (created on first run)
```
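The vectorization step in `document_processor.py` splits each document into overlapping chunks before embedding them. A simplified illustration of fixed-size chunking with overlap (the real splitter, LangChain's `RecursiveCharacterTextSplitter`, additionally prefers to break on paragraph and sentence boundaries):

```python
# Simplified character-window chunker; adjacent chunks overlap so that
# facts straddling a boundary still appear whole in at least one chunk.
def split_text(text, chunk_size=10, chunk_overlap=3):
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnop")
# chunks == ['abcdefghij', 'hijklmnop', 'op']
```

With the defaults configured below (`chunk_size=1000`, `chunk_overlap=200`), consecutive chunks share about 200 characters.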
Edit `app.py` line 19:

```python
rag_engine = RAGEngine(doc_processor.vectorstore, model_name="mistral")
```

Edit `document_processor.py` lines 34-36:
```python
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,     # Adjust this
    chunk_overlap=200,   # And this
    length_function=len
)
```

Edit `document_processor.py` line 12:
```python
self.embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"   # Adjust this
)
```

Make sure Ollama is running:
```bash
ollama list   # Should show installed models
```

- Reduce `chunk_size` in `document_processor.py`
- Use a smaller model like `mistral` instead of `llama3.2`
- Process fewer documents at once
- Use a smaller embedding model like `all-MiniLM-L6-v2`
- Reduce the number of retrieved chunks (change `k=4` in `rag_engine.py`)
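The last tip refers to the retriever's `k` parameter: how many of the best-matching chunks are packed into the prompt. A hypothetical sketch (scores are made up) of what reducing `k` changes:

```python
# Each chunk gets a similarity score against the question; the retriever
# keeps only the k best. Smaller k means a shorter prompt and lower memory
# use, at the risk of dropping relevant context.
scores = {"chunk_a": 0.91, "chunk_b": 0.85, "chunk_c": 0.40,
          "chunk_d": 0.10, "chunk_e": 0.05}

def top_k(scores, k=4):
    """Return chunk names sorted by score, best first, truncated to k."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# default k=4 keeps four chunks; k=2 halves the context sent to the LLM
assert top_k(scores, k=2) == ["chunk_a", "chunk_b"]
```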