A simple and fast local RAG chatbot built with Python, FAISS, and Ollama. It reads your personal documents (PDFs), finds the most relevant info, and gives clear answers using a local AI model. No API keys, no internet, everything runs on your machine.
I built this because I wanted a chatbot that could read my own data without cloud access or paid APIs. At first I just wanted to understand how RAG works, but it became something I could actually use and show. The project taught me how retrieval and generation connect in practice. It is simple, fast, and works offline, which was exactly the goal.
- Reads and processes PDF files locally
- Converts text into vector embeddings using multilingual models
- Uses FAISS for fast and accurate similarity search
- Answers questions through local AI models on Ollama (Phi-3, Mistral, Gemma)
- Works fully offline
- Code is clean and modular, split into Retrieval and Generation parts
- Load a document
- Split the text into smaller chunks
- Convert each chunk into an embedding
- Store everything inside FAISS
- When you ask a question, it finds the most similar chunks
- Sends them to the local AI model and returns the answer
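The steps above can be sketched in a few lines of plain Python. The chunking step splits text into overlapping windows so context is not cut mid-sentence, and the search step is a brute-force version of what FAISS does at scale (the chunk size and overlap values here are illustrative, not the project's actual settings):

```python
import math

def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into overlapping character windows so context isn't cut mid-sentence."""
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
    return chunks

def top_k(query_vec, chunk_vecs, k=2):
    """Brute-force cosine-similarity search -- the same job FAISS does, much faster, at scale."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    scored = sorted(enumerate(chunk_vecs), key=lambda iv: cos(query_vec, iv[1]), reverse=True)
    return [i for i, _ in scored[:k]]
```

In the real pipeline the vectors come from a SentenceTransformers model and the search runs against a FAISS index, but the logic is the same: embed, compare, take the top matches.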
Two main sections:
- Retrieval – handles reading, embedding, and searching chunks
- Generation – builds the prompt and generates the final answer
Designed for clarity — simple enough to extend with Chroma or LangChain later.
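That split can be pictured as two small classes. The names and method signatures below are an illustrative sketch of the structure, not the project's actual code:

```python
class Retrieval:
    """Reads the document, embeds chunks, and searches the FAISS index."""
    def __init__(self, chunks, index):
        self.chunks = chunks   # raw text chunks from the PDF
        self.index = index     # FAISS index over their embeddings

    def search(self, question, k=3):
        # embed the question, query the index, return the top-k chunks
        ...

class Generation:
    """Builds the prompt and asks the local model for the final answer."""
    def build_prompt(self, question, context_chunks):
        context = "\n\n".join(context_chunks)
        return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Because the two halves only meet at the prompt, either side can be swapped out, for example replacing FAISS with Chroma, without touching the other.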
```bash
python main.py
# Choose a PDF file
# Ask: What is Pokémon?
# Bot: Pokémon are creatures that inhabit the world of the Pokémon universe. The core idea revolves around friendship, adventure, and growth, both for the Pokémon themselves and their trainers.
```
- Python
  - The main programming language that connects every part of the system.
- FAISS
  - Used for fast vector similarity search.
  - Stores and retrieves embeddings efficiently.
- SentenceTransformers
  - Converts text and questions into embeddings.
  - Works well with multiple languages including Bahasa Indonesia.
- NumPy
  - Handles numerical operations and converts embeddings into the right format for FAISS.
- PyPDF2
  - Reads and extracts text from PDF files before they are processed.
- Requests
  - Sends the formatted question and context to Ollama’s local API for generating responses.
- Ollama
  - Runs the local AI models like Phi-3, Mistral, or Gemma.
  - Generates the final answer directly on your machine.
- Tkinter
  - Opens a simple file picker so you can select the document to analyze.
- Dotenv
  - Keeps model names and settings clean and separate inside a .env file.
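For the generation step, Ollama exposes a local HTTP API on port 11434. The project uses the Requests library; the sketch below uses only the standard library so it runs without extra dependencies, but the request body is the same either way (the model name and URL default are illustrative):

```python
import json
import urllib.request

def build_payload(model, prompt):
    """Request body for Ollama's /api/generate endpoint; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_ollama(prompt, model="phi3", url="http://localhost:11434/api/generate"):
    """Send the prompt to a locally running Ollama server and return its answer text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Since everything talks to localhost, no traffic ever leaves the machine, which is what makes the fully offline setup possible.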
- A good embedding model changes everything
- FAISS makes search extremely fast
- Keeping Retrieval and Generation separate makes the code easier to manage
- You can build a real RAG chatbot without relying on APIs

