Lightweight RAG pipeline for your PDFs using LangChain, FAISS, Sentence-Transformers, and Ollama for local LLM generation. Comes with a Streamlit UI and a tiny retrieval eval harness.
```
thesis-rag-assistant/
├─ data/                # PDFs to ingest
├─ storage/             # FAISS index + metadata
├─ src/
│  ├─ ingest.py         # load → chunk → embed → index
│  ├─ rag.py            # retrieval + generation (Ollama)
│  └─ app.py            # Streamlit UI
├─ eval/
│  ├─ qa_seed.jsonl     # seed Q/A pairs
│  └─ evaluate.py       # simple retrieval hit-rate eval
├─ .env.example
├─ requirements.txt
└─ README.md
```
Requirements:

- Python 3.10+
- Ollama installed, with a model available (default: `llama3`)
Install Ollama and pull a model:

```bash
curl -fsSL https://ollama.com/install.sh | sh
# Start the service, or run it manually
sudo systemctl enable --now ollama || true
# or
ollama serve
ollama pull llama3
ollama run llama3 "Hello"
```
Then set up the project:

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# optional: edit .env to change OLLAMA_MODEL or paths
```

Put your PDFs under `data/`.
Build the index:

```bash
python src/ingest.py
```

Defaults:
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- Chunking: 2000 characters with 300 overlap
- Vector store: FAISS, persisted under `storage/`
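With those defaults, the load → chunk → embed → index flow looks roughly like the sketch below. It assumes LangChain's community PDF loader, text splitter, and FAISS wrapper; the actual `src/ingest.py` may be organised differently.

```python
# Sketch of the ingest flow (load → chunk → embed → index); illustrative only.
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = []
for pdf in Path("data").glob("*.pdf"):
    docs.extend(PyPDFLoader(str(pdf)).load())  # one Document per page, with page metadata

splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=300)
chunks = splitter.split_documents(docs)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
FAISS.from_documents(chunks, embeddings).save_local("storage")
```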
Run the UI:

```bash
streamlit run src/app.py
```

Ask a question to see the generated answer together with the retrieved context, each chunk labeled with its source (`[source: page]`).
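For programmatic use, `rag.py` exposes an `answer()` helper (usage below). Internally it presumably loads the FAISS index, retrieves the top-k chunks, and prompts the Ollama model with them; here is a minimal sketch of such a function, assuming the `langchain-ollama` chat wrapper, with names chosen for illustration rather than taken from the real module.

```python
# Illustrative answer(): retrieve top-k chunks from FAISS, then generate with Ollama.
import os

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_ollama import ChatOllama


def answer(question: str, k: int = 4):
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
    # allow_dangerous_deserialization is required by recent LangChain versions
    index = FAISS.load_local("storage", embeddings, allow_dangerous_deserialization=True)
    docs = index.similarity_search(question, k=k)

    context = "\n\n".join(
        f"[source: {d.metadata.get('page', '?')}] {d.page_content}" for d in docs
    )
    llm = ChatOllama(
        model=os.getenv("OLLAMA_MODEL", "llama3"),
        base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"),
    )
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt).content, docs
```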
Usage:

```python
from rag import answer

text, docs = answer("What is in these documents?")
print(text)
```

Seed questions live in `eval/qa_seed.jsonl` as JSONL with keys `question` and `answer`.
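The eval is a plain retrieval hit-rate check: for each seed question, retrieve the top-k chunks and count a hit if the expected answer text appears in any of them. A rough sketch of that logic (the bundled `eval/evaluate.py` may score slightly differently):

```python
# Rough sketch of a retrieval hit-rate eval over eval/qa_seed.jsonl; illustrative only.
import json

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
index = FAISS.load_local("storage", embeddings, allow_dangerous_deserialization=True)

hits = total = 0
with open("eval/qa_seed.jsonl") as f:
    for line in f:
        item = json.loads(line)
        docs = index.similarity_search(item["question"], k=4)
        total += 1
        # Count a hit if the expected answer text shows up in any retrieved chunk.
        if any(item["answer"].lower() in d.page_content.lower() for d in docs):
            hits += 1

print(f"Retrieval hit-rate: {hits}/{total} = {hits / total:.2f}")
```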
Run the bundled script:

```bash
python eval/evaluate.py
# Example output: Retrieval hit-rate: 9/12 = 0.75
```

Configuration is set via `.env`:
```
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3
DATA_DIR=data
STORE_DIR=storage
```
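These variables are typically read once at startup. A minimal loading sketch, assuming `python-dotenv` is installed (the defaults mirror `.env.example`):

```python
# Minimal config loading; assumes the python-dotenv package is available.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory, if present

OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3")
DATA_DIR = os.getenv("DATA_DIR", "data")
STORE_DIR = os.getenv("STORE_DIR", "storage")
```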
- If Ollama's systemd unit isn't present, run `ollama serve` in a separate terminal; you can verify the server is reachable with the snippet below.
- For a different model: `ollama pull mistral`, then set `OLLAMA_MODEL=mistral`.
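A quick, stdlib-only way to confirm the Ollama HTTP API is up (the `/api/tags` endpoint lists locally pulled models); the base URL here assumes the default from `.env.example`:

```python
# Quick connectivity check against the Ollama HTTP API.
import json
import urllib.request

base_url = "http://localhost:11434"  # matches the OLLAMA_BASE_URL default
with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp).get("models", [])]

print("Ollama is up; models available:", models)
```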