A local, multilingual (EN/IT) study assistant that indexes course materials and answers questions with citations, using multilingual-e5-base for retrieval and Llama 3.1-8B for generation. CLI-only.

taha-kms/CLASSMATE-RAG

CLASSMATE-RAG

A Retrieval-Augmented Generation (RAG) system for course materials. It ingests documents (PDF, DOCX, PPTX, EPUB, HTML, CSV, TXT, MD), indexes them in both a BM25 keyword index and a Chroma vector database, and answers questions with grounded citations using local LLaMA or Mistral models in GGUF format.


✨ Features

  • CLI-first workflow (rag command)
  • Ingestion with metadata (course, unit, tags, language, semester, author)
  • Hybrid retrieval (BM25 keyword + vector embeddings, fused with RRF)
  • Cited answers generated with local LLMs
  • Admin tools: stats, preview, backup/restore, vacuum, rebuild embeddings, reingest
  • Document loaders: PDF, DOCX, PPTX, EPUB, HTML, CSV, TXT, Markdown
  • Multilingual support with E5 embeddings (intfloat/multilingual-e5-base)
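
The hybrid retrieval step listed above fuses the BM25 and vector rankings with Reciprocal Rank Fusion (RRF). The following is a minimal sketch of the general technique, not the project's actual code: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers.
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c9", "c3"]

fused = rrf_fuse([bm25_hits, vector_hits])
print(fused)
```

Chunks that appear in both lists (`c1`, `c3`) rise above chunks found by only one retriever, which is the main reason rank-based fusion is used when keyword and embedding scores are not directly comparable.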

📦 Installation

See docs/installation.md for details. Quick setup (Linux/macOS):

./quicksetup.sh
source .venv/bin/activate
rag --help

Windows (PowerShell):

.\quicksetup.ps1
.\.venv\Scripts\Activate.ps1
rag --help

🚀 Usage

Ingest a document:

rag add path/to/file.pdf --course "Math101" --unit "1" --language "en" --tags exam,week1
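
To ingest many files with the same metadata, the command above can be scripted. This is a sketch under stated assumptions: it uses only the `rag add` flags shown in this README, the helper names `build_add_command` and `ingest_folder` are hypothetical, and it assumes the `rag` command is on the PATH (for example, inside the activated virtual environment).

```python
import shlex
import subprocess
from pathlib import Path


def build_add_command(path, course, unit, language, tags):
    """Build the argument list for one `rag add` call (flags as shown in this README)."""
    return [
        "rag", "add", str(path),
        "--course", course,
        "--unit", unit,
        "--language", language,
        "--tags", ",".join(tags),
    ]


def ingest_folder(folder, course, unit, language, tags, dry_run=True):
    """Run `rag add` for every PDF in `folder`; with dry_run, only print the commands."""
    commands = [
        build_add_command(pdf, course, unit, language, tags)
        for pdf in sorted(Path(folder).glob("*.pdf"))
    ]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if ingestion of a file fails.
            subprocess.run(cmd, check=True)
    return commands


# Show the command that would be run for a single file.
example = build_add_command("path/to/file.pdf", "Math101", "1", "en", ["exam", "week1"])
print(shlex.join(example))
```

Calling `ingest_folder("slides/", "Math101", "1", "en", ["week1"], dry_run=False)` would then ingest each PDF in turn; the folder name here is only an example.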

Ask a question:

rag ask "What is the chain rule?" --course "Math101"

Preview retrieval (no generation):

rag preview "Explain entropy"

See docs/usage.md for more.


🛠️ Maintenance

  • Show stats: rag stats
  • Backup: rag dump --path dumps/corpus.jsonl
  • Restore: rag restore --path dumps/corpus.jsonl
  • Vacuum: rag vacuum
  • Rebuild embeddings: rag rebuild --model intfloat/multilingual-e5-large
  • Manage entries: rag list, rag show, rag delete, rag reingest

Details in docs/configuration.md.
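
Before restoring a backup, it can be useful to check that the dump is readable. The sketch below assumes only that the file written by `rag dump` is in JSON Lines format (one JSON value per line), as the `.jsonl` extension suggests. The record schema is not documented here, so the code reports whichever top-level keys it finds rather than expecting specific fields; `summarize_jsonl` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (record_count, key_counts) for a JSON Lines file.

    Raises json.JSONDecodeError on the first malformed line.
    """
    records = 0
    keys = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            obj = json.loads(line)
            records += 1
            if isinstance(obj, dict):
                keys.update(obj.keys())
    return records, keys


dump = Path("dumps/corpus.jsonl")
if dump.exists():
    count, key_counts = summarize_jsonl(dump)
    print(f"{count} records; keys seen: {sorted(key_counts)}")
```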


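📖 Documentation

  • docs/installation.md: installation details
  • docs/usage.md: more usage examples
  • docs/configuration.md: maintenance and configuration details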

🧩 Project Structure

cli/           # CLI entrypoint
rag/           # Core RAG system
  admin/       # Backup, restore, manage, inspect
  chunking/    # Text splitting into chunks
  embeddings/  # Embedding models & cache
  generation/  # LLM runner, prompting, postprocessing
  loaders/     # File loaders
  retrieval/   # BM25, Chroma, hybrid fusion
  pipeline/    # Ingestion, query orchestration
docs/          # Documentation
tools/         # Benchmark scripts
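
As an illustration of what a chunking stage like rag/chunking/ typically does, here is a minimal sketch of fixed-size splitting with overlap. This README does not describe the project's actual strategy, so the character-based window, the function name `chunk_text`, and the default sizes are all assumptions.

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into character windows of `size` that overlap by `overlap`.

    Overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk. Sizes here are illustrative defaults only.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop early enough that the last window is not fully contained
    # in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]


chunks = chunk_text("abcdefghij", size=4, overlap=2)
print(chunks)
```

Real pipelines often split on tokens, sentences, or headings instead of raw characters; the windowing logic stays the same.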
