
DMOM Chatbot

Official code for the paper "VPHQA: Vietnamese Pregnancy Health Question Answering Dataset".

This project builds a Retrieval-Augmented Generation (RAG) pipeline over the tungedng2710/Dmom_dataset Hugging Face dataset using a local vector database (Chroma) and an Ollama-served LLM.

Demo: Chat Demo (image)

Key choices:

  • Vector DB: Chroma (local persistent store)
  • Embeddings: via Ollama /api/embeddings (default: bge-m3:latest); see the request sketch after this list
  • Generator: selectable — Ollama /api/chat, Google Gemini REST, or Cerebras Cloud
  • Dataset: tungedng2710/Dmom_dataset (fetched with datasets)
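
As a quick illustration of the embeddings choice above, the sketch below shows a request to Ollama's /api/embeddings endpoint with this project's defaults. It is a minimal, illustrative snippet, not the repository's tonrag/embeddings.py:

import requests

OLLAMA_BASE_URL = "http://localhost:7860"  # this project's default Ollama address

def embed(text: str, model: str = "bge-m3:latest") -> list[float]:
    """Request a single embedding vector from Ollama's /api/embeddings endpoint."""
    resp = requests.post(
        f"{OLLAMA_BASE_URL}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("How much folic acid is recommended during pregnancy?")
print(len(vector))  # dimensionality of the returned embedding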

Prerequisites

  • Python 3.9+
  • pip
  • Ollama installed and running, reachable at http://localhost:7860

Install Ollama

  • Linux:
    • curl -fsSL https://ollama.com/install.sh | sh
    • Start on port 7860: export OLLAMA_HOST=0.0.0.0:7860 && ollama serve
  • macOS:
    • brew install ollama (or download the app from ollama.com)
    • Start on port 7860: export OLLAMA_HOST=0.0.0.0:7860 && ollama serve
  • Windows:
    • Install the Ollama app from ollama.com or via winget install Ollama.Ollama
    • Run it once, then in a terminal: setx OLLAMA_HOST 0.0.0.0:7860 and restart Ollama (or use WSL with the Linux command above).
  • Docker (optional):
    • docker run -d --name ollama -p 7860:11434 -v ollama:/root/.ollama ollama/ollama
    • The API is then available at http://localhost:7860 (mapped to container’s 11434).

Verify Ollama

  • curl http://localhost:7860/api/tags should return JSON (empty if no models yet).

Pull required models

  • Generation (Ollama): ollama pull gpt-oss:20b
  • Embeddings (default bge-m3:latest):
    • ollama pull bge-m3:latest

Install

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configure (optional)

  • Copy .env.example to .env and adjust values.
  • Defaults:
    • OLLAMA_BASE_URL=http://localhost:7860
    • GENERATION_MODEL=gpt-oss:20b
    • EMBEDDING_MODEL=bge-m3:latest
    • CHROMA_DIR=./data/chroma
    • CHAT_BACKEND=ollama (set to gemini or cerebras to switch cloud providers)
    • For Gemini: set GEMINI_API_KEY and optionally GEMINI_MODEL (e.g., gemini-1.5-flash)
    • For Cerebras: set CEREBRAS_API_KEY and optionally CEREBRAS_MODEL (e.g., llama-4-scout-17b-16e-instruct)
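
For reference, a .env assembled from the defaults above might look like the following (the cloud keys are placeholders and only matter when CHAT_BACKEND is switched away from ollama):

OLLAMA_BASE_URL=http://localhost:7860
GENERATION_MODEL=gpt-oss:20b
EMBEDDING_MODEL=bge-m3:latest
CHROMA_DIR=./data/chroma
CHAT_BACKEND=ollama
# Only needed when CHAT_BACKEND=gemini
# GEMINI_API_KEY=your-key-here
# GEMINI_MODEL=gemini-1.5-flash
# Only needed when CHAT_BACKEND=cerebras
# CEREBRAS_API_KEY=your-key-here
# CEREBRAS_MODEL=llama-4-scout-17b-16e-instruct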

Commands

  1. Ingest from local CSV (recommended for dmom_data.csv)
python -m tonrag.cli ingest \
  --csv data/dmom_data.csv \
  --text-field Reference \
  --id-field no \
  --chunk-size 800 \
  --chunk-overlap 120
  2. Ingest from Hugging Face dataset
python -m tonrag.cli ingest \
  --dataset tungedng2710/Dmom_dataset \
  --split train \
  --text-field context \
  --id-field id \
  --chunk-size 800 \
  --chunk-overlap 120

Notes:

  • If you are unsure of field names, run:
    • CSV: python -m tonrag.cli inspect --csv data/dmom_data.csv
    • HF: python -m tonrag.cli inspect --dataset tungedng2710/Dmom_dataset --split train
  • The script tries a few sensible defaults and will suggest possible column names.
  3. Answer a single question
python -m tonrag.cli query \
  --question "<your question>" \
  --top-k 5

Use Gemini or Cerebras instead of Ollama for this run:

python -m tonrag.cli query --question "<your question>" --llm gemini
python -m tonrag.cli query --question "<your question>" --llm cerebras
  4. Evaluate on the dataset (quick lexical match)
# CSV example
python -m tonrag.cli eval \
  --csv data/dmom_data.csv \
  --question-field instruction \
  --answer-field output \
  --top-k 5 \
  --limit 50

# HF example
python -m tonrag.cli eval \
  --dataset tungedng2710/Dmom_dataset \
  --split validation \
  --question-field question \
  --answer-field answer \
  --top-k 5 \
  --limit 50
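
The scoring behind the "quick lexical match" lives in the repository's eval code; as a rough mental model only (an assumption, not the project's actual metric), such a check can be thought of as a normalized token-overlap score:

def lexical_overlap(prediction: str, reference: str) -> float:
    """Illustrative only: fraction of reference tokens that also appear in the prediction."""
    pred_tokens = set(prediction.lower().split())
    ref_tokens = set(reference.lower().split())
    if not ref_tokens:
        return 0.0
    return len(pred_tokens & ref_tokens) / len(ref_tokens)

print(lexical_overlap("Folic acid 400 mcg daily is recommended",
                      "400 mcg of folic acid daily"))  # ~0.83: 5 of 6 reference tokens matched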

Project Structure

  • tonrag/config.py – environment/config defaults
  • tonrag/embeddings.py – Ollama embeddings client; sentence-transformers fallback if available
  • tonrag/llm.py – Ollama chat client (non-streaming)
  • tonrag/vectorstore.py – Chroma wrapper
  • tonrag/dataset.py – dataset utilities and column auto-detection
  • tonrag/chunking.py – simple text chunker
  • tonrag/rag.py – retrieval + prompt assembly + generation
  • tonrag/cli.py – CLI entry points (ingest/query/eval/inspect)
  • app/server.py – minimal web app (stdlib) serving app/static/ and /api/chat
  • app/static/ – frontend assets (index.html, style.css, app.js)
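
To make the flow through these modules concrete, below is a condensed, illustrative sketch of the retrieve-then-generate loop: embed the question via Ollama, query the persisted Chroma store, then ask the chat model to answer from the retrieved contexts. Names such as the collection name are assumptions for illustration; the real logic lives in tonrag/rag.py, tonrag/vectorstore.py, and tonrag/llm.py:

import requests
import chromadb

OLLAMA = "http://localhost:7860"

def embed(text: str) -> list[float]:
    # Same /api/embeddings endpoint used at ingest time
    r = requests.post(f"{OLLAMA}/api/embeddings",
                      json={"model": "bge-m3:latest", "prompt": text}, timeout=60)
    r.raise_for_status()
    return r.json()["embedding"]

def answer(question: str, top_k: int = 5) -> str:
    # 1) Retrieve the top-k chunks from the persisted Chroma store
    client = chromadb.PersistentClient(path="./data/chroma")
    collection = client.get_or_create_collection("dmom")  # collection name is illustrative
    hits = collection.query(query_embeddings=[embed(question)], n_results=top_k)
    contexts = hits["documents"][0]

    # 2) Assemble a grounded prompt and generate with Ollama's /api/chat
    prompt = ("Answer using only the contexts below.\n\n"
              + "\n---\n".join(contexts)
              + f"\n\nQuestion: {question}")
    r = requests.post(f"{OLLAMA}/api/chat",
                      json={"model": "gpt-oss:20b",
                            "messages": [{"role": "user", "content": prompt}],
                            "stream": False},
                      timeout=300)
    r.raise_for_status()
    return r.json()["message"]["content"]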

Troubleshooting

  • If embeddings fail, ensure the embedding model is pulled and available in Ollama: ollama pull bge-m3:latest.
  • If generation fails, ensure gpt-oss:20b is available: ollama pull gpt-oss:20b.
  • If dataset download fails, ensure your environment has internet access and the datasets package can reach Hugging Face.

Web App

  • FastAPI server (recommended): uvicorn app.main:app --host 0.0.0.0 --port 7865
  • Open: http://localhost:7865
  • Endpoints:
    • GET / – serves UI
    • POST /api/chat – body: { "message": "...", "top_k": 5, "llm": "ollama|gemini|cerebras", "llm_api_key": "optional" } (example request after this list)
    • GET /health
  • The app uses the same RAG pipeline and Chroma store.
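
Example request to the chat endpoint documented above, a sketch assuming the FastAPI server is running locally on port 7865:

import requests

resp = requests.post(
    "http://localhost:7865/api/chat",
    json={
        "message": "What foods should be avoided during pregnancy?",
        "top_k": 5,
        "llm": "ollama",        # or "gemini" / "cerebras"
        # "llm_api_key": "...", # optional, only needed for cloud providers
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json())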

Legacy stdlib server (optional): python app/server.py --port 7865

Chat UI

  • Clean bubble layout with avatars, typing indicator, and copy buttons.
  • Collapsible “Contexts” under each assistant message to inspect retrieved sources.
  • Shift+Enter for newline, Enter to send, clear chat, adjustable Top‑K.
  • Answers render Markdown (headings, lists, links, code blocks) safely in the UI.
  • CLI: add --strip-markdown to print plain-text answers (example after this list).
  • LLM selector in the header to switch between Ollama, Gemini, or Cerebras per request.
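
For example, combining the query command shown earlier with the flag above:

python -m tonrag.cli query --question "<your question>" --strip-markdown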

License

  • For this template, no explicit license is added; adapt as needed.
