# Official Code for the Paper "VPHQA: Vietnamese Pregnancy Health Question Answering Dataset"

This project builds a Retrieval-Augmented Generation (RAG) pipeline over the `tungedng2710/Dmom_dataset` dataset using a local vector database (Chroma) and an Ollama-served LLM.
Key choices:
- Vector DB: Chroma (local persistent store)
- Embeddings: via Ollama `/api/embeddings` (default: `bge-m3:latest`)
- Generator: selectable among Ollama `/api/chat`, Google Gemini REST, or Cerebras Cloud
- Dataset: `tungedng2710/Dmom_dataset` (fetched with `datasets`)
## Prerequisites
- Python 3.9+ and `pip`
- Ollama installed and running, reachable at `http://localhost:7860`
## Install Ollama
- Linux:
  - `curl -fsSL https://ollama.com/install.sh | sh`
  - Start on port 7860: `export OLLAMA_HOST=0.0.0.0:7860 && ollama serve`
- macOS:
  - `brew install ollama` (or download the app from ollama.com)
  - Start on port 7860: `export OLLAMA_HOST=0.0.0.0:7860 && ollama serve`
- Windows:
  - Install the Ollama app from ollama.com or via `winget install Ollama.Ollama`
  - Run it once, then in a terminal: `setx OLLAMA_HOST 0.0.0.0:7860` and restart Ollama (or use WSL with the Linux command above).
- Docker (optional):
  - `docker run -d --name ollama -p 7860:11434 -v ollama:/root/.ollama ollama/ollama`
  - The API is then available at `http://localhost:7860` (mapped to the container's 11434).
## Verify Ollama
- `curl http://localhost:7860/api/tags` should return JSON (an empty model list if no models are pulled yet).
## Pull required models
- Generation (Ollama): `ollama pull gpt-oss:20b`
- Embeddings (default `bge-m3:latest`): `ollama pull bge-m3:latest`
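
Optionally, before ingesting anything, you can confirm both models actually respond. The sketch below is a minimal, stdlib-only check against the defaults above (`http://localhost:7860`, `bge-m3:latest`, `gpt-oss:20b`); it is illustrative and not part of the project's code.

```python
# smoke_test_ollama.py - assumes Ollama serves on port 7860 with both models pulled.
import json
import urllib.request

BASE = "http://localhost:7860"

def post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Embeddings: /api/embeddings returns {"embedding": [...]} for a single prompt.
emb = post("/api/embeddings", {"model": "bge-m3:latest", "prompt": "xin chào"})
print("embedding dims:", len(emb["embedding"]))

# Generation: /api/chat with stream=False returns one JSON object with the full reply.
chat = post("/api/chat", {
    "model": "gpt-oss:20b",
    "messages": [{"role": "user", "content": "Say hello in Vietnamese."}],
    "stream": False,
})
print("reply:", chat["message"]["content"])
```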
## Install
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Configure (optional)
- Copy `.env.example` to `.env` and adjust values.
- Defaults:
  - `OLLAMA_BASE_URL=http://localhost:7860`
  - `GENERATION_MODEL=gpt-oss:20b`
  - `EMBEDDING_MODEL=bge-m3:latest`
  - `CHROMA_DIR=./data/chroma`
  - `CHAT_BACKEND=ollama` (set to `gemini` or `cerebras` to switch cloud providers)
- For Gemini: set `GEMINI_API_KEY` and optionally `GEMINI_MODEL` (e.g., `gemini-1.5-flash`)
- For Cerebras: set `CEREBRAS_API_KEY` and optionally `CEREBRAS_MODEL` (e.g., `llama-4-scout-17b-16e-instruct`)
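
For reference, a `.env` that simply restates the defaults above would look like the following; the commented lines are placeholders you only fill in when enabling a cloud backend:

```env
OLLAMA_BASE_URL=http://localhost:7860
GENERATION_MODEL=gpt-oss:20b
EMBEDDING_MODEL=bge-m3:latest
CHROMA_DIR=./data/chroma
CHAT_BACKEND=ollama
# Only needed when CHAT_BACKEND=gemini or cerebras:
# GEMINI_API_KEY=...
# GEMINI_MODEL=gemini-1.5-flash
# CEREBRAS_API_KEY=...
# CEREBRAS_MODEL=llama-4-scout-17b-16e-instruct
```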
## Commands
- Ingest from local CSV (recommended for `dmom_data.csv`); see the chunking sketch after the notes below for what `--chunk-size`/`--chunk-overlap` control
```bash
python -m tonrag.cli ingest \
  --csv data/dmom_data.csv \
  --text-field Reference \
  --id-field no \
  --chunk-size 800 \
  --chunk-overlap 120
```
- Ingest from Hugging Face dataset
```bash
python -m tonrag.cli ingest \
  --dataset tungedng2710/Dmom_dataset \
  --split train \
  --text-field context \
  --id-field id \
  --chunk-size 800 \
  --chunk-overlap 120
```
Notes:
- If you are unsure of field names, run:
  - CSV: `python -m tonrag.cli inspect --csv data/dmom_data.csv`
  - HF: `python -m tonrag.cli inspect --dataset tungedng2710/Dmom_dataset --split train`
- The script tries a few sensible defaults and will suggest possible column names.
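
Both ingest variants split each document into overlapping chunks before embedding. `tonrag/chunking.py` is described only as a simple text chunker, so the sliding-window sketch below is just an illustration of what `--chunk-size 800 --chunk-overlap 120` typically mean (fixed-size character windows that share 120 characters), not the project's exact implementation.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 120) -> list[str]:
    """Split text into fixed-size windows that overlap by `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks

# Example: a 2000-character document yields windows starting at 0, 680, and 1360.
```

Keeping some shared text between adjacent chunks means an answer that straddles a chunk boundary is still retrievable from at least one chunk.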
- Answer a single question
```bash
python -m tonrag.cli query \
  --question "<your question>" \
  --top-k 5
```
  Use Gemini or Cerebras instead of Ollama for this run:
```bash
python -m tonrag.cli query --question "<your question>" --llm gemini
python -m tonrag.cli query --question "<your question>" --llm cerebras
```
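
Under the hood, `query` follows the usual retrieve-then-generate steps: embed the question, pull the top-k chunks from the Chroma store, assemble a grounded prompt, and call the chat model. The sketch below is an approximation using `chromadb` and the Ollama HTTP API; the collection name `dmom`, the prompt wording, and the helper names are assumptions, not the project's actual code (`tonrag/rag.py`).

```python
import json
import urllib.request

import chromadb

BASE = "http://localhost:7860"

def ollama(path: str, payload: dict) -> dict:
    req = urllib.request.Request(f"{BASE}{path}",
                                 data=json.dumps(payload).encode("utf-8"),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def answer(question: str, top_k: int = 5) -> str:
    # 1) Embed the question with the same model used at ingest time.
    q_emb = ollama("/api/embeddings", {"model": "bge-m3:latest", "prompt": question})["embedding"]

    # 2) Retrieve the top-k most similar chunks from the persistent Chroma store.
    client = chromadb.PersistentClient(path="./data/chroma")
    collection = client.get_collection("dmom")  # assumed collection name
    hits = collection.query(query_embeddings=[q_emb], n_results=top_k)
    contexts = hits["documents"][0]

    # 3) Assemble a grounded prompt and generate a non-streaming answer.
    prompt = ("Answer using only the context below.\n\n"
              + "\n---\n".join(contexts)
              + f"\n\nQuestion: {question}")
    reply = ollama("/api/chat", {
        "model": "gpt-oss:20b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    return reply["message"]["content"]
```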
- Evaluate on the dataset (quick lexical match)
```bash
# CSV example
python -m tonrag.cli eval \
  --csv data/dmom_data.csv \
  --question-field instruction \
  --answer-field output \
  --top-k 5 \
  --limit 50

# HF example
python -m tonrag.cli eval \
  --dataset tungedng2710/Dmom_dataset \
  --split validation \
  --question-field question \
  --answer-field answer \
  --top-k 5 \
  --limit 50
```
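
The exact "quick lexical match" metric is not specified here; a token-overlap score like the one below (fraction of unique reference-answer tokens that also appear in the generated answer) is one plausible reading, shown only to make the idea concrete.

```python
import re

def token_overlap(reference: str, prediction: str) -> float:
    """Fraction of unique reference tokens that also occur in the prediction (0.0-1.0)."""
    tokenize = lambda s: set(re.findall(r"\w+", s.lower()))
    ref, pred = tokenize(reference), tokenize(prediction)
    return len(ref & pred) / len(ref) if ref else 0.0

# Example: partial credit when the generated answer reuses the reference wording.
print(token_overlap("uống đủ nước mỗi ngày", "mẹ bầu nên uống đủ nước"))  # 0.6
```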
## Project Structure
- `tonrag/config.py` – environment/config defaults
- `tonrag/embeddings.py` – Ollama embeddings client; ST fallback if available
- `tonrag/llm.py` – Ollama chat client (non-streaming)
- `tonrag/vectorstore.py` – Chroma wrapper
- `tonrag/dataset.py` – dataset utilities and column auto-detection
- `tonrag/chunking.py` – simple text chunker
- `tonrag/rag.py` – retrieval + prompt assembly + generation
- `tonrag/cli.py` – CLI entry points (ingest/query/eval/inspect)
- `app/server.py` – minimal web app (stdlib) serving `app/static/` and `/api/chat`
- `app/static/` – frontend assets (index.html, style.css, app.js)
## Troubleshooting
- If embeddings fail, ensure the embedding model is pulled and available in Ollama: `ollama pull bge-m3:latest`.
- If generation fails, ensure `gpt-oss:20b` is available: `ollama pull gpt-oss:20b`.
- If dataset download fails, ensure your environment has internet access and the `datasets` package can reach Hugging Face.
## Web App
- FastAPI server (recommended): `uvicorn app.main:app --host 0.0.0.0 --port 7865`
- Open: `http://localhost:7865`
- Endpoints:
  - `GET /` – serves the UI
  - `POST /api/chat` – body: `{ "message": "...", "top_k": 5, "llm": "ollama|gemini|cerebras", "llm_api_key": "optional" }` (example request below)
  - `GET /health`
- The app uses the same RAG pipeline and Chroma store.
- Legacy stdlib server (optional): `python app/server.py --port 7865`
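
For reference, here is a request against the documented `POST /api/chat` body (field names as listed above; the response shape is not documented here, so the sketch just prints whatever JSON comes back):

```python
import json
import urllib.request

payload = {
    "message": "Bà bầu nên uống bao nhiêu nước mỗi ngày?",
    "top_k": 5,
    "llm": "ollama",          # or "gemini" / "cerebras"
    # "llm_api_key": "...",   # only needed for the cloud backends
}
req = urllib.request.Request(
    "http://localhost:7865/api/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.dumps(json.loads(resp.read()), ensure_ascii=False, indent=2))
```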
## Chat UI
- Clean bubble layout with avatars, typing indicator, and copy buttons.
- Collapsible “Contexts” under each assistant message to inspect retrieved sources.
- Shift+Enter for newline, Enter to send, clear chat, adjustable Top‑K.
- Answers render Markdown (headings, lists, links, code blocks) safely in the UI.
- CLI: add `--strip-markdown` to print plain-text answers.
- LLM selector in the header to switch between Ollama, Gemini, and Cerebras per request.
## License
- For this template, no explicit license is added; adapt as needed.
