A lightweight, privacy-first PDF Q&A app. Drop a document, ask questions, and get grounded answers with inline citations — all on-device, with no servers and no API keys.
Key idea: Retrieval-Augmented Generation (RAG) implemented fully in the browser using WebGPU and WebAssembly. The model and embeddings run locally; your files never leave your machine.
- Zero-cost / zero backend: No OpenAI/Groq/Google keys. No network calls for inference.
- On-device LLM: WebLLM (MLC) runs a tiny instruction model in your browser (WebGPU).
- Local embeddings: Sentence embeddings via Transformers.js (MiniLM) inside the browser.
- Grounded answers: Citations like `[C7]` link to exact text chunks and pages.
- Privacy by design: Documents, vectors, and metadata persist only in IndexedDB (localForage).
- Accessibility & UX: Keyboard-friendly controls, progress feedback, and strict quote-only mode.
Requirements: Desktop Chrome or Edge (WebGPU enabled).
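To confirm WebGPU is available before loading the model, you can run a quick check in the DevTools console (this is a generic browser check, not code from the app):

```js
// WebLLM needs WebGPU; recent desktop Chrome/Edge expose navigator.gpu.
if (!('gpu' in navigator)) {
  console.warn('WebGPU is not available; use recent desktop Chrome or Edge.');
} else {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? 'WebGPU ready' : 'No suitable GPU adapter found');
}
```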
- Open a terminal in this folder and start a simple local server:
  - Python 3: `python -m http.server 8080`
  - Windows (py launcher): `py -m http.server 8080`
- Visit http://localhost:8080.
- Click “Load Tiny Model” (first load downloads weights via CDN).
- Upload PDF → wait for “✅ Ingest complete.”
- Ask a question.
- For guaranteed faithfulness, turn on Strict (quote-only) — the answer will be composed of sentences copied from the retrieved chunks and cited like `[C3]`.
The app registers a Service Worker to make the page itself available offline after first load. (Large model files are handled by the browser cache.)
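For reference, a minimal sketch of that app-shell pattern, assuming a cache name and asset list like those in `sw.js` (the exact list lives in that file):

```js
// sw.js (sketch): pre-cache the shell on install, serve it cache-first.
const CACHE = 'app-shell-v1';
const ASSETS = ['./', './index.html', './styles.css', './app.js'];

self.addEventListener('install', (event) => {
  event.waitUntil(caches.open(CACHE).then((cache) => cache.addAll(ASSETS)));
});

self.addEventListener('fetch', (event) => {
  // Shell assets come from the cache; everything else (e.g., model weights) hits the network.
  event.respondWith(
    caches.match(event.request).then((hit) => hit || fetch(event.request))
  );
});
```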
- Upload PDF: Parse → chunk → embed → persist locally.
- Top-k: Tune retrieval breadth (3–8).
- Max answer length: Cap the output tokens.
- Strict (quote-only): Extractive answers with citations only; if nothing relevant can be quoted, the app replies “I don’t know.”
- Ask: Retrieval + prompt to the local LLM, with inline citations.
- Clear Data: Wipes IndexedDB (vectors + metadata) for this origin.
- Parse PDF with `pdf.js` to extract per-page text (sketched below).
- Chunk into ~900-char windows with ~150-char overlap; track `(page, charStart, charEnd)`.
- Embed each chunk using `@xenova/transformers` (MiniLM) in the browser (ONNX/WASM/WebGPU).
- Persist vectors + metadata in IndexedDB via `localForage`.
- Query: Embed the user question, compute cosine similarity over stored vectors, take the top-k chunks (see the retrieval sketch below).
- Prompt: Build a short instruction + quoted chunks; ask WebLLM for an answer.
- Strict mode sets temperature to 0 and forces copying only from chunks (with citations).
- Citations: Render `[C#]` tags as anchors; clicking scrolls/highlights the exact chunk.
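A condensed sketch of the parse-and-chunk steps, assuming the `pdfjsLib` global build of pdf.js; function names and defaults here are illustrative, not the exact code in `app.js`:

```js
// Extract per-page text with pdf.js.
async function extractPages(arrayBuffer) {
  const pdf = await pdfjsLib.getDocument({ data: arrayBuffer }).promise;
  const pages = [];
  for (let p = 1; p <= pdf.numPages; p++) {
    const content = await (await pdf.getPage(p)).getTextContent();
    pages.push(content.items.map((it) => it.str).join(' '));
  }
  return pages;
}

// Cut page text into ~900-char windows with ~150-char overlap,
// keeping provenance so citations can point back to the source.
function chunkPage(text, page, size = 900, overlap = 150) {
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    const end = Math.min(start + size, text.length);
    chunks.push({ text: text.slice(start, end), page, charStart: start, charEnd: end });
    if (end === text.length) break;
  }
  return chunks;
}
```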
See docs/architecture.md for a diagram and data flow.
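Likewise, a sketch of the embed, persist, and retrieve path, assuming the Transformers.js `pipeline` API and `localforage`; the model id, storage key, and module-style imports are assumptions for brevity:

```js
import { pipeline } from '@xenova/transformers';
import localforage from 'localforage';

// One shared extractor; mean pooling + normalization gives unit-length sentence vectors.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function embedText(text) {
  const out = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(out.data); // Float32Array -> plain array for IndexedDB
}

// Persist chunks + vectors locally (key name is illustrative).
async function saveChunks(chunks) {
  for (const c of chunks) c.vector = await embedText(c.text);
  await localforage.setItem('chunks', chunks);
}

// Vectors are normalized, so cosine similarity reduces to a dot product.
function dot(a, b) {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

async function topK(question, k = 6) {
  const q = await embedText(question);
  const chunks = (await localforage.getItem('chunks')) || [];
  return chunks
    .map((c) => ({ ...c, score: dot(q, c.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```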
- WebLLM (MLC) — runs instruction-tuned LLMs in the browser with WebGPU; best in class for on-device inference without native installs (see the call sketched after this list).
- Transformers.js (MiniLM) — small, fast sentence embeddings portable to the web via ONNX/WASM.
- pdf.js — robust text extraction from text-based PDFs.
- localForage (IndexedDB) — simple, reliable client-side persistence for a tiny vector store.
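The WebLLM call itself is a few lines, assuming the OpenAI-style `@mlc-ai/web-llm` API; the model id is an example from MLC's prebuilt list and may differ from the app's tiny model:

```js
import { CreateMLCEngine } from '@mlc-ai/web-llm';

// First call downloads and caches the weights; initProgressCallback feeds the UI progress bar.
const engine = await CreateMLCEngine('Llama-3.2-1B-Instruct-q4f16_1-MLC', {
  initProgressCallback: (p) => console.log(p.text),
});

async function answer(question, chunks, strict) {
  const context = chunks.map((c, i) => `[C${i + 1}] ${c.text}`).join('\n\n');
  const reply = await engine.chat.completions.create({
    messages: [
      { role: 'system', content: strict
          ? 'Answer ONLY with sentences copied verbatim from the chunks, citing them like [C1]. If none apply, say "I don\'t know."'
          : 'Answer using the chunks below and cite them like [C1].' },
      { role: 'user', content: `${context}\n\nQuestion: ${question}` },
    ],
    temperature: strict ? 0 : 0.7, // strict mode pins temperature to 0
    max_tokens: 256,
  });
  return reply.choices[0].message.content;
}
```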
- Similarity threshold — if the best match is weak, the app answers “I don’t know based on this document.” (see the guard sketched after this list).
- Strict mode — extractive answers only; prevents hallucinations by construction.
- Progress indicators — model load, parsing, and embedding show status in the UI.
- Fail-safe UI — a stuck overlay auto-clears; Esc hides it instantly.
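Put together, the refusal guard is a single score check before any prompting; the cutoff value is a tunable assumption, and `topK`/`answer` refer to the sketches above:

```js
// Refuse rather than guess when even the best chunk is a weak match.
const SIM_THRESHOLD = 0.25; // illustrative; tune against real documents

async function ask(question, strict = true) {
  const hits = await topK(question);
  if (!hits.length || hits[0].score < SIM_THRESHOLD) {
    return "I don't know based on this document.";
  }
  return answer(question, hits, strict);
}
```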
- Built for typical laptops in modern Chrome/Edge.
- Embedding time scales with document size; 10–30 page PDFs provide a smooth demo.
- Answer length and Top-k affect speed; try Top-k = 6 and Max tokens = 128–256 for best balance.
- Keyboard-navigable controls with visible focus.
- ARIA labels on primary actions.
- Citation links are real anchors that move focus to sources (a hypothetical renderer is sketched below).
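As an illustration of those citation anchors, a hypothetical renderer; element ids and class names here are made up, not the ones in `app.js`:

```js
// Replace [C#] tags in the model output with real in-page anchors.
function renderCitations(text) {
  return text.replace(/\[C(\d+)\]/g, (_m, n) =>
    `<a href="#chunk-${n}" class="cite" aria-label="Source chunk ${n}">[C${n}]</a>`
  );
}

// Clicking a citation scrolls to and briefly highlights the chunk.
document.addEventListener('click', (e) => {
  const link = e.target.closest('a.cite');
  if (!link) return;
  e.preventDefault();
  const target = document.querySelector(link.getAttribute('href'));
  if (!target) return;
  target.scrollIntoView({ behavior: 'smooth', block: 'center' });
  target.classList.add('highlight');
  setTimeout(() => target.classList.remove('highlight'), 1500);
  target.focus({ preventScroll: true }); // assumes chunks carry tabindex="-1"
});
```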
- No uploads. All processing happens locally.
- Use Clear Data to wipe everything stored by the app.
- Scanned PDFs (page images) yield little or no text without OCR.
- Tiny local models can still drift if Strict is off and the question is vague.
- Browser storage is finite; very large documents may exceed quota.
- `index.html` — UI shell and library imports
- `styles.css` — compact, accessible styling
- `app.js` — PDF ingest, embeddings, retrieval, prompting, citations, UI state
- `sw.js` — app-shell caching for offline UX
- `docs/architecture.md` — flow diagram & key details
- `docs/verification-checklist.md` — manual test steps for the demo
MIT — use freely, no warranty.