DocuProof is a local-first Document AI + RAG API for selectable-text PDFs.
In 60 seconds: upload a PDF, ask evidence-grounded questions, extract typed invoice/receipt fields, and get backend-generated citations. If the evidence is weak, the response is a deterministic refusal: "Not found in the document."
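
The refusal behavior described above can be pictured as a simple gate in front of the answer. The following is only a minimal sketch, not DocuProof's actual implementation: the `Evidence` type, the `answer_or_refuse` function, the `score` field, and the `min_score` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# The exact refusal string quoted in this README.
REFUSAL = "Not found in the document."


@dataclass
class Evidence:
    """Hypothetical retrieved chunk with a relevance score in [0, 1]."""
    text: str
    score: float


def answer_or_refuse(evidence, draft_answer, min_score=0.5):
    """Return draft_answer only if at least one chunk reaches min_score.

    With no evidence, or only weak evidence, always return the same
    fixed refusal string, so the outcome does not depend on the model.
    """
    if not evidence or max(e.score for e in evidence) < min_score:
        return REFUSAL
    return draft_answer
```

Because the refusal is a fixed constant chosen by a plain comparison, the same inputs always produce the same refusal text.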
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export PYTHONPATH=src
uvicorn docuproof.api:app --reload
In a second terminal:
source .venv/bin/activate
export PYTHONPATH=src
python -m docuproof.cli upload data/samples/sample_invoice.pdf
# copy doc_id from output
python -m docuproof.cli ask --doc-id <doc_id> --question "What is the total amount?"
python -m docuproof.cli verify --doc-id <doc_id> --question "What is the total amount?"
bash scripts/demo.sh
Optional (for embedding download rate limits):
export HF_TOKEN=your_token_here
Expected response shapes for upload are versioned under:
examples/expected_upload.json
examples/expected_upload_presence.json
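
One way to use a versioned shape file is to check that every expected key is present in a real upload response, ignoring the values. This is a sketch under assumptions: the actual structure of `examples/expected_upload_presence.json` is not described here, and `missing_keys` is a hypothetical helper, not part of DocuProof.

```python
def missing_keys(expected, actual, prefix=""):
    """Return dotted paths of keys in `expected` that are absent from `actual`.

    Only key presence is compared; values are ignored unless they are
    nested dicts, in which case the check recurses into them.
    """
    missing = []
    for key, value in expected.items():
        path = f"{prefix}{key}"
        if not isinstance(actual, dict) or key not in actual:
            missing.append(path)
        elif isinstance(value, dict):
            missing.extend(missing_keys(value, actual[key], prefix=f"{path}."))
    return missing
```

An empty result means the response contains every key the shape file lists; values such as a freshly generated doc_id are free to differ between runs.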
The smoke path is validated by:
tests/test_smoke.py
scripts/demo.sh
- Local-first default (remote LLM disabled unless LLM_PROVIDER is explicitly set).
- No sensitive logging: no PDF bytes, extracted text, chunks, prompts, or answers.
- Citations are backend-generated from metadata, not model-generated text.
- No OCR/scanned PDFs.
- No multi-document QA.
- No line-item extraction.
- No layout/bbox reasoning.
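
To illustrate the citation guarantee listed above, here is a minimal sketch of building citations from chunk metadata rather than from model output. The `Chunk` fields and the `build_citations` function are hypothetical and do not reflect DocuProof's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical retrieved chunk; field names are assumptions."""
    chunk_id: str
    page: int
    text: str


def build_citations(used_chunks):
    """Build citations from chunk metadata only.

    Duplicates are dropped by chunk_id and the original order is kept.
    Nothing the model generated is consulted, so a citation cannot
    point at a page or chunk that was never retrieved.
    """
    seen = set()
    citations = []
    for chunk in used_chunks:
        if chunk.chunk_id in seen:
            continue
        seen.add(chunk.chunk_id)
        citations.append({"chunk_id": chunk.chunk_id, "page": chunk.page})
    return citations
```

In this sketch the citation carries identifiers and page numbers only; whether real citations also include a text snippet is left open.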
GitHub Actions runs lint + tests on push and PR:
python -m ruff check .
python -m pytest -q
