salman-manaa/docuproof

DocuProof

DocuProof is a local-first Document AI + RAG API for selectable-text PDFs.

In 60 seconds: upload a PDF, ask evidence-grounded questions, extract typed invoice/receipt fields, and get backend-generated citations. If the evidence is weak, the response is a deterministic refusal: "Not found in the document."
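The refusal and citation behavior described above can be sketched as follows. This is a minimal illustration, not DocuProof's actual code: the `Chunk` fields, the `answer_with_citations` helper, and the `min_score` threshold of 0.35 are all assumptions made for the example.

```python
from dataclasses import dataclass

REFUSAL = "Not found in the document."


@dataclass(frozen=True)
class Chunk:
    """A retrieved passage plus metadata stored at ingestion (hypothetical fields)."""
    text: str
    page: int
    chunk_id: str
    score: float  # retrieval similarity, higher is better


def answer_with_citations(chunks, draft_answer, min_score=0.35):
    """Return (answer, citations); refuse when no chunk reaches min_score.

    min_score is an illustrative value, not a documented DocuProof setting.
    """
    evidence = [c for c in chunks if c.score >= min_score]
    if not evidence:
        # Same input always yields the same fixed refusal string.
        return REFUSAL, []
    # Citations are built from stored metadata, never from generated text.
    citations = [{"chunk_id": c.chunk_id, "page": c.page} for c in evidence]
    return draft_answer, citations
```

Because the refusal depends only on the retrieval scores and a fixed threshold, the same question against the same document always produces the same outcome, and a citation can never point at a page the backend did not index.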

Demo

[Image: DocuProof demo]

2-minute quickstart

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export PYTHONPATH=src
uvicorn docuproof.api:app --reload

In a second terminal:

source .venv/bin/activate
export PYTHONPATH=src
python -m docuproof.cli upload data/samples/sample_invoice.pdf
# copy doc_id from output
python -m docuproof.cli ask --doc-id <doc_id> --question "What is the total amount?"
python -m docuproof.cli verify --doc-id <doc_id> --question "What is the total amount?"
bash scripts/demo.sh
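To drive the same CLI steps from a script, the commands above can be assembled as argument lists. This is a hedged sketch: the helper names (`upload_cmd`, `question_cmd`, `run`) are hypothetical, and only the subcommands and flags shown in the quickstart are used. It assumes `PYTHONPATH=src` is set and that the server from the first terminal is running.

```python
import subprocess
import sys

# Invoke the CLI module with the interpreter of the active virtualenv.
CLI = [sys.executable, "-m", "docuproof.cli"]


def upload_cmd(pdf_path):
    """Argument list for `upload <pdf_path>`."""
    return CLI + ["upload", pdf_path]


def question_cmd(action, doc_id, question):
    """Argument list for `ask` or `verify`, which share the same flags."""
    if action not in ("ask", "verify"):
        raise ValueError(f"unsupported action: {action!r}")
    return CLI + [action, "--doc-id", doc_id, "--question", question]


def run(cmd):
    """Run one command and return its stdout; raises if the exit code is non-zero."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing arguments as a list avoids shell quoting problems with questions that contain spaces or quotes. The output format of `upload` is not specified here, so extracting `doc_id` from it is left to the caller.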

Optional (a Hugging Face token helps avoid rate limits when downloading embeddings):

export HF_TOKEN=your_token_here

Proof

Expected response shapes for upload are versioned under:

  • examples/expected_upload.json
  • examples/expected_upload_presence.json

The smoke path is validated by:

  • tests/test_smoke.py
  • scripts/demo.sh
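One way to check a response against an expected shape is to compare keys and value types while ignoring the values themselves. The sketch below is an assumption about how such a check could look, not a copy of `tests/test_smoke.py`; `shape_of`, `same_shape`, and `load_expected` are hypothetical helpers.

```python
import json


def shape_of(value):
    """Reduce a JSON value to its structure: keys and type names, no data."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by its first element; an empty list has no element shape.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def same_shape(actual, expected):
    """True when both payloads have identical keys and value types."""
    return shape_of(actual) == shape_of(expected)


def load_expected(path="examples/expected_upload.json"):
    """Load a versioned expected-response file from the repository."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Comparing shapes rather than exact values keeps the check stable when generated fields such as `doc_id` change between runs.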

Security

  • Local-first by default: the remote LLM stays disabled unless LLM_PROVIDER is explicitly set.
  • No sensitive logging: PDF bytes, extracted text, chunks, prompts, and answers are never logged.
  • Citations are backend-generated from metadata, not model-generated text.
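The first two guarantees above can be illustrated with a short sketch. It is not the project's implementation: `remote_llm_enabled`, `safe_log_fields`, and the key names in `SENSITIVE_KEYS` are hypothetical, and treating an empty `LLM_PROVIDER` as unset is an assumption.

```python
import os

# Hypothetical field names for content that must never reach the logs.
SENSITIVE_KEYS = frozenset({"pdf_bytes", "text", "chunks", "prompt", "answer"})


def remote_llm_enabled(env=None):
    """Remote LLM use is opt-in: enabled only when LLM_PROVIDER is set.

    Assumption: an empty or whitespace-only value counts as not set.
    """
    env = os.environ if env is None else env
    return bool(env.get("LLM_PROVIDER", "").strip())


def safe_log_fields(event):
    """Drop sensitive fields so only metadata (ids, counts, timings) is logged."""
    return {key: value for key, value in event.items() if key not in SENSITIVE_KEYS}
```

Filtering at a single choke point, rather than at each logging call, makes it harder for a new code path to leak document content by accident.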

Limitations

  • No OCR: scanned (image-only) PDFs are not supported.
  • No multi-document QA.
  • No line-item extraction.
  • No layout/bbox reasoning.

CI

GitHub Actions runs lint and tests on every push and pull request:

  • python -m ruff check .
  • python -m pytest -q

About

Local-first Document AI for evidence-grounded Q&A and invoice extraction.
