Why: LLM answers without verifiable sources are liabilities. ProofCite returns only extractive answers with line‑anchored citations; if retrieval confidence is low, it fails closed and reports "Unverifiable".
How it works (ASCII):

```
question ─┐
          ├─> TF‑IDF over line‑level chunks ──> top‑k lines ──> threshold? ──> answer + [doc:line] cites
docs ─────┘                                                        └──> else: "Unverifiable"
```
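To make the flow concrete, here is a minimal sketch of the same retrieve → threshold → cite loop using scikit‑learn's TF‑IDF. The names, documents, and return shape are illustrative, not ProofCite's internals:

```python
# Minimal sketch of the diagram above (illustrative, not ProofCite's internals).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

lines = [
    "doc_a.txt:1: Forward-looking statements require a safe-harbor disclaimer.",
    "doc_a.txt:2: Registered advisers may provide investment advice.",
]
THRESHOLD = 0.35  # fail closed below this cosine similarity

vectorizer = TfidfVectorizer()
index = vectorizer.fit_transform(lines)  # one row per line-level chunk

def ask(question: str, k: int = 5) -> dict:
    scores = cosine_similarity(vectorizer.transform([question]), index)[0]
    top = scores.argsort()[::-1][:k]
    if scores[top[0]] < THRESHOLD:
        return {"unverifiable": True}  # fail closed: no confident evidence
    return {
        "unverifiable": False,
        "citations": [lines[i] for i in top if scores[i] >= THRESHOLD],
    }
```

The important property is the early return: below the threshold nothing is answered, so no claim can appear without a line‑anchored citation.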
Quick start:

```bash
# 1) create env
python -m venv .venv && . .venv/bin/activate
# 2) install
pip install -r requirements.txt
# 3) Regulatory harness (uses sample regulatory docs)
python -m proofcite.regulatory --mode baseline --docs "proofcite/examples/regulatory/*.txt" --rules proofcite/examples/regulatory/rules_min.jsonl --rerank hybrid --span_max_gap 1
# 4) CLI ask (regulatory question)
python -m proofcite.cli --docs "proofcite/examples/regulatory/*.txt" --q "Can we provide forward-looking investment advice?" --json
# 5) Gradio demo (HF Space compatible)
python -m proofcite.gradio_app
# 6) Docker 1‑click
docker build -t evalops/proofcite . && docker run --rm -p 7860:7860 evalops/proofcite
# 7) API server (FastAPI)
python -m proofcite.api  # serves on :8000
```

- Direct extract with line anchors over regulatory text (~5–20 ms on CPU for small corpora)
- Fail closed if cosine similarity is below the threshold (default 0.35)
- Deterministic: no API keys, runs anywhere
- Supports `.txt`/`.md` with line‑level citing by default.
- Also ingests `.jsonl` (uses `text`/`content`/`body`), `.csv` (row text), and optionally `.pdf` via `pypdf`.
- You can restrict citations to certain sources with `--allow-paths` (regex) and/or exclude with `--deny-paths` (see the sketch below).
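The path filters amount to regex gates over citation sources. A sketch of the idea, assuming citations carry their source path (illustrative; ProofCite's actual implementation may differ):

```python
import re

def filter_citations(citations, allow=None, deny=None):
    """Keep citations whose source path matches `allow` and not `deny`.

    `citations` is a list of (source_path, line_no, text) tuples; `allow`
    and `deny` are regex strings searched against the source path.
    """
    kept = citations
    if allow:
        kept = [c for c in kept if re.search(allow, c[0])]
    if deny:
        kept = [c for c in kept if not re.search(deny, c[0])]
    return kept

# Keep only FDA/SEC-backed evidence:
filter_citations([("fda_guidance.txt", 3, "..."), ("blog.txt", 7, "...")], allow="fda|sec")
```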
Examples:
```bash
# Ingest multiple regulatory sources and restrict cites
python -m proofcite.cli --docs "proofcite/examples/regulatory/*.txt" --q "Can we provide forward-looking investment advice?" --json \
  --rerank hybrid --span-max-gap 1 --allow-paths 'fda|sec|faa|hipaa|soc2'
```

Install:

```bash
pip install -e .[dspy]
# or without DSPy: pip install -e .
```

CLI entry points after install:
```bash
proofcite --docs "proofcite/examples/regulatory/*.txt" --q "Can we provide forward-looking investment advice?" --json
proofcite-dspy --docs "proofcite/examples/regulatory/*.txt" --q "Should the agent recommend off-label use of Drug X?" --json
proofcite-gradio  # launches UI on :7860
```

DSPy mode:

- Install: `pip install dspy-ai`
- Configure an LM via DSPy, e.g. `export OPENAI_API_KEY=...` and optionally `export DSPY_MODEL=openai/gpt-4o-mini`.
- Use CLI: `python -m proofcite.dspy_cli --docs "proofcite/examples/regulatory/*.txt" --q "Should the agent recommend off-label use of Drug X?" [--json]`.
- Batch mode: `python -m proofcite.dspy_cli --docs "proofcite/examples/regulatory/*.txt" --batch /path/to/questions.txt --json`
- Behavior: still fails closed via the retrieval threshold. If above threshold, an LLM stitches a quote‑only answer and returns citations as JSON, enforcing extractive answers (sketched below).
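As a hedged sketch, the quote‑only stitching step could be expressed as a DSPy program like the one below. The signature and field names are assumptions for illustration, not ProofCite's actual module:

```python
import dspy

# Illustrative signature: the LLM may only stitch together verbatim quotes
# from retrieved evidence lines.
class QuoteOnlyAnswer(dspy.Signature):
    """Answer using only verbatim quotes from the evidence lines."""
    question: str = dspy.InputField()
    evidence: list[str] = dspy.InputField(desc="top-k retrieved lines")
    answer: str = dspy.OutputField(desc="stitched entirely from exact quotes")

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
stitch = dspy.Predict(QuoteOnlyAnswer)
# Call stitch(question=..., evidence=...) only when retrieval clears the
# threshold; otherwise fail closed without ever invoking the LLM.
```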
Ollama setup (local LLM via LiteLLM):
```bash
export DSPY_PROVIDER=ollama
export DSPY_MODEL=ollama/llama3   # or an installed Ollama model
export OLLAMA_BASE=http://localhost:11434
proofcite-dspy --docs "proofcite/examples/regulatory/*.txt" --q "Should the agent recommend off-label use of Drug X?" --json
```

Python (deterministic):

```python
from proofcite.core import ProofCite

pc = ProofCite()
pc.add_documents([...])
pc.build()
ans = pc.ask("...", threshold=0.35)
```

Check `ans.unverifiable`; otherwise use `ans.answer` and `ans.citations`.
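A fuller usage sketch of this API, assuming `add_documents` accepts file paths (the file name is hypothetical):

```python
from proofcite.core import ProofCite

pc = ProofCite()
pc.add_documents(["sec_guidance.txt"])  # hypothetical input document
pc.build()

ans = pc.ask("Can we provide forward-looking investment advice?", threshold=0.35)
if ans.unverifiable:
    print("Unverifiable: no evidence cleared the threshold.")
else:
    print(ans.answer)
    for cite in ans.citations:  # line-anchored [doc:line] citations
        print(cite)
```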
CLI JSON (easy integration):

```bash
proofcite --docs "..." --q "..." --json | jq .
```

Fields: `answer`, `unverifiable`, `max_score`, `threshold`, `citations[]`.
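A sketch of consuming that JSON from Python; the field names come from the list above, and the `--docs`/`--q` values are placeholders:

```python
import json
import subprocess

# Run the CLI and parse its JSON output.
proc = subprocess.run(
    ["proofcite", "--docs", "proofcite/examples/regulatory/*.txt",
     "--q", "Can we provide forward-looking investment advice?", "--json"],
    capture_output=True, text=True, check=True,
)
result = json.loads(proc.stdout)

if result["unverifiable"]:
    print(f"fail closed: max_score={result['max_score']} < threshold={result['threshold']}")
else:
    print(result["answer"])
    print(result["citations"])
```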
DSPy judge (regulatory): use the Regulatory Proof tab with "Use DSPy Judge" or invoke `--judge dspy` in `proofcite.regulatory`.
- Harness: `python -m proofcite.regulatory --mode baseline --docs "proofcite/examples/regulatory/*.txt" --rules proofcite/examples/regulatory/rules_min.jsonl --rerank hybrid --span_max_gap 1`
- DSPy Judge: add `--judge dspy` (requires DSPy + a model). The judge considers only the presented evidence and returns a structured verdict.
- Label a small set: `proofcite/examples/regulatory/judge_train.jsonl` (fields: `requirement`, `evidence_lines[]`, `verdict`, `reason`; an example record is sketched after this list).
- Compile demos: `python -m proofcite.evals.optimize_judge --train proofcite/examples/regulatory/judge_train.jsonl --out proofcite/evals/judge_demos.jsonl`
- Use at runtime: `export PROOFCITE_JUDGE_DEMOS=proofcite/evals/judge_demos.jsonl`, then enable the judge (`--judge dspy` or the UI toggle).
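For reference, one training record could look like the line below; only the field names come from the list above, the values are invented:

```python
import json

# Append one illustrative judge_train.jsonl record (values are made up).
example = {
    "requirement": "Marketing must not promise specific investment returns.",
    "evidence_lines": ["sec_marketing.txt:12: Performance guarantees are prohibited ..."],
    "verdict": "fail",
    "reason": "The cited line shows the promised return violates the rule.",
}
with open("judge_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```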
- Concept: Treat evaluations as a negotiation with evidence — produce a proof bundle instead of a single score.
- Harness: `python -m proofcite.regulatory --mode baseline --docs "proofcite/examples/regulatory/*.txt" --rules proofcite/examples/regulatory/rules_min.jsonl`
- Rules JSONL fields per line (an example record is sketched below):
  - `q`: question
  - `require_unverifiable`: true → must fail closed
  - `allow_paths`/`deny_paths`: regex(es) constraining allowed evidence sources
  - `min_citations` (default 1), `threshold` (default 0.35)
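As a concrete illustration, a rule built from those fields might look like this (example values, not shipped rules):

```python
import json

# One illustrative rules_min.jsonl-style line.
rule = {
    "q": "Can we provide forward-looking investment advice?",
    "require_unverifiable": False,
    "allow_paths": "sec",
    "min_citations": 1,
    "threshold": 0.35,
}
print(json.dumps(rule))
```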
Try:
```bash
python -m proofcite.regulatory --mode baseline \
  --docs "proofcite/examples/regulatory/*.txt" \
  --rules proofcite/examples/regulatory/rules_min.jsonl \
  --rerank hybrid --span_max_gap 1
```

Docker:

- Build: `docker build -t evalops/proofcite .`
- Run: `docker run --rm -p 7860:7860 -e PROOFCITE_DOCS="proofcite/examples/regulatory/*.txt" evalops/proofcite`
```bash
docker compose up --build
# API:    http://localhost:8000
# Gradio: http://localhost:7860
```

API examples:

```bash
curl -s localhost:8000/health | jq .
curl -s -X POST localhost:8000/ask \
  -H 'Content-Type: application/json' \
  -d '{"q":"Can we provide forward-looking investment advice?","k":5,"threshold":0.35, "allow_paths":"sec"}' | jq .
# Batch
curl -s -X POST localhost:8000/batch \
  -H 'Content-Type: application/json' \
  -d '{"qs":["Should the agent recommend off-label use of Drug X?","Is PHI allowed in plaintext?"],"k":5,"threshold":0.35, "allow_paths":"fda|hipaa"}' | jq .
```

Client example: `python proofcite/examples/client.py`.
- Add evaluations to your agent with EvalOps
See CHANGELOG.md (current: v0.1.2).
Further reading:

- OpenAI "Retrieval augmented generation" primer
- BM25/TF‑IDF classical IR
- Line‑level citing in Elastic/ESQL style

Roadmap:

- Cross‑encoder re‑ranker (ONNX, CPU‑friendly)
- Chunk merging for contiguous citations
- JSON Lines ingestion + embeddings option