# LLM Output Verification Middleware for Claim-Level Reliability Auditing in Long-Form Generated Text

Production-style runtime service for auditing GenAI outputs and assigning epistemic risk signals before downstream use.
```mermaid
graph TD
    User[Client / Downstream App] -->|POST /audit| API[FastAPI Backend]
    API -->|Async Pipeline| Engine[Audit Engine]

    subgraph "Verification Pipeline"
        Engine --> Extract[Claim Extraction]
        Extract --> Link[Entity Linking]
        Link --> Retrieve[Evidence Retrieval]
        Retrieve --> Verify[Claim Verification]
        Verify --> Agg[Risk Aggregation]
    end

    subgraph "Data & Analytics"
        Engine -.->|Log| JSONL[audit_runs.jsonl]
        JSONL -.->|Offline Eval| Dashboard[Research Dashboard]
    end

    Engine -->|Structured Risk Signal| User
```
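The five pipeline stages in the diagram can be sketched as a chain of async steps over a shared state object. Everything below (stage bodies, `AuditState` fields, the majority-vote risk rule) is illustrative, not the engine's actual implementation; the Entity Linking and Evidence Retrieval stages are elided for brevity.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class AuditState:
    """Accumulates artifacts as text moves through the pipeline."""
    text: str
    claims: list = field(default_factory=list)
    verdicts: list = field(default_factory=list)
    risk: str = "UNKNOWN"

# Placeholder stages mirroring Extract -> ... -> Verify -> Aggregate
async def extract_claims(state: AuditState) -> AuditState:
    # Naive sentence split stands in for real claim extraction
    state.claims = [c.strip() for c in state.text.split(".") if c.strip()]
    return state

async def verify_claims(state: AuditState) -> AuditState:
    # A real verifier would consult linked entities and retrieved evidence
    state.verdicts = ["Uncertain"] * len(state.claims)
    return state

async def aggregate_risk(state: AuditState) -> AuditState:
    # Toy rule: majority of non-supported claims escalates the signal
    bad = state.verdicts.count("Uncertain") + state.verdicts.count("Refuted")
    state.risk = "HIGH" if bad > len(state.verdicts) / 2 else "LOW"
    return state

async def audit(text: str) -> AuditState:
    state = AuditState(text=text)
    for stage in (extract_claims, verify_claims, aggregate_risk):
        state = await stage(state)
    return state

result = asyncio.run(audit("The sky is green. Water is wet."))
print(result.risk)  # HIGH
```

Threading a single state object through the stages keeps each stage independently testable and makes it easy to log intermediate artifacts per stage.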
The frontend provides a research-grade interface for manual inspection of model outputs:
- Input: Paste generated text (up to 20k chars).
- Process: Visualize the claim extraction and verification process in real-time.
- Output: Granular, claim-level verdicts (Supported, Refuted, Uncertain) with linked evidence.
Ideal for red-teaming, policy tuning, and qualitative analysis of model failure modes.
- Runtime: FastAPI (Python 3.11+)
- Entrypoint: `POST /audit`
- Observability: `GET /health` for readiness probes.
- Concurrency: Async-first pipeline design for high-throughput auditing.
```bash
# Example health check
curl http://localhost:8000/health
```

The engine produces a strictly typed JSON response designed for programmatic consumption:

- `overall_risk`: High-level traffic-light signal (`LOW`, `MEDIUM`, `HIGH`) for gating.
- `hallucination_score`: Normalized [0, 1] score for threshold-based filtering.
- `claims`: Array of atomic claims with individual verdicts and evidence context.
Use this payload to:
- Block high-risk responses.
- Flag uncertain claims for human review.
- Inject citations back into the generation.
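The three gating actions above can be expressed as a small consumer of the response payload. The top-level field names (`overall_risk`, `hallucination_score`, `claims`) follow the schema described here; the 0.5 threshold and the claim-level `verdict` key are assumptions for illustration.

```python
def gate_response(audit: dict, score_threshold: float = 0.5) -> str:
    """Map an audit payload to a gating action: block, flag, or pass."""
    # Hard block on high-risk or high-hallucination responses
    if audit["overall_risk"] == "HIGH" or audit["hallucination_score"] > score_threshold:
        return "block"
    # Route medium-risk or individually uncertain claims to human review
    if audit["overall_risk"] == "MEDIUM" or any(
        c.get("verdict") == "Uncertain" for c in audit["claims"]
    ):
        return "flag_for_review"
    return "pass"

payload = {
    "overall_risk": "MEDIUM",
    "hallucination_score": 0.22,
    "claims": [{"text": "The policy took effect in 2021.", "verdict": "Uncertain"}],
}
print(gate_response(payload))  # flag_for_review
```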
Every request to the inference endpoints is automatically logged to an append-only JSONL event stream:
- Traceability: Full input/output capture with timestamp and configuration metadata.
- Dataset Generation: Logs can be directly consumed by the evaluation harness to build fine-tuning datasets or regression benchmarks.
- Reproducibility: Logs capture all state needed to replay an audit deterministically.

File path: `paper/data/audit_runs.jsonl`
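Consuming the append-only stream is a one-liner per record; a minimal reader, assuming only that each non-blank line is one JSON event (the exact record keys are not specified here):

```python
import json
from pathlib import Path

def load_audit_runs(path: str = "paper/data/audit_runs.jsonl"):
    """Yield one logged audit event per line of the append-only JSONL stream."""
    with Path(path).open() as fh:
        for line in fh:
            if line.strip():  # tolerate blank lines between appends
                yield json.loads(line)

# e.g. filter the log into a regression set of high-risk outputs:
# high_risk = [r for r in load_audit_runs() if r.get("overall_risk") == "HIGH"]
```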
Reproducible research capabilities are built in as first-class features:
- Seed Control: Deterministic execution for reliable regression testing.
- Prompt Perturbation: Evaluate large batches of synthetically perturbed outputs.
- Artifact Generation: Automatically produces PDF/PNG analysis figures for calibration reports.
Run the harness:

```bash
EPI_SYNTH_MODE=demo EPI_SYNTH_RUNS=500 bash scripts/run_research.sh
```

The `GET /health` endpoint reports:

- Liveness: Simple HTTP 200 OK.
- Readiness: Checks pipeline initialization and model loading status.
- Uptime: Tracks service stability.
- Backend: Python / FastAPI / Uvicorn
- Frontend: Next.js (React) / Tailwind
- Orchestration: Dockerizable services, ready for Kubernetes or ECS.
- Backend: `uvicorn app:app --host 0.0.0.0 --port 8000`
- Frontend: `npm start` (port 3000)
- Backend: `cd backend && PYTHONPATH=$PWD .venv/bin/python -m uvicorn app:app --host 127.0.0.1 --port 8000`
- Frontend: `cd frontend && npm run dev`
- Run both: `npm run dev` from the repo root
- Frontend env: copy `frontend/.env.example` to `frontend/.env.local` and set `BACKEND_URL=http://127.0.0.1:8000`
- Expected ports: frontend on `http://127.0.0.1:3000`, backend on `http://127.0.0.1:8000`
The frontend health proxy calls the backend `GET /health` endpoint and fails fast with a clear `503` JSON payload when the backend is unavailable.
The root backend launcher prefers `./.venv/bin/python`, then `backend/.venv/bin/python`, then falls back to `python3` or `python` if one of those interpreters already has the backend requirements installed.
Integrate Epistemic Audit Engine as a middleware layer in your RAG or Copilot architecture:
- Internal Copilots: Prevent hallucinated policy advice in HR/Legal bots.
- Document QA: Verify answers against retrieval context before showing to users.
- Compliance Pipelines: Audit generated marketing copy for factual claim reliability.
- Moderation: Automate the detection of unsubstantiated claims in user-generated content.
Flow:

```
LLM Generation -> Epistemic Audit -> (Low Risk)  -> User
                                  -> (High Risk) -> Fallback / Warning
```
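This flow can be wired up as a thin wrapper around an existing generation call. A minimal sketch, assuming `generate` and `audit` are caller-supplied callables (illustrative names) and that `audit` returns the risk payload described earlier:

```python
def with_audit(generate, audit,
               fallback="Unable to verify this answer; escalating to a human reviewer."):
    """Wrap a text generator with the audit gate from the flow above."""
    def guarded(prompt: str) -> str:
        draft = generate(prompt)
        if audit(draft)["overall_risk"] == "HIGH":
            return fallback  # high-risk path: fallback / warning
        return draft  # low-risk path: pass through to the user
    return guarded

# Toy wiring with stubbed components:
guarded = with_audit(
    generate=lambda p: "The moon is made of cheese.",
    audit=lambda text: {"overall_risk": "HIGH"},
)
print(guarded("Tell me about the moon."))
```

Because the wrapper only depends on the `overall_risk` field, the same pattern drops into a RAG chain, a Copilot backend, or a moderation queue without changes.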





