Consumer Duty Evidence Engine is a Django/React portfolio project that simulates a high-accountability evidence-review workflow inspired by FCA Consumer Duty monitoring expectations. It ingests complaints, disclosures, support transcripts, scripts, and policy materials; extracts structured claims and outcome-relevant facts; maps them to Consumer Duty outcome areas; scores evidence sufficiency; flags unsupported or contradictory evidence; routes uncertain cases into analyst review; and produces audit-ready outputs with traceable citations, evaluation metrics, and observable workflow state.
Most portfolio AI applications stop at retrieval or summarisation.
This project is designed to demonstrate how AI operates inside a controlled, auditable workflow where outputs must be:
- structured
- traceable
- reviewable
- auditable
- measurable
- safe under failure
The focus is not generating answers, but managing evidence under uncertainty.
- AI-assisted extraction with strict schema validation
- rule-assisted outcome mapping to a constrained taxonomy
- evidence sufficiency scoring:
  - supported
  - weak support
  - missing support
  - contradictory support
  - stale support
- contradiction detection across multi-document case bundles
- human review queues with assignment, approval, escalation, override controls, and audit logging
- explicit state machine enforcing workflow correctness
- observable async pipelines with WebSocket updates
- regression-tested evaluation harness with 40+ benchmark cases
- conservative fallback behaviour under provider failure or insufficient evidence
System type: Django monolith with async workers and React frontend
Backend
- Django + Django REST Framework (API and orchestration)
- PostgreSQL (source of truth)
- Redis (cache, broker, Channels layer)
- Celery (async task pipeline)
- Django Channels (WebSockets)
Frontend
- React + TypeScript + Vite
- React Query (server state)
- Zod (schema validation)
- Zustand (client state)
Supporting systems
- pgvector (limited retrieval support)
- evaluation harness (synthetic datasets + regression runner)
- audit/event system
- observability and metrics layer
- Upload complaint and related artefacts
- Persist artefacts and enqueue ingestion
- Parse and segment documents asynchronously
- Extract structured claims using strict schema validation
- Map claims to Consumer Duty outcome areas
- Link supporting and contradicting evidence
- Assess evidence sufficiency
- Detect contradictions and stale evidence
- Generate structured recommendation memo (when safe)
- Route uncertain cases into human review
- Persist all actions in an audit timeline
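As an illustrative sketch, the asynchronous stages above can be composed as a linear pipeline. In the real system these would be Celery tasks linked in a chain and keyed by case id; the plain functions and field names here are hypothetical placeholders.

```python
from typing import Callable


# Each stage takes and returns a case dict; conceptually, each would be a
# Celery task so the pipeline can run asynchronously and resume per stage.
def parse(case: dict) -> dict:
    case["segments"] = [case["raw"]]  # placeholder segmentation
    return case


def extract(case: dict) -> dict:
    # In the real system this step applies strict schema validation
    # to model output; here we just wrap segments as claim dicts.
    case["claims"] = [{"text": s} for s in case["segments"]]
    return case


def map_outcomes(case: dict) -> dict:
    for claim in case["claims"]:
        claim["outcome_area"] = "price_and_value"  # constrained taxonomy label
    return case


PIPELINE: list[Callable[[dict], dict]] = [parse, extract, map_outcomes]


def run_pipeline(case: dict) -> dict:
    for stage in PIPELINE:
        case = stage(case)
    return case
```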
The system enforces a strict state machine:
new → ingestion_pending → parsing → parsed → extraction → mapping → assessment → recommendation
Terminal paths:
approved / needs_review / escalated / failed / archived
Invalid transitions are explicitly rejected.
Review workflow states:
unassigned → assigned → in_review → approved / overridden / escalated → closed
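A minimal sketch of how invalid transitions might be rejected: a transition table derived from the states above, with a guard that raises on anything not listed. The branches into `needs_review`, `escalated`, and `failed` from mid-pipeline states are assumptions for illustration, not the project's exact table.

```python
# Allowed state transitions; terminal states have no outgoing edges.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "new": {"ingestion_pending"},
    "ingestion_pending": {"parsing", "failed"},
    "parsing": {"parsed", "failed"},
    "parsed": {"extraction"},
    "extraction": {"mapping", "needs_review", "failed"},
    "mapping": {"assessment", "needs_review"},
    "assessment": {"recommendation", "needs_review", "escalated"},
    "recommendation": {"approved", "needs_review", "escalated"},
    "needs_review": {"approved", "escalated"},
    "approved": set(),
    "escalated": set(),
    "failed": set(),
    "archived": set(),
}


class InvalidTransition(Exception):
    """Raised when a case attempts a transition not in the table."""


def transition(current: str, target: str) -> str:
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not permitted")
    return target
```

In the Django model, a guard like this would run inside `save()` or a service layer so the database never records an illegal transition.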
Backend
- Django
- Django REST Framework
- PostgreSQL
- Redis
- Celery
- Django Channels
- pgvector
- drf-spectacular (OpenAPI)
Frontend
- React
- TypeScript
- Vite
- React Router
- React Query
- Zod
- Zustand
Tooling
- Pytest
- Vitest
- Ruff, Black, isort
- Docker Compose
- GitHub Actions CI
The commands below reflect the Windows development environment used for this project.
- Python 3.14
- Node.js 20+
- pnpm
- Docker Desktop
Setup:

```powershell
cd D:\AI-Projects\consumer-duty-evidence-engine
.\.venv\Scripts\Activate.ps1
python -m pip install -r backend\requirements\dev.txt
docker compose up -d db redis
```

Run the backend:

```powershell
cd backend
python manage.py migrate
python manage.py runserver
```

Run the Celery worker:

```powershell
cd backend
celery -A config worker -l info
```

Run the frontend:

```powershell
cd frontend
pnpm install
pnpm dev
```

Frontend: http://localhost:5173
Backend: http://localhost:8000
The project includes 12 seeded demo cases, covering:
- unclear fee disclosure
- contradictory support scripts
- missing evidence scenarios
- stale policy/script cases
- schema failure simulation
- provider failure simulation and safe fallback routing
- clearly supported cases
Seed data:

```powershell
python infra/scripts/seed_demo_data.py
```

The system includes a synthetic evaluation harness with 40+ cases across:
- supported scenarios
- weak support
- contradictions
- missing evidence
- stale evidence
- adversarial formatting
- routing edge cases
- citation validation cases
Eval runner:

```powershell
python infra/scripts/run_eval_suite.py
```

Metrics tracked:
- claim precision / recall
- outcome mapping accuracy
- support-status accuracy
- routing accuracy
- citation validity rate
- degraded-mode success rate
Reports:
evals/reports/latest-report.json
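As a sketch of how the claim precision/recall metric above could be computed per benchmark case, comparing predicted claim identifiers against a gold set (the function name and the set-of-ids representation are illustrative assumptions):

```python
def precision_recall(predicted: set[str], expected: set[str]) -> tuple[float, float]:
    """Claim-level precision/recall for one benchmark case.

    predicted: claim ids the extractor produced
    expected:  gold claim ids from the synthetic dataset
    """
    if not predicted and not expected:
        # Nothing to find and nothing found: score as perfect.
        return 1.0, 1.0
    true_pos = len(predicted & expected)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    return precision, recall
```

The regression runner would aggregate these per-case scores into the report JSON, so a drop against the previous run fails CI.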
The system explicitly models uncertainty and failure.
Observability:
- correlation IDs for request tracing
- structured JSON logging
- audit events for all state transitions
- model execution logs (latency, status, cost)
- WebSocket status updates
Failure handling:
- schema validation failures → forced review
- provider failures → conservative fallback or abstention
- contradictory evidence → review routing
- missing evidence → review routing
- stale evidence → review routing
Fallback modes:
- rules-only mode when model unavailable
- source-only mode when generation unsafe
- request-more-evidence routing when the case lacks sufficient support for a safe recommendation
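The failure-handling and fallback rules above can be sketched as one conservative routing function. The enum names, input flags, and precedence order here are illustrative assumptions; the point is that abstention always wins over unsafe generation.

```python
from enum import Enum


class Route(str, Enum):
    RECOMMEND = "generate_recommendation"
    RULES_ONLY = "rules_only_mode"
    SOURCE_ONLY = "source_only_mode"
    REVIEW = "human_review"


def route_case(provider_up: bool, schema_valid: bool,
               support_status: str) -> Route:
    """Prefer abstention over unsafe generation (illustrative precedence)."""
    if not schema_valid:
        return Route.REVIEW        # schema validation failure -> forced review
    if not provider_up:
        return Route.RULES_ONLY    # model unavailable -> rules-only mode
    if support_status in {"contradictory_support", "missing_support",
                          "stale_support"}:
        return Route.REVIEW        # insufficient or conflicting evidence
    if support_status == "weak_support":
        return Route.SOURCE_ONLY   # generation unsafe -> sources only
    return Route.RECOMMEND
```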
OpenAPI schema:
/api/schema/
Swagger UI:
/api/docs/
Key endpoints:
- /api/cases/
- /api/cases/{id}/claims/
- /api/cases/{id}/assessments/
- /api/cases/{id}/recommendation/
- /api/review-tasks/
- /api/metrics/overview/
- /api/evals/runs/
See:
docs/architecture/
docs/adr/
docs/domain/
docs/demos/
Key documents:
- system overview
- state machine
- ingestion pipeline
- evaluation design
- failure modes
- observability
- Portfolio-grade simulation, not production compliance software
- Uses synthetic datasets rather than real regulated data
- Simplified Consumer Duty taxonomy
- Limited retrieval (pgvector used minimally)
- Mock or constrained model integration
- Built an AI-assisted evidence-review workflow with async ingestion, structured extraction, and human review routing
- Implemented evidence sufficiency scoring and contradiction detection across multi-artefact case bundles
- Designed an evaluation harness with regression datasets and measurable metrics
- Added conservative fallback and failure-aware routing for provider failure and low-support cases
- Exposed full audit trail, state transitions, and review actions through API and UI
This project is licensed under the MIT License.
Copyright (c) 2026 Cherry Augusta
See the LICENSE file for full details.