An agentic document workflow application for contract and policy compliance tracking, built with LlamaIndex, FastAPI, and Streamlit.
The Contract Compliance Tracker helps organisations stay on top of their contractual obligations. It ingests contracts, amendments, and internal policies, extracts structured obligations and risks, monitors renewal deadlines, and provides a natural-language Q&A interface grounded in source documents.
This is a portfolio project demonstrating intermediate and advanced AI engineering techniques including production RAG, agentic document workflows, structured extraction, and observability.
- Ingest PDF, DOCX, and TXT contracts into a LlamaIndex vector index
- Hybrid retrieval with metadata filtering (counterparty, doc type, date)
- Cited Q&A over the full contract corpus
- Agentic workflow that parses contracts clause-by-clause
- Extracts structured obligations (payment terms, SLAs, notice periods, confidentiality, termination rights)
- Persists to a relational database with fields: type, description, due date, frequency, risk level
- Cross-references extracted obligations against internal policies
- Flags non-compliant or risky clauses with severity levels and suggested remediation
- Generates a per-contract compliance report
- Detects renewal and termination dates during extraction
- Tracks upcoming renewals at 90 / 60 / 30 day thresholds
- Exposes an alerting endpoint
- Natural-language questions with citations back to specific clauses and sections
- Scoped to a single contract or the full corpus
```mermaid
graph LR
    subgraph ui["UI Layer"]
        streamlit["Streamlit Dashboard<br/>Chat Interface"]
    end
    subgraph api["API Layer"]
        fastapi["FastAPI Server<br/>- /chat<br/>- /contracts<br/>- /obligations<br/>- /issues<br/>- /renewals<br/>- /compliance-checks<br/>- /renewal-alerts"]
    end
    subgraph rag["RAG & Workflows Layer"]
        llamaindex["LlamaIndex<br/>- Vector Index<br/>- Hybrid Retrieval<br/>- Document Agents<br/>- Extraction Workflow<br/>- Compliance Workflow<br/>- Renewal Monitor"]
    end
    subgraph db["Data Layer"]
        database["SQLite / PostgreSQL<br/>- Obligations<br/>- Issues<br/>- Renewals<br/>- Document Metadata"]
    end
    streamlit <-->|HTTP/WebSocket| fastapi
    fastapi <-->|Query/Retrieve| llamaindex
    llamaindex <-->|Persist/Fetch| database
```
| Layer | Technology |
|---|---|
| LLM | OpenAI GPT-4o-mini / locally hosted (Ollama, llama.cpp) |
| Embeddings | OpenAI text-embedding-3-small (or local HF) |
| RAG Framework | LlamaIndex (indices, query engines, workflows) |
| API | FastAPI + Uvicorn |
| UI | Streamlit |
| Database | SQLite (dev) / PostgreSQL (prod) |
| Config | Pydantic Settings + .env |
| Python | 3.12+ |
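The project's `src/config.py` uses Pydantic Settings to load configuration from `.env`; the same idea can be approximated with only the stdlib. The field names below are illustrative assumptions, not the project's actual settings.

```python
import os
from dataclasses import dataclass, field

# Minimal stdlib approximation of a Pydantic Settings class: each field
# reads an environment variable and falls back to a sensible default.
@dataclass
class Settings:
    openai_api_key: str = field(
        default_factory=lambda: os.environ.get("OPENAI_API_KEY", ""))
    log_format: str = field(
        default_factory=lambda: os.environ.get("LOG_FORMAT", "console"))
    database_url: str = field(
        default_factory=lambda: os.environ.get("DATABASE_URL", "sqlite:///compliance.db"))
```

Pydantic Settings adds validation and `.env`-file parsing on top of this pattern, which is why the project uses it rather than raw `os.environ` lookups.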
```
contract-compliance-tracker/
├── data/
│   ├── contracts/          # Sample contract and policy files
│   ├── policies/           # Internal policy documents
│   └── eval/               # Evaluation Q&A pairs and results
├── src/
│   ├── config.py           # Pydantic settings, env loading
│   ├── llm.py              # LLM client (OpenAI or local)
│   ├── logging_config.py   # Structured logging setup (structlog)
│   ├── rag.py              # LlamaIndex index build, query engine
│   ├── models/             # SQLAlchemy ORM models (obligations, issues, renewals)
│   ├── workflows/          # LlamaIndex agentic workflows
│   │   ├── extraction.py   # Obligation extraction workflow
│   │   ├── compliance.py   # Compliance checking workflow
│   │   └── renewal_monitor.py  # Renewal detection and alerting
│   ├── api/                # FastAPI routes
│   │   ├── main.py         # API endpoints
│   │   ├── middleware.py   # Correlation ID tracing middleware
│   │   └── schemas.py      # Pydantic request/response schemas
│   ├── eval/               # Evaluation harness
│   │   └── evaluate.py     # RAG quality metrics (relevance, faithfulness)
│   ├── ui/                 # Streamlit dashboard + chat
│   │   └── app.py          # Dashboard UI
│   └── scripts/
│       └── query.py        # CLI query tool
├── storage/
│   └── index/              # Persisted LlamaIndex vector index
├── .env.example
├── pyproject.toml
└── README.md
```
- Python 3.12+
- One of the following LLM + embedding providers:
- OpenAI (default): OpenAI API key with GPT-4o-mini access
- Ollama: Local Ollama instance (http://localhost:11434) with a model like Llama 3
- llama.cpp: Local or network llama.cpp server (any HTTP endpoint with OpenAI-compatible API)
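Since all three providers expose an OpenAI-compatible HTTP API, provider selection reduces to picking a base URL and model name. A hypothetical sketch of how `src/llm.py` might map them (endpoints and defaults are assumptions, not the project's actual code):

```python
# Map each supported provider onto an OpenAI-compatible endpoint.
# The base URLs follow the defaults mentioned in the prerequisites
# (e.g. Ollama on http://localhost:11434); model names are illustrative.
def resolve_provider(provider: str) -> dict:
    endpoints = {
        "openai":   {"base_url": "https://api.openai.com/v1", "model": "gpt-4o-mini"},
        "ollama":   {"base_url": "http://localhost:11434/v1", "model": "llama3"},
        "llamacpp": {"base_url": "http://localhost:8080/v1",  "model": "local"},
    }
    if provider not in endpoints:
        raise ValueError(f"unknown provider: {provider}")
    return endpoints[provider]
```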
```bash
git clone https://github.com/benwalkerai/Portfolio_ComplianceCheck.git
cd Portfolio_ComplianceCheck
uv sync
```

Copy the example env file and add your API key:

```bash
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...
```

Drop `.txt`, `.pdf`, or `.docx` files into `data/contracts/`, then query from the CLI:

```bash
uv run --script scripts/query.py "What are the payment terms in the Acme contract?"
```

Or start everything at once:

```bash
uv run main.py
```

This starts the complete application (API and UI). On first run, the vector index is built automatically and persisted to `storage/index/`.
- Core RAG over contracts with citations
- FastAPI chat and contract endpoints
- Obligation extraction workflow
- Compliance checking workflow
- Renewal monitoring and alerting
- Streamlit dashboard and chat UI
- Offline evaluation harness (labelled Q&A pairs + retrieval metrics)
- Observability and tracing (structured logs, workflow run metadata)
All application logs are structured using structlog with automatic context binding. This enables:
- Correlation IDs: every API request gets a unique `X-Correlation-ID` header (auto-generated or passed in) that is automatically included in all logs for request tracing
- Format toggle: set `LOG_FORMAT=json` in `.env` for JSON output (production), or `LOG_FORMAT=console` for human-readable console output (development)
- Contextvars: bound context (correlation ID, user ID, etc.) flows automatically through async code paths
Example log (JSON mode):
```json
{"event": "extraction_started", "correlation_id": "550e8400-e29b", "contract_id": "acme-001", "timestamp": "2026-02-28T13:09:32.624843"}
```

The `CorrelationIDMiddleware` in `src/api/middleware.py` attaches correlation IDs to every FastAPI request, enabling:
- End-to-end tracing across API, RAG workflows, and database queries
- Easy debugging and log aggregation in production environments
- Response headers include the correlation ID for client-side tracing
The project includes a comprehensive offline evaluation harness measuring RAG quality across 13 curated Q&A pairs from the contract corpus.
| Metric | Score | Notes |
|---|---|---|
| Relevance | 5.0 / 5.0 | Answers directly address questions with high contextual fit |
| Faithfulness | 4.69 / 5.0 | Zero hallucinations; all claims grounded in source documents |
| Correctness | 4.38 / 5.0 | Accurate information extraction with minor wording variations |
| Source Match Rate | 92% | Cited sources include the authoritative document for the answer |
| Avg Latency | 1.51s | Sub-2s response times for full-corpus queries |
- "What are the payment terms in the CloudHost MSA?" → Cited `cloudhost_msa.txt` with exact Net-30 terms and late-fee penalties
- "How long must confidential information be protected under the standard NDA?" → Correctly distinguished the 5-year obligation from indefinite trade-secret confidentiality
- "What encryption standard is required for data at rest?" → Precise AES-256 identification from the security policy
Evaluation process: LlamaIndex retrieval with hybrid search → GPT-4o-mini answer generation → Llama 3 scoring against rubrics (relevance, faithfulness, correctness) → source match verification.
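The source match verification step reduces to a simple computation over per-question eval records. A hypothetical sketch (field names are assumptions, not the harness's actual schema):

```python
# Compute the source-match rate: the share of questions whose cited
# sources include the single authoritative document for the answer.
def source_match_rate(results: list[dict]) -> float:
    hits = sum(1 for r in results if r["expected_source"] in r["cited_sources"])
    return 100 * hits / len(results)
```

Over 13 questions, 12 correct citations rounds to the 92% reported above.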
- Production RAG: advanced chunking, hybrid retrieval, re-ranking, metadata filtering, persisted indexes
- Agentic Document Workflows: LlamaIndex Workflows for multi-step extraction and compliance checking
- Structured Output Extraction: LLM-powered clause parsing into typed schemas
- Evaluation: offline eval sets with retrieval quality metrics and source grounding
- Observability: structured logging, correlation IDs, request tracing, JSON/console format toggle
- Clean Architecture: FastAPI service layer, Pydantic config, ORM models, separated concerns
MIT