SQL Schema Documentation for Contract Processing System

Overview

This document describes the comprehensive SQL schema designed for the contract processing system that handles 2600+ documents with version tracking, ontology management, and knowledge graph capabilities.

Key Features

File Hash-Based Version Control: Every file is hashed before processing to detect changes
Processing Status Tracking: Complete audit trail of document processing
Contract Ontology: Hierarchical categorization system for contracts
Relationship Management: Track relationships between documents and companies
Performance Statistics: Daily processing metrics and KPIs

Database Tables

1. `documents` (Core Table)

The main table storing all document information with version tracking.

Key Fields:

id: Primary key
document_id: UUID for unique document identification
current_file_hash: SHA-256 hash of the current file version
processing_status: pending, processing, completed, failed, reprocessing
processing_version: Incremented with each file change
contract_start_date, contract_end_date: Extracted contract dates
analysis_results: JSON storage for AI analysis results

Processing Workflow:

File hash calculated before processing
Check if document exists by hash
If file changed (different hash), create new version
Track processing status throughout lifecycle

2. `file_hashes`

Tracks all file hashes for version control.

Purpose:

Detect when files have been modified
Maintain history of all file versions
Enable skip-processing for unchanged files

Key Fields:

document_id: Links to documents table
file_hash: SHA-256 hash
is_current: Boolean flag for current version
calculated_at: Timestamp of hash calculation

3. `document_versions`

Complete version history for each document.

Tracks:

Version number (auto-incremented)
What changed (content_update, metadata_update, reprocessed)
Processing results for each version
Who made the change and when

4. `contract_ontology`

Hierarchical categorization system for contracts.

Default Structure:

ROOT (All Contracts)
├── PROCUREMENT
│   ├── PROCUREMENT.GOODS (Goods Procurement)
│   └── PROCUREMENT.SERVICES (Services Procurement)
├── LEGAL
│   ├── LEGAL.NDA (Confidentiality)
│   └── LEGAL.IP (Intellectual Property)
└── OPERATIONAL
    ├── OPERATIONAL.FACILITIES (Facilities)
    └── OPERATIONAL.IT (IT Services)

Features:

Hierarchical structure with parent-child relationships
Color coding for visualization
Keywords and rules for automatic classification
Active/inactive status for categories

5. `document_ontology_mapping`

Maps documents to ontology categories.

Key Features:

Confidence scores (0.00 to 1.00)
Primary category designation
Tracking of assignment method (AI, user, rule-based)

6. `companies`

Stores company information with hierarchical structure.

Features:

Parent-subsidiary relationships
Company groups for organization
Industry and country metadata

7. `document_companies`

Links documents to companies with roles.

Roles:

vendor
client
prime_contractor
subcontractor
witness

8. `document_relationships`

Tracks relationships between documents.

Relationship Types:

amendment
renewal
supersedes
references
related

9. `processing_logs`

Complete audit trail of all processing activities.

Tracks:

Every action (created, processing_started, completed, failed)
Duration and memory usage
Error messages and details
User/system that performed action

10. `processing_statistics`

Daily aggregated statistics for dashboards.

Metrics:

Total documents processed
Success/failure counts
Average processing time
Unique companies discovered
New relationships found

Processing Workflow

1. Pre-Processing Phase

# For each file:
file_hash = calculate_sha256(file_path)
existing = check_document_exists(file_hash)

if existing and not needs_reprocessing:
    skip_file()
else:
    add_to_processing_queue()

2. Document Creation/Update

if new_document:
    # Create new document record
    document = create_document(file_hash, metadata)
    version = 1
else:
    # File changed - create new version
    version = increment_version()
    update_document_hash(new_hash)

3. Processing Phase

# Update status
update_status('processing')

# Process content
result = analyze_document(content)

# Update with results
update_status('completed', results=result, duration=time)

# Auto-assign ontology
assign_to_ontology(document_id, detected_category)

4. Post-Processing

# Link to companies
link_to_company(document_id, company_id)

# Detect relationships
find_related_documents(content_analysis)

# Update statistics
update_daily_statistics()

Key SQL Queries

Check if Document Needs Processing

SELECT d.*, fh.file_hash
FROM documents d
JOIN file_hashes fh ON d.id = fh.document_id
WHERE fh.file_hash = :hash AND fh.is_current = true;

Get Document with Current Version

SELECT d.*, dv.version_number, dv.analysis_results
FROM documents d
JOIN document_versions dv ON d.id = dv.document_id
WHERE d.id = :doc_id
AND dv.version_number = d.processing_version;

Find Documents by Ontology

SELECT d.*, co.category_name
FROM documents d
JOIN document_ontology_mapping dom ON d.id = dom.document_id
JOIN contract_ontology co ON dom.ontology_id = co.id
WHERE co.category_code = :category_code
AND dom.is_primary = true;

Contract Expiration Report

SELECT *,
    CASE
        WHEN contract_end_date < CURRENT_DATE THEN 'expired'
        WHEN contract_end_date < CURRENT_DATE + INTERVAL '30 days' THEN 'expiring_soon'
        ELSE 'active'
    END as status
FROM documents
WHERE contract_end_date IS NOT NULL
ORDER BY contract_end_date;

Benefits of This Design

Efficient Processing: Skip already-processed files using hash comparison
Version Control: Track all changes to documents over time
Audit Trail: Complete history of who did what and when
Scalability: Optimized indexes for 2600+ documents
Flexibility: JSON fields for storing varied analysis results
Relationships: Track complex document and company relationships
Analytics: Built-in statistics for performance monitoring

Usage in Application

The application uses this schema to:

Prevent Duplicate Processing: Check hash before processing
Track Changes: Automatically version documents when files change
Organize Contracts: Use ontology for categorization
Build Knowledge Graphs: Visualize relationships
Monitor Performance: Track processing times and success rates
Ensure Data Integrity: Maintain audit trails

Best Practices

Always calculate file hash before processing
Use transactions for multi-table updates
Index frequently queried fields
Regular cleanup of old processing logs
Monitor processing statistics for performance issues
Use confidence scores for AI-assigned categories
Maintain referential integrity with foreign keys

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL Schema Documentation for Contract Processing System

Overview

Key Features

Database Tables

1. `documents` (Core Table)

2. `file_hashes`

3. `document_versions`

4. `contract_ontology`

5. `document_ontology_mapping`

6. `companies`

7. `document_companies`

8. `document_relationships`

9. `processing_logs`

10. `processing_statistics`

Processing Workflow

1. Pre-Processing Phase

2. Document Creation/Update

3. Processing Phase

4. Post-Processing

Key SQL Queries

Check if Document Needs Processing

Get Document with Current Version

Find Documents by Ontology

Contract Expiration Report

Benefits of This Design

Usage in Application

Best Practices

FilesExpand file tree

SQL_SCHEMA_DOCUMENTATION.md

Latest commit

History

SQL_SCHEMA_DOCUMENTATION.md

File metadata and controls

SQL Schema Documentation for Contract Processing System

Overview

Key Features

Database Tables

1. documents (Core Table)

2. file_hashes

3. document_versions

4. contract_ontology

5. document_ontology_mapping

6. companies

7. document_companies

8. document_relationships

9. processing_logs

10. processing_statistics

Processing Workflow

1. Pre-Processing Phase

2. Document Creation/Update

3. Processing Phase

4. Post-Processing

Key SQL Queries

Check if Document Needs Processing

Get Document with Current Version

Find Documents by Ontology

Contract Expiration Report

Benefits of This Design

Usage in Application

Best Practices

1. `documents` (Core Table)

2. `file_hashes`

3. `document_versions`

4. `contract_ontology`

5. `document_ontology_mapping`

6. `companies`

7. `document_companies`

8. `document_relationships`

9. `processing_logs`

10. `processing_statistics`