Toolkit Data Contracts and Drift Detection

Data contract management and drift detection for ML/LLM pipelines, designed to catch silent schema changes and data quality regressions before they reach a model.

Overview

The Toolkit Data Contracts and Drift Detection tool provides a lightweight, dependency-free solution for maintaining data quality and consistency in machine learning pipelines. It automatically infers data contracts from samples, validates new data against established contracts, and detects drift before it impacts model performance.
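
To make the idea of contract inference concrete, here is a minimal sketch in plain Python. It is not the toolkit's implementation: the function names (json_type, infer_contract) and the contract layout ("fields", "required") are assumptions chosen for illustration, and the real contract.json format may differ.

```python
import json
from typing import Any


def json_type(value: Any) -> str:
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int (bool subclasses int)
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_contract(lines: list[str]) -> dict:
    """Infer observed types and always-present fields from JSONL lines.

    The returned layout is hypothetical, not the toolkit's contract format.
    """
    types: dict[str, set[str]] = {}
    counts: dict[str, int] = {}
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for key, value in record.items():
            types.setdefault(key, set()).add(json_type(value))
            counts[key] = counts.get(key, 0) + 1
    return {
        "fields": {name: sorted(seen) for name, seen in types.items()},
        # A field is treated as required only if every sample record had it.
        "required": sorted(name for name, n in counts.items() if n == total),
    }


sample = ['{"id": 1, "text": "hello"}', '{"id": 2, "text": "hi", "score": 0.5}']
contract = infer_contract(sample)
```

In this sketch a field counts as required only when it appears in every sample record, so a small or unrepresentative sample will produce an overly strict or overly loose contract.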

Key Features

Contract Management

Automatic Contract Inference: Generate contracts from JSONL samples
Schema Validation: Enforce data structure and type constraints
Version Control: Track contract evolution over time
Flexible Configuration: Control whether extra fields are allowed, which fields are required, and which custom types apply
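
The validation and configuration options listed above can be sketched as follows. This is an illustrative example only: CONTRACT, its keys ("fields", "required", "allow_extra"), and validate_record are hypothetical names, not the toolkit's API or file format.

```python
from typing import Any

# Hypothetical in-memory contract; the toolkit's JSON layout may differ.
CONTRACT = {
    "fields": {"id": int, "text": str},
    "required": ["id", "text"],
    "allow_extra": False,
}


def validate_record(record: dict[str, Any], contract: dict[str, Any]) -> list[str]:
    """Return a list of human-readable violations (empty if the record is valid)."""
    errors = []
    for name in contract["required"]:
        if name not in record:
            errors.append(f"missing required field: {name}")
    for name, value in record.items():
        expected = contract["fields"].get(name)
        if expected is None:
            if not contract["allow_extra"]:
                errors.append(f"unexpected field: {name}")
        elif type(value) is not expected:  # exact match, so True is not accepted as int
            errors.append(
                f"wrong type for {name}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors


ok = validate_record({"id": 1, "text": "a"}, CONTRACT)
bad = validate_record({"id": "1", "extra": 2}, CONTRACT)
```

Returning every violation rather than stopping at the first one makes a failed batch easier to diagnose in a single run.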

Drift Detection

Statistical Profiling: Build comprehensive baseline profiles
Distribution Analysis: Track changes in data distributions
Quality Gates: Automated validation with configurable thresholds
CI/CD Integration: Exit codes for pipeline integration
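
As a rough illustration of a quality gate with a configurable threshold, the sketch below profiles a numeric field and flags drift when the batch mean moves too far from the baseline. The statistic (mean shift measured in baseline standard deviations), the default threshold of 3.0, and the names profile and drifted are all assumptions; the README does not state which statistics the toolkit actually uses.

```python
import statistics


def profile(values: list[float]) -> dict:
    """Build a tiny baseline profile for one numeric field."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def drifted(baseline: dict, batch: list[float], threshold: float = 3.0) -> bool:
    """Flag drift when the batch mean shifts more than `threshold` baseline stdevs."""
    scale = baseline["stdev"] or 1.0  # fall back to 1.0 for constant baseline fields
    shift = abs(statistics.fmean(batch) - baseline["mean"]) / scale
    return shift > threshold


baseline = profile([10.0, 11.0, 9.0, 10.0])
stable = drifted(baseline, [10.5, 9.5, 10.0])
shifted = drifted(baseline, [20.0, 21.0])
```

A mean-shift check is cheap but blind to changes in shape or spread; distribution-level comparisons would need histograms or quantiles in the profile.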

Enterprise Features

Zero Dependencies: Lightweight, easy to deploy
CLI Interface: Simple command-line tools
JSON Format: Human-readable contracts and profiles
Batch Processing: Handle large datasets efficiently

Quick Start

Installation

# Install from source
git clone https://github.com/AKIVA-AI/toolkit-data-contracts.git
cd toolkit-data-contracts
pip install -e ".[dev]"

# Install the released package (for production use)
pip install toolkit-data-contracts-drift

Basic Usage

# 1. Infer contract from sample data
toolkit-contracts infer --input samples.jsonl --out contract.json

# 2. Create baseline profile
toolkit-contracts profile --input baseline.jsonl --contract contract.json --out baseline.profile.json

# 3. Validate new data and check for drift
toolkit-contracts check --input new_batch.jsonl --contract contract.json --baseline baseline.profile.json

CLI Commands

infer - Generate contract from JSONL samples
profile - Create baseline profile from data
check - Validate data and detect drift

Exit Codes

0 - Validation passed
4 - Validation failed or drift detected
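
For pipelines driven from Python rather than a shell, the exit codes above can be handled with a thin wrapper like the one below. It uses only the check flags shown under Basic Usage. The helper names are hypothetical, and treating any code other than 0 or 4 as an unexpected error is an assumption, since only those two codes are documented here.

```python
import subprocess


def build_check_command(batch: str, contract: str, baseline: str) -> list[str]:
    """Assemble the documented `check` invocation as an argument list."""
    return [
        "toolkit-contracts", "check",
        "--input", batch,
        "--contract", contract,
        "--baseline", baseline,
    ]


def interpret_exit_code(code: int) -> str:
    """Translate the CLI exit code into a pipeline outcome."""
    if code == 0:
        return "passed"
    if code == 4:
        return "failed"  # validation failed or drift detected
    return "error"  # assumption: other codes are not documented, so treat as unexpected


def run_check(batch: str, contract: str, baseline: str) -> str:
    """Run the CLI and report the outcome; requires toolkit-contracts on PATH."""
    result = subprocess.run(build_check_command(batch, contract, baseline), check=False)
    return interpret_exit_code(result.returncode)
```

A CI job could call run_check("new_batch.jsonl", "contract.json", "baseline.profile.json") and stop the pipeline on any result other than "passed".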

License

MIT License - see LICENSE file for details.

