DataQuarantine is a production-grade Data Quality Firewall. It intercepts streaming data, enforces strict schema contracts in real-time, and isolates invalid records into a Dead Letter Queue (DLQ) without stopping the pipeline.
Get the entire stack (Kafka + Validator + UI) running with a single automated script:
```powershell
# 1. Automated Setup Script (PowerShell)
.\start.ps1
```

What this does: starts 8 Docker containers, creates the Kafka topics, initializes Postgres/MinIO, and launches the dashboard.
Access Points:
- Dashboard: http://localhost:3000
- Kafka UI: http://localhost:8090
- Grafana: http://localhost:3001
Detailed Setup: See GETTING_STARTED.md.
- Real-time monitoring of validation rates and schema health
- Valid data → Clean Topic | Invalid data → DLQ & Review
Deep Dive: See ARCHITECTURE.md for the "Claim Check" and "DLQ" patterns.
- 🛡️ Real-time Enforcement: Validates JSON/Avro streams against Pydantic models.
- ☣️ Dead Letter Queue: Automatically isolates bad data, so "poison pills" never crash the pipeline.
- 📦 Hybrid Storage: Uses Claim Check Pattern (Postgres for Metadata, MinIO for Payloads).
- 📊 Review & Replay: UI tools to fix quarantined data and re-inject it into the stream.
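The firewall behavior described above can be sketched roughly as follows. This is a minimal stdlib-only stand-in (the real service validates against Pydantic models; `ORDER_SCHEMA`, `validate_record`, and `route` are hypothetical names for illustration):

```python
import json

# Hypothetical schema contract: field name -> required Python type.
# In the real validator this would be a Pydantic model.
ORDER_SCHEMA = {"order_id": int, "amount": float, "currency": str}

def validate_record(raw: bytes, schema: dict):
    """Return (True, record) if the payload honors the contract,
    else (False, reason) so the caller can route it to the DLQ."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"malformed JSON: {exc}"
    for field, expected in schema.items():
        if field not in record:
            return False, f"missing field: {field}"
        if not isinstance(record[field], expected):
            return False, f"{field} is not {expected.__name__}"
    return True, record

def route(messages):
    """Split a batch: valid records -> clean topic, bad ones -> DLQ.
    Invalid records are captured with their error, never raised."""
    clean, dlq = [], []
    for raw in messages:
        ok, result = validate_record(raw, ORDER_SCHEMA)
        if ok:
            clean.append(result)
        else:
            dlq.append({"payload": raw.decode(errors="replace"), "error": result})
    return clean, dlq

batch = [
    b'{"order_id": 1, "amount": 9.99, "currency": "USD"}',
    b'{"order_id": "oops"}',   # schema violation -> DLQ
    b'not json at all',        # poison pill -> DLQ, pipeline keeps going
]
clean, dlq = route(batch)
```

Note the key property: a malformed message produces a DLQ entry rather than an exception, so one poison pill cannot stall the whole stream.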
| Document | Description |
|---|---|
| System Architecture | DLQ pattern, Claim Check, and high-level design (HLD). |
| Getting Started | 3-Command setup and Simulation guide. |
| Failure Scenarios | Zero Data Loss strategies. |
| Interview Q&A | "Why Kafka?" and "Schema Evolution". |
| Component | Technology | Role |
|---|---|---|
| Stream | Apache Kafka | Durable Event Log. |
| Validator | Python (Pydantic) | Type-safe schema validation. |
| Storage | PostgreSQL + MinIO | Hybrid Metadata/Object store. |
| Frontend | Next.js 14 | Operations Dashboard. |
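The hybrid storage row above follows the Claim Check pattern: the heavy payload lives in the object store, while the database keeps only a lightweight pointer plus context. A rough sketch, with in-memory dicts standing in for MinIO and Postgres (`quarantine` and `replay` are hypothetical helper names):

```python
import hashlib

# Stand-ins for the real stores:
object_store = {}   # key -> raw payload (MinIO in the real system)
metadata_db = []    # metadata rows    (Postgres in the real system)

def quarantine(raw: bytes, error: str) -> dict:
    """Claim Check: park the payload in the object store and record
    only a pointer + error context in the relational row."""
    key = hashlib.sha256(raw).hexdigest()
    object_store[key] = raw
    row = {"object_key": key, "error": error, "size_bytes": len(raw)}
    metadata_db.append(row)
    return row

def replay(row: dict) -> bytes:
    """Review & Replay: follow the claim check back to the payload
    so it can be fixed and re-injected into the stream."""
    return object_store[row["object_key"]]

row = quarantine(b'{"order_id": "oops"}', error="order_id is not int")
original = replay(row)
```

Keeping payloads out of Postgres keeps the review queries fast regardless of record size; the hash key also deduplicates identical poison pills.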
Harshan Aiyappa
Senior Full-Stack Hybrid Engineer
GitHub Profile
This project is licensed under the MIT License - see the LICENSE file for details.