This repo demonstrates data platform behaviors locally, using a layer naming scheme close to what many real systems use:
- Landing → ingest as-is (staging/temp storage, think external bucket)
- Raw → copy/convert into lakehouse-format (Parquet) and keep history
- Cleaned → transforms + idempotent incremental builds (stored in DuckDB)
- Serving (Curated) → the only layer consumers should read (`serving/exports/`)
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
```bash
./run_all.sh
```
Artifacts produced:
- `quality/reports/*.md` and `quality/quality_summary.csv`
- `serving/exports/` (consumer boundary)
- `state/audit_log.jsonl`
- Ownership required: `governance/ownership.yml`
- Classification required (including Serving): `governance/data_classification.yml`
- Executable tests: `make test-governance`

Instead of deleting facts (which can break analytics), we mask PII in `cleaned_users`:
```bash
python governance/gdpr_mask_user.py --user-id 3
```
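The masking idea — keep the row, blank the PII — can be sketched as follows. The column names and mask token are assumptions, not the script's actual schema:

```python
PII_COLUMNS = ("email", "full_name", "phone")  # assumed PII columns

def mask_user_row(row: dict, user_id: int) -> dict:
    """Return the row with PII columns replaced by a fixed token,
    leaving non-PII facts intact so joins and counts still work."""
    if row.get("user_id") != user_id:
        return row
    masked = dict(row)
    for col in PII_COLUMNS:
        if col in masked:
            masked[col] = "***MASKED***"
    return masked
```

Because the row survives with its key, downstream aggregates keep their totals; only the identifying attributes are destroyed.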
```bash
python pipelines/build_serving_daily_metrics.py
```
Output: `serving/exports/user_daily_metrics.parquet`
- keyed by `user_key` (a stable anonymized key derived from `user_id`)
- allows user-level analytics without exposing PII
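One common way to derive a stable anonymized key is a keyed hash of the raw id; a sketch under that assumption (the repo may derive `user_key` differently, and the salt handling here is illustrative):

```python
import hashlib
import hmac

def user_key(user_id: int, salt: bytes) -> str:
    """Stable anonymized key: the same user_id + salt always yields the
    same key, but the key does not reveal the raw user_id."""
    return hmac.new(salt, str(user_id).encode(), hashlib.sha256).hexdigest()[:16]
```

Stability is the point: the same user aggregates consistently across daily rebuilds, yet the export carries no raw identifier.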
Instead of rebuilding everything, this demo shows partition-based remediation after a GDPR request.
```bash
./run_gdpr_flow.sh 3
```
What it does:
- records the request in `gdpr_requests` (DuckDB)
- masks PII in `cleaned_users`
- recomputes only the affected `event_date` partitions for that user's `serving_user_daily_metrics`
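Partition-pruned remediation boils down to: find which `event_date` partitions contain the user, then rebuild only those. A stdlib sketch with in-memory "partitions" — the real flow runs against DuckDB tables, and these function names are hypothetical:

```python
from collections import defaultdict

def affected_partitions(events: list[dict], user_id: int) -> list[str]:
    """event_dates containing rows for this user: only these need rebuilding."""
    return sorted({e["event_date"] for e in events if e["user_id"] == user_id})

def rebuild_partitions(events: list[dict], dates: list[str]) -> dict:
    """Recompute per-user daily event counts, but only for the given dates."""
    metrics = defaultdict(int)
    for e in events:
        if e["event_date"] in dates:
            metrics[(e["user_id"], e["event_date"])] += 1
    return dict(metrics)
```

The win is proportionality: a user active on 5 days triggers 5 partition rebuilds, not a full-history recompute of the serving table.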