Get from zero to your first detected anomaly in five commands.
- Python 3.11+
- A dbt project with `run_results.json`, or a JSONL file of pipeline runs
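If you aren't using dbt or Airflow, a generic run record is just one JSON object per line. A minimal sketch of what such a record might contain (the field names here are hypothetical; match them to whatever your pipeline actually emits):

```python
import json

# Hypothetical run record: field names are illustrative, not a fixed schema.
record = {
    "run_id": "daily_load_2024_01_15",
    "pipeline": "daily_load",
    "started_at": "2024-01-15T06:00:00Z",
    "duration_seconds": 412.5,
    "rows_processed": 1204331,
    "status": "success",
}

# JSONL = one JSON object per line.
line = json.dumps(record)
print(line)
```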
```bash
pip install pipeline-anomaly-detector
```

For development (tests, notebook):

```bash
pip install "pipeline-anomaly-detector[dev]"
```

From a dbt project:
```bash
pad collect \
  --source dbt \
  --dbt-dir "./target" \
  --since 2024-01-01 \
  --output runs.jsonl
```

From a generic JSONL file:
```bash
pad collect \
  --source generic \
  --input my_pipeline_runs.jsonl \
  --output runs.jsonl
```

From Airflow:
```bash
pad collect \
  --source airflow \
  --airflow-db "sqlite:///~/airflow/airflow.db" \
  --since 2024-01-01 \
  --output runs.jsonl
```

```bash
pad train \
  --input runs.jsonl \
  --detector ensemble \
  --output ./models
```

This trains an EnsembleDetector (ZScore + IsolationForest) on your collected runs and saves the model to `./models/`.
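Conceptually, an ensemble of this kind blends the per-detector scores into one number. A minimal sketch of that idea (the equal weights and the [0, 1] score range are assumptions for illustration, not the library's actual implementation):

```python
def ensemble_score(zscore_score: float, iforest_score: float,
                   w_z: float = 0.5, w_if: float = 0.5) -> float:
    """Weighted blend of two detector scores, each assumed in [0, 1]."""
    return w_z * zscore_score + w_if * iforest_score

# A run the z-score detector finds mildly odd but the forest finds very odd:
combined = ensemble_score(0.4, 0.9)
print(round(combined, 2))  # 0.65
```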
Available detector types:
| Value | Description |
|---|---|
| `ensemble` | Weighted combination of ZScore + IsolationForest (recommended) |
| `zscore` | Fast, interpretable z-score baseline |
| `isolation_forest` | sklearn IsolationForest, handles non-linear anomalies |
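To build intuition for the `zscore` baseline: it fits a mean and standard deviation on historical runs, then flags a new run whose metric lands far from that mean. A self-contained sketch (the threshold of 3.0 is a common convention, assumed here rather than taken from the library):

```python
from statistics import mean, stdev

# Fit on historical run durations (seconds), then score a new run.
history = [300, 310, 295, 302, 298, 305]
mu, sigma = mean(history), stdev(history)

new_run = 1200  # a run that took 20 minutes instead of ~5
z = (new_run - mu) / sigma
is_anomaly = abs(z) > 3.0  # assumed threshold

print(is_anomaly)
```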
```bash
pad score-batch \
  --input new_runs.jsonl \
  --model ./models/global_ensemble_20240115T120000Z.joblib \
  --db scores.db
```

The results are printed to the terminal and persisted to `scores.db`.
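Because the scores land in a plain SQLite file, you can also query them directly. The table and column names below are hypothetical (inspect the real file with `sqlite3 scores.db .schema` first); the sketch uses an in-memory stand-in for `scores.db`:

```python
import sqlite3

# In-memory stand-in for scores.db; the schema here is an assumption --
# check the actual schema before querying the real file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (run_id TEXT, score REAL, is_anomaly INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [("run_001", 0.12, 0), ("anomaly_duration_000", 0.91, 1)],
)

# Pull flagged runs, worst first.
anomalies = conn.execute(
    "SELECT run_id, score FROM scores WHERE is_anomaly = 1 ORDER BY score DESC"
).fetchall()
print(anomalies)
conn.close()
```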
```bash
pad explain \
  --run-id anomaly_duration_000 \
  --model ./models/global_ensemble_20240115T120000Z.joblib \
  --db scores.db
```

This prints a Rich panel with the anomaly score bar, contributing features, and `is_anomaly` status.

```bash
pad models list --store-dir ./models
```

- See Feature Reference for the full feature list.
- See Detector Comparison to pick the right detector.
- Configure Slack alerts by setting `SLACK_WEBHOOK_URL` in your environment.
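Slack incoming webhooks accept a JSON POST with a `text` field. A sketch of the kind of payload an alert could carry (the message format is illustrative, not necessarily what the tool emits):

```python
import json
import os

webhook_url = os.environ.get("SLACK_WEBHOOK_URL", "")

# Illustrative alert message; the tool's real alert format may differ.
payload = json.dumps({
    "text": ":rotating_light: anomaly_duration_000 scored 0.91 (threshold 0.80)"
})
print(payload)

# To actually send it, POST the payload to webhook_url with
# Content-Type: application/json (e.g. via urllib.request or requests).
```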