Quickstart

Get from zero to your first detected anomaly in five commands.

Prerequisites

  • Python 3.11+
  • A dbt project with run_results.json or a JSONL file of pipeline runs

Installation

pip install pipeline-anomaly-detector

For development (tests, notebook):

pip install "pipeline-anomaly-detector[dev]"

Step 1 — Collect pipeline runs

From a dbt project:

pad collect \
  --source dbt \
  --dbt-dir "./target" \
  --since 2024-01-01 \
  --output runs.jsonl
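
Under the hood, collecting from dbt amounts to parsing the run_results.json artifact. A minimal sketch, assuming dbt's documented artifact schema (a top-level "results" list whose entries carry "unique_id", "status", and "execution_time"):

```python
import json

def dbt_results_to_runs(path):
    """Yield one run record per model from a dbt run_results.json artifact.

    Assumes dbt's documented artifact schema; pad's actual collector may
    extract more fields than these three.
    """
    with open(path) as f:
        artifact = json.load(f)
    for result in artifact.get("results", []):
        yield {
            "run_id": result["unique_id"],
            "status": result["status"],
            "duration_s": result["execution_time"],
        }
```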

From a generic JSONL file:

pad collect \
  --source generic \
  --input my_pipeline_runs.jsonl \
  --output runs.jsonl
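
The generic source expects one JSON object per line. A hypothetical example of producing such a file — the field names (run_id, pipeline, duration_s, rows, status) are illustrative assumptions, not pad's documented input schema:

```python
import json

# Hypothetical run records; adapt the fields to whatever your
# pipelines actually emit.
runs = [
    {"run_id": "etl_2024_01_01", "pipeline": "nightly_etl",
     "duration_s": 312.4, "rows": 1204332, "status": "success"},
    {"run_id": "etl_2024_01_02", "pipeline": "nightly_etl",
     "duration_s": 2980.1, "rows": 1198554, "status": "success"},
]

with open("my_pipeline_runs.jsonl", "w") as f:
    for run in runs:
        f.write(json.dumps(run) + "\n")
```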

From Airflow:

pad collect \
  --source airflow \
  --airflow-db "sqlite:///~/airflow/airflow.db" \
  --since 2024-01-01 \
  --output runs.jsonl

Step 2 — Train an anomaly detector

pad train \
  --input runs.jsonl \
  --detector ensemble \
  --output ./models

This trains an EnsembleDetector (ZScore + IsolationForest) on your collected runs and saves the model to ./models/.
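
For intuition, a weighted z-score + IsolationForest ensemble can be sketched in a few lines. This is an illustrative toy, not pad's actual EnsembleDetector; the weighting scheme and score scaling are assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

class TinyEnsemble:
    """Toy weighted ensemble: z-score baseline + IsolationForest."""

    def __init__(self, zscore_weight=0.5):
        self.w = zscore_weight
        self.forest = IsolationForest(random_state=0)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-9  # guard against zero variance
        self.forest.fit(X)
        return self

    def score(self, X):
        X = np.asarray(X, dtype=float)
        # Z-score component: mean absolute deviation in standard units.
        z = np.abs((X - self.mean_) / self.std_).mean(axis=1)
        # IsolationForest: higher score_samples means more normal, so negate.
        iso = -self.forest.score_samples(X)
        return self.w * z + (1 - self.w) * iso
```

Higher scores mean more anomalous: an extreme run should score well above a typical one.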

Available detector types:

Value              Description
-----              -----------
ensemble           Weighted combination of ZScore + IsolationForest (recommended)
zscore             Fast, interpretable z-score baseline
isolation_forest   sklearn IsolationForest; handles non-linear anomalies

Step 3 — Score a batch of new runs

pad score-batch \
  --input new_runs.jsonl \
  --model ./models/global_ensemble_20240115T120000Z.joblib \
  --db scores.db

The results are printed to the terminal and persisted to scores.db.
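
A sketch of what persisting scores to SQLite might look like. The scores table name and columns here are assumptions for illustration, not pad's actual scores.db schema:

```python
import sqlite3

def persist_scores(db_path, scored_runs):
    """Save (run_id, score, is_anomaly) rows to a SQLite database.

    Schema is illustrative only; pad's real scores.db may differ.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scores ("
        "run_id TEXT PRIMARY KEY, score REAL, is_anomaly INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO scores VALUES (?, ?, ?)",
        [(r["run_id"], r["score"], int(r["is_anomaly"])) for r in scored_runs],
    )
    conn.commit()
    conn.close()
```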


Step 4 — Explain an anomaly

pad explain \
  --run-id anomaly_duration_000 \
  --model ./models/global_ensemble_20240115T120000Z.joblib \
  --db scores.db

This prints a Rich panel with the anomaly score bar, contributing features, and is_anomaly status.
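
For intuition, here is a plain-text stand-in for the score bar. The real output is a Rich panel; the 0–1 score range and 0.5 anomaly threshold are assumptions:

```python
def score_bar(score, width=20, threshold=0.5):
    """Render an anomaly score in [0, 1] as a text bar.

    A plain-text stand-in for the Rich panel pad prints; the threshold
    is an assumed default.
    """
    filled = round(max(0.0, min(1.0, score)) * width)
    flag = "ANOMALY" if score >= threshold else "ok"
    return f"[{'#' * filled}{'.' * (width - filled)}] {score:.2f} {flag}"
```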


Step 5 — List saved models

pad models list --store-dir ./models
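
A minimal stand-in for the listing step, assuming models are saved as .joblib files in the store directory (as the model path in Step 3 suggests):

```python
from pathlib import Path

def list_models(store_dir="./models"):
    """Return sorted .joblib model filenames in the store directory.

    Illustrative only; `pad models list` may show richer metadata.
    """
    return sorted(p.name for p in Path(store_dir).glob("*.joblib"))
```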

What's next?