BLACKICE ❄️

A production-grade streaming pipeline that detects persistent regime shifts in infrastructure metrics using statistical validation instead of opaque ML models.

Note: This project is an independent signals engineering library and is not associated with BlackIce Forensics, Databricks, or other security vendors.

1. Quick Start

Installation

Requires Python 3.10+.

# Install via pip
pip install blackice

Usage (Library)

Use RegimeDetector to embed stability logic directly into your Python services.

from blackice import RegimeDetector, RegimeState

# 1. Initialize detector (defaults: window=60, sigma=3.0)
detector = RegimeDetector()

# 2. Feed data points (e.g., from Kafka/Prometheus)
# Returns a DetectionEvent object
event = detector.update(45.5)

# 3. Act on findings
if event.is_anomaly:
    print(f"Instability detected: {event.reason}")
    print(f"Severity: {event.zscore:.2f}σ")
    
if event.state == RegimeState.SHIFTED:
    # Use explicit Enums for type-safety
    trigger_pager(f"Regime shift confirmled: {event.duration}s")

Usage (CLI)

Run the full analysis pipeline on your own data.

blackice --data <logs.csv> --machine <server_id> --report

Universal Input Requirements:

Input: A CSV file with columns ['machine_id', 'timestamp', 'cpu_util', 'mem_util'].
Output: Generates an incident report at reports/analysis_<server_id>.md.
Logic: Filters noise using the persistence logic defined in configs/default.yaml.

2. Hybrid ML (Offline Training)

Optimized Parameters, Deterministic Runtime.

BLACKICE includes an Offline Learning module that optimizes configuration parameters (Decision Boundaries) using historical data. This allows the system to adapt to specific infrastructure characteristics without running opaque models in production.

Training Usage

Use the training script to find the optimal z_threshold and window_size for your data:

# Finds best parameters using Grid Search over 48 combinations
python train_model.py data/machine_usage.csv --output configs/learned.yaml

What happens:

Objective Function: Minimizes a loss function where False_Alerts are penalized 5x more than Detection_Layency.
Optimizer: Runs the pipeline on historical data with varying parameters (Grid Search).
Output: Generates a production-ready learned.yaml config file.

Then run the detector with the learned config:

blackice --config configs/learned.yaml ...

2. Motivation

Infrastructure monitoring is plagued by alert fatigue. Traditional threshold-based alerting generates noise from transient spikes, while complex ML models introduce opacity and drift.

BLACKICE addresses this by shifting focus from point anomalies to persistent regime shifts. It acknowledges that infrastructure data is inherently noisy and bursty. Instead of training complex models to predict every spike, BLACKICE uses rigorous statistical persistence validation to distinguish between:

Transient Instability: Burstiness that returns to baseline (filtered out).
Structural Deviation: Shifts that persist beyond a confidence interval (reported).

The system avoids "AI magic" in favor of deterministic, explainable signal processing that can be debugged by an SRE at 3 AM.

3. Architecture Overview

BLACKICE is engineered as a streaming processing pipeline, not a batch analysis script. It operates in O(1) memory per metric tracker.

Components

Baseline Modeling: Welford's algorithm for numerically stable, streaming mean/variance without history retention.
Deviation Detection: Real-time z-score computation against rolling baselines.
Persistence Validation: Deterministic filter requiring deviations to sustain magnitude and duration thresholds to trigger state changes.
State Machine: Formal transition logic (NORMAL → UNSTABLE → SHIFTED) providing clean audit trails.
RegimeDetector: High-level facade that encapsulates the entire engine into a single, easy-to-use API.
Metrics/Reporting: Generates label-free stability metrics and automated incident analysis files.

4. Key Features

Streaming-First Design: Processes infinite streams chunk-by-chunk using minimal resources.
Constant Memory: O(window_size) memory complexity regardless of dataset size.
Label-Free Metrics: Quality metrics (detection latency, spike rejection) computed without ground truth labels.
Noise Rejection: Aggressive persistence layer filters 80-90% of transient noise typical in cloud workloads.
Automated Reporting: Instantly generates production-grade Markdown incident reports.
CLI-Driven: Unix-philosophy operational interface.

5. Project Structure

blackice/
├── configs/            # YAML configuration (Learned & Defaults)
├── notebooks/          # Analysis and visualization
├── reports/            # Generated incident reports
├── scripts/            # Testing & CI Verification
│   ├── test_blackice.py    # Integration suite
│   ├── test_cli_smoke.py   # Operational smoke tests
│   └── test_determinism.py # O(1) Invariant checks
├── src/
│   └── blackice/       # Core library
│       ├── learning/       # [NEW] Offline ML Module
│       │   ├── objective.py    # Loss Function (SRE-weighted)
│       │   └── optimizer.py    # Grid Search Trainer
│       ├── baseline.py     # Streaming statistics
│       ├── cli.py          # Production CLI entry point
│       ├── detector.py     # High-level RegimeDetector API
│       ├── deviation.py    # Signal detection
│       ├── metrics.py      # Stability metrics
│       ├── persistence.py  # Noise filtering logic
│       ├── pipeline.py     # Orchestration
│       └── state.py        # Regime state machine
├── train_model.py      # [NEW] ML Training Entrypoint
└── pyproject.toml      # Project Metadata & Dependencies

6. Example Output

BLACKICE generates structured incident reports designed for engineering transparency.

Report Sections:

Executive Summary: Immediate text verdict (HEALTHY/UNHEALTHY) based on confirmed shifts.
Signal Summary: Detailed breakdown of CPU/Memory behavior patterns.
Detection Statistics: Tables showing total events vs. confirmed shifts (often 100% rejection rate for stable but bursty machines).
Infra Interpretation: Automated diagnosis of workload characteristics (e.g., "High variance but stable").

A "Health" verdict often accompanies high instability counts. This is correct behavior: it proves the system successfully identified thousands of spikes as non-critical noise, preventing thousands of false pages.

7. Design Philosophy

Conservative by Design: We prefer missing a subtle shift over waking an engineer for a false positive.
Deterministic > Probabilistic: Given the same input, the system must produce the exact same state transitions.
Explainability > Complexity: Every regime shift has a clear reason (e.g., "Deviation persisted for >10 points"), traceable back to specific timestamps.

8. System Properties

Guarantees

Deterministic output for identical input streams
O(1) update time per data point
Bounded memory usage (O(window_size))
No Online Training (Learning is decoupled & offline)
Learned Configs, Deterministic Execution

Non-Goals

Not a forecasting system
Not a root-cause analysis engine
Not a replacement for TSDBs or Prometheus

9. License

MIT License.

For Contributors: To setup a development environment and run tests, please see DEVELOPMENT.md.

10. Author

Created by Mihir Maru.

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
.github/workflows		.github/workflows
assets		assets
configs		configs
data		data
notebooks		notebooks
reports		reports
scripts		scripts
src/blackice		src/blackice
.gitattributes		.gitattributes
.gitignore		.gitignore
DEVELOPMENT.md		DEVELOPMENT.md
LICENSE		LICENSE
README.md		README.md
blackice_source_dump.txt		blackice_source_dump.txt
demo.py		demo.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
train_model.py		train_model.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BLACKICE ❄️

1. Quick Start

Installation

Usage (Library)

Usage (CLI)

2. Hybrid ML (Offline Training)

Training Usage

2. Motivation

3. Architecture Overview

Components

4. Key Features

5. Project Structure

6. Example Output

Report Sections:

7. Design Philosophy

8. System Properties

Guarantees

Non-Goals

9. License

10. Author

About

Uh oh!

Releases

Packages

Languages

License

Mihirmaru22/BLACKICE

Folders and files

Latest commit

History

Repository files navigation

BLACKICE ❄️

1. Quick Start

Installation

Usage (Library)

Usage (CLI)

2. Hybrid ML (Offline Training)

Training Usage

2. Motivation

3. Architecture Overview

Components

4. Key Features

5. Project Structure

6. Example Output

Report Sections:

7. Design Philosophy

8. System Properties

Guarantees

Non-Goals

9. License

10. Author

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages