78 commits
b09c63a
Add mode switching and metadata handling to quick CLI
teehari79 Oct 5, 2025
9dc90f9
Merge pull request #1 from teehari79/codex/add-mode-flag-to-quick_fau…
teehari79 Oct 5, 2025
92c4b62
Add unsupervised behaviour model for sensor anomalies
teehari79 Oct 6, 2025
3753dea
Merge pull request #2 from teehari79/codex/create-anomaly-detection-f…
teehari79 Oct 6, 2025
1e4fb37
Add asset dataset splitting utility
teehari79 Oct 6, 2025
035bb96
Merge pull request #3 from teehari79/codex/create-asset-csv-files-fro…
teehari79 Oct 6, 2025
340e413
discover the delimiter of the csv file and then use that for reading it
hari-humanize Oct 6, 2025
a498e52
Add run and pre-process
hari-humanize Oct 6, 2025
3b7f9f2
Add model output customization options for quick fault training
teehari79 Oct 6, 2025
689cec8
Merge pull request #4 from teehari79/codex/run-quickfaultdetector.py-…
teehari79 Oct 6, 2025
3564bbc
Change config
hari-humanize Oct 7, 2025
b0ce9bc
Handle empty numeric columns during imputation
teehari79 Oct 7, 2025
2efd00e
Merge pull request #5 from teehari79/codex/fix-valueerror-in-data-imp…
teehari79 Oct 7, 2025
f59e533
Guard preprocessing when time column missing
teehari79 Oct 7, 2025
e8ffba7
Merge branch 'main' into codex/fix-valueerror-in-data-imputation-o1cbmt
teehari79 Oct 7, 2025
4d931e1
fixed issue with csv delimiter
hari-humanize Oct 7, 2025
a562747
Add predict mode runner for quick fault detector
teehari79 Oct 7, 2025
d97cdc7
Merge pull request #7 from teehari79/codex/create-python-executable-f…
teehari79 Oct 7, 2025
8b1056c
Merge pull request #6 from teehari79/codex/fix-valueerror-in-data-imp…
teehari79 Oct 7, 2025
8d55c45
Adjust prediction defaults for wind farm dataset
teehari79 Oct 7, 2025
a540254
Merge pull request #8 from teehari79/codex/update-prediction-code-for…
teehari79 Oct 7, 2025
a4061f8
Improve ColumnSelector missing column error message
teehari79 Oct 8, 2025
a76e22b
Merge pull request #9 from teehari79/codex/enhance-column-selector-er…
teehari79 Oct 8, 2025
f9a7aa2
Enhance anomaly prediction metadata
teehari79 Oct 8, 2025
afd58f5
Merge pull request #10 from teehari79/codex/add-anomaly-flagging-to-p…
teehari79 Oct 8, 2025
620ae61
Handle missing training data when plotting quick fault results
teehari79 Oct 8, 2025
259e5cc
Merge pull request #11 from teehari79/codex/fix-error-with-empty-inpu…
teehari79 Oct 8, 2025
540eddf
Add support for ignoring features in ARCANA
teehari79 Oct 8, 2025
f1ab657
Merge pull request #12 from teehari79/codex/add-option-to-ignore-fiel…
teehari79 Oct 8, 2025
48fa690
Save anomaly details during prediction runs
teehari79 Oct 8, 2025
a876d0c
Merge pull request #13 from teehari79/codex/save-anomaly-details-data…
teehari79 Oct 8, 2025
846742e
Clarify and extend ARCANA feature ignoring
teehari79 Oct 8, 2025
e2d1a4d
Merge branch 'main' into codex/add-option-to-ignore-fields-in-arcana-…
teehari79 Oct 8, 2025
a086029
Mask ignored features in anomaly scoring and RCA
teehari79 Oct 8, 2025
6850031
Merge pull request #14 from teehari79/codex/add-option-to-ignore-fiel…
teehari79 Oct 8, 2025
04383cd
Merge branch 'main' into codex/add-option-to-ignore-fields-in-arcana-…
teehari79 Oct 8, 2025
8b588d8
Merge pull request #15 from teehari79/codex/add-option-to-ignore-fiel…
teehari79 Oct 8, 2025
da0789c
Save prediction outputs by asset
teehari79 Oct 8, 2025
03cf9fb
Merge pull request #16 from teehari79/codex/save-prediction-output-as…
teehari79 Oct 8, 2025
c7d851a
Fix RCA ignore patterns and reset cumulative score
teehari79 Oct 9, 2025
104deab
Merge pull request #17 from teehari79/codex/fix-anomalous-field-detec…
teehari79 Oct 9, 2025
2dcdca4
changed config
hari-humanize Oct 9, 2025
8b93bd7
updated base_config.yaml with ignore features
hari-humanize Oct 9, 2025
8432fec
Make ignored feature matching case-insensitive
teehari79 Oct 9, 2025
9c5bc1b
Merge pull request #18 from teehari79/codex/fix-ignore_features-not-a…
teehari79 Oct 9, 2025
d291706
Ensure predict mode uses fallback config
teehari79 Oct 9, 2025
2a857b9
Merge pull request #19 from teehari79/codex/fix-ignored_features-not-…
teehari79 Oct 9, 2025
68d16cf
Ensure ARCANA respects ignore feature patterns
teehari79 Oct 10, 2025
4cc0181
Merge pull request #20 from teehari79/codex/fix-feature-masking-in-ar…
teehari79 Oct 10, 2025
3f6d7c7
Added print statements for debugging
hari-humanize Oct 10, 2025
40d84c9
Merge branch 'main' of https://github.com/teehari79/EnergyFaultDetector
hari-humanize Oct 10, 2025
c3cffda
Updated fault_detector
hari-humanize Oct 10, 2025
cde8346
print arcana args
hari-humanize Oct 10, 2025
832044d
Allow overriding ignored features during prediction
teehari79 Oct 10, 2025
dcf8425
Merge pull request #21 from teehari79/codex/fix-ignore_features-in-ru…
teehari79 Oct 10, 2025
233ee34
Ensure predict propagates ignore features
teehari79 Oct 10, 2025
503447f
Merge pull request #22 from teehari79/codex/implement-ignore_features…
teehari79 Oct 10, 2025
6aca675
Preserve ignore feature configuration when loading models
teehari79 Oct 10, 2025
0beda01
Merge pull request #23 from teehari79/codex/fix-model-loading-overwri…
teehari79 Oct 10, 2025
0f2ae75
Fix create_events boolean alignment
teehari79 Oct 10, 2025
31e6828
Merge pull request #24 from teehari79/codex/resolve-indexerror-in-run…
teehari79 Oct 10, 2025
bbd405a
Added print statements for troubleshooting... Moved data splitter und…
hari-humanize Oct 11, 2025
cfbd612
Optimize asset dataset splitter memory usage
teehari79 Oct 11, 2025
a0c47f0
Merge pull request #25 from teehari79/codex/optimize-asset_data_split…
teehari79 Oct 11, 2025
1d71b31
Add event identifiers to prediction outputs
teehari79 Oct 12, 2025
85d9dd8
Merge pull request #26 from teehari79/codex/add-event-id-and-critical…
teehari79 Oct 12, 2025
4f7fe27
Fix threshold fitting mask alignment
teehari79 Oct 12, 2025
d7ee96d
Merge pull request #27 from teehari79/codex/debug-training-error-with…
teehari79 Oct 12, 2025
ed6dd7d
Fix threshold fitting mask alignment
teehari79 Oct 13, 2025
5050277
Merge pull request #28 from teehari79/codex/fix-index-out-of-bounds-e…
teehari79 Oct 13, 2025
aff3051
Avoid large allocations in RMSE score fitting
teehari79 Oct 13, 2025
d2c81ef
Merge pull request #29 from teehari79/codex/fix-arraymemoryerror-duri…
teehari79 Oct 13, 2025
549f019
Fix quantile threshold selector alignment and tests
teehari79 Oct 13, 2025
dddc0e8
Merge pull request #30 from teehari79/codex/fix-indexerror-in-thresho…
teehari79 Oct 13, 2025
a3e9e6e
Add REST API for prediction and configure model registry
teehari79 Oct 13, 2025
bd5e245
Merge pull request #32 from teehari79/codex/create-rest-api-for-model…
teehari79 Oct 13, 2025
62791f5
Add FastAPI prediction endpoint with enhanced validation
teehari79 Oct 13, 2025
869709f
Merge branch 'main' into codex/update-prediction-api-error-handling-a…
teehari79 Oct 13, 2025
46 changes: 41 additions & 5 deletions README.md
@@ -36,18 +36,54 @@ To install the `energy-fault-detector` package, run: `pip install energy-fault-d


## Quick fault detection
For a quick demo on a specific dataset, run:
The `quick_fault_detector` CLI now supports dedicated training and prediction workflows:

```quick_fault_detector <path_to_your_dataset.csv>```
- **Train and evaluate a model** (default mode). This trains a new autoencoder, evaluates it on the provided
test slice, and reports where the model artefacts were stored.

For more options, run ```quick_fault_detector -h```.
```bash
quick_fault_detector <path_to_training_data.csv> --mode train [--options options.yaml]
```

For an example using one of the CARE2Compare datasets, run:
```quick_fault_detector <path_to_c2c_dataset.csv> --c2c_example```
- **Run predictions with an existing model**. Supply the dataset to score alongside the directory that contains the
saved model files returned from a previous training run.

```bash
quick_fault_detector <path_to_evaluation_data.csv> --mode predict --model_path <path_to_saved_model> [--options options.yaml]
```

The `--model_path` argument is mandatory in predict mode.

Prediction artefacts (anomaly scores, reconstructions, and detected events) are written to the directory specified by
`--results_dir` (defaults to `./results`). For an example using one of the CARE2Compare datasets, run:

```bash
quick_fault_detector <path_to_c2c_dataset.csv> --c2c_example
```

For more information, have a look at the notebook [Quick Failure Detection](./notebooks/Example%20-%20Quick%20Failure%20Detection.ipynb)
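
The two CLI modes can also be driven from a script. The sketch below only assembles the command lines shown above; the helper names `build_command` and `run` are hypothetical and not part of the package, and passing `--results_dir` in either mode is an assumption.

```python
import subprocess


def build_command(data_csv, mode="train", model_path=None, options=None, results_dir=None):
    """Assemble a quick_fault_detector command line (illustrative helper)."""
    if mode not in ("train", "predict"):
        raise ValueError(f"Unknown mode: {mode!r}")
    # The README states that --model_path is mandatory in predict mode.
    if mode == "predict" and model_path is None:
        raise ValueError("--model_path is mandatory in predict mode")

    cmd = ["quick_fault_detector", str(data_csv), "--mode", mode]
    if mode == "predict":
        cmd += ["--model_path", str(model_path)]
    if options is not None:
        cmd += ["--options", str(options)]
    if results_dir is not None:
        cmd += ["--results_dir", str(results_dir)]
    return cmd


def run(cmd):
    """Run the assembled command and raise if the CLI exits with an error."""
    subprocess.run(cmd, check=True)
```

For example, `run(build_command("evaluation.csv", mode="predict", model_path="saved_model"))` would score `evaluation.csv` with a previously trained model, assuming the package is installed and both paths exist.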


## REST prediction API

The project ships with a lightweight FastAPI application that exposes the prediction workflow via HTTP. The service
resolves models by name and version using the directory structure described in
[`energy_fault_detector/api/service_config.yaml`](energy_fault_detector/api/service_config.yaml).

Start the API with:

```bash
uvicorn energy_fault_detector.api.app:app --reload
```

By default the service reads its configuration from the bundled `service_config.yaml`. Set the
`EFD_SERVICE_CONFIG` environment variable to point to a custom YAML file when you want to change the model root
directory, override default ignore patterns, or tweak other runtime parameters. To trigger a prediction, send a `POST`
request to `/predict` with a JSON payload containing at least the `model_name` and `data_path` fields. Optional
fields such as `model_version`, `ignore_features`, and `asset_name` refine which artefacts are used and how the results
are stored.
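
A request to the endpoint could be assembled as in the sketch below. It uses only the Python standard library. The base URL assumes uvicorn's default host and port, the helper names and example values are hypothetical, and the response format is not specified here, so the body is returned as plain text.

```python
import json
import urllib.request


def build_predict_request(model_name, data_path, base_url="http://127.0.0.1:8000", **optional):
    """Build a POST request for the /predict endpoint (illustrative helper)."""
    # Required fields according to the README.
    payload = {"model_name": model_name, "data_path": data_path}
    # Optional fields such as model_version, ignore_features, or asset_name.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the service running, `send(build_predict_request("my_model", "/data/predict_asset.csv", ignore_features=["windspeed*"]))` would submit a prediction job; the model name and data path are placeholders.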


## Fault detection in 5 lines of code

```python
224 changes: 224 additions & 0 deletions asset_dataset_splitter.py
@@ -0,0 +1,224 @@
"""Utility for splitting asset datasets into train and prediction CSV files.

The module provides a command line interface that accepts the path to a
directory containing ``.csv`` files. The delimiter of each file is detected
automatically, with ``;`` as the fallback. Each file is expected to contain
records for one or more assets identified by the ``asset_id`` column.

The script creates two output files per asset:

``train_<asset>.csv``
    Contains all records marked as ``train`` in the ``train_test`` column and
    all records whose ``status_type_id`` is either 0 or 2.

``predict_<asset>.csv``
    Contains all records whose ``status_type_id`` is 1, 3, 4 or 5 and all
    records whose ``train_test`` value is ``prediction`` regardless of the
    ``status_type_id`` value.

Before saving, the script removes helper columns as well as columns that
contain ``_max``, ``_min`` or ``_std`` from the resulting files.
"""

from __future__ import annotations

import argparse
import csv
from pathlib import Path
from typing import Dict, Iterable, MutableMapping

import pandas as pd


# Status values are compared as strings because the CSV is read with dtype=str.
STATUS_TYPE_NORMAL = {"0", "2"}
STATUS_TYPE_ANOMALY = {"1", "3", "4", "5"}
DROP_COLUMNS = {
    "asset_id",
    "train_test",
    "train_test_bool",
    "status_type_id",
    "status_type_bool",
}
DROP_COLUMN_SUBSTRINGS = ("_max", "_min", "_std")


def _detect_delimiter(csv_file: Path) -> str:
    """Return the detected delimiter for ``csv_file`` with ``;`` as fallback."""

    with open(csv_file, "r", encoding="utf-8", errors="ignore") as handle:
        sample = handle.read(2048)
        sniffer = csv.Sniffer()
        try:
            dialect = sniffer.sniff(sample)
            return dialect.delimiter
        except csv.Error:
            return ";"


def _iter_asset_frames(
    csv_file: Path, *, chunksize: int | None = 100_000
) -> Iterable[tuple[str, pd.DataFrame]]:
    """Yield (asset_id, dataframe) pairs from ``csv_file`` without loading everything into memory."""

    delimiter = _detect_delimiter(csv_file)
    reader = pd.read_csv(csv_file, sep=delimiter, dtype=str, chunksize=chunksize)

    # ``pd.read_csv`` returns ``DataFrame`` when ``chunksize`` is ``None``.
    if isinstance(reader, pd.DataFrame):
        reader = [reader]

    for chunk in reader:
        if chunk.empty:
            continue

        if "asset_id" not in chunk.columns:
            raise ValueError(
                f"File '{csv_file}' does not contain required column 'asset_id'."
            )

        for asset_id, group in chunk.groupby("asset_id", sort=False):
            yield str(asset_id), group.reset_index(drop=True)


def _clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Remove helper columns and columns containing ``_max``/``_min``/``_std``."""

    to_drop = [col for col in df.columns if col in DROP_COLUMNS]
    to_drop.extend(
        col for col in df.columns if any(substr in col for substr in DROP_COLUMN_SUBSTRINGS)
    )
    return df.drop(columns=to_drop, errors="ignore")


def _get_lowercase_series(df: pd.DataFrame, column: str) -> pd.Series:
    """Return ``column`` as lower-case strings, or empty strings if it is missing."""

    series = df.get(column)
    if series is None:
        return pd.Series("", index=df.index, dtype=object)
    return series.fillna("").astype(str).str.lower()


def _get_status_series(df: pd.DataFrame) -> pd.Series:
    """Return ``status_type_id`` as strings, or empty strings if it is missing."""

    series = df.get("status_type_id")
    if series is None:
        return pd.Series("", index=df.index, dtype=object)
    return series.fillna("").astype(str)


def _build_train_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Filter rows belonging to the training split."""

    train_series = _get_lowercase_series(df, "train_test")
    status_series = _get_status_series(df)

    is_train = train_series == "train"
    is_normal_status = status_series.isin(STATUS_TYPE_NORMAL)
    return df[is_train | is_normal_status].copy()


def _build_predict_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Filter rows belonging to the prediction split."""

    train_series = _get_lowercase_series(df, "train_test")
    status_series = _get_status_series(df)

    is_prediction = train_series == "prediction"
    is_anomaly_status = status_series.isin(STATUS_TYPE_ANOMALY)
    return df[is_prediction | is_anomaly_status].copy()


def _append_asset_frames(
    asset_id: str,
    df: pd.DataFrame,
    output_dir: Path,
    header_written: MutableMapping[Path, bool],
) -> None:
    """Append data for ``asset_id`` to the corresponding train/predict CSV files."""

    if df.empty:
        return

    train_df = _clean_columns(_build_train_frame(df))
    predict_df = _clean_columns(_build_predict_frame(df))

    # Write the header only on the first append to each output file.
    if not train_df.empty:
        train_path = output_dir / f"train_{asset_id}.csv"
        train_df.to_csv(
            train_path,
            sep=";",
            index=False,
            mode="a",
            header=not header_written.get(train_path, False),
        )
        header_written[train_path] = True

    if not predict_df.empty:
        predict_path = output_dir / f"predict_{asset_id}.csv"
        predict_df.to_csv(
            predict_path,
            sep=";",
            index=False,
            mode="a",
            header=not header_written.get(predict_path, False),
        )
        header_written[predict_path] = True


def split_asset_datasets(
    input_dir: Path,
    output_dir: Path | None = None,
    *,
    chunksize: int | None = 100_000,
) -> None:
    """Split datasets per asset into train and prediction CSV files.

    Args:
        input_dir: Directory containing the source ``.csv`` files.
        output_dir: Optional directory to store the results. When ``None`` the
            input directory is used.
        chunksize: Maximum number of rows per chunk read from the source files.
            ``None`` loads the entire file at once.
    """

    if not input_dir.is_dir():
        raise NotADirectoryError(f"Input directory '{input_dir}' does not exist or is not a directory")

    output_dir = output_dir or input_dir
    output_dir.mkdir(parents=True, exist_ok=True)

    # Remove previously generated files to avoid appending to stale data.
    for existing_file in output_dir.glob("train_*.csv"):
        existing_file.unlink()
    for existing_file in output_dir.glob("predict_*.csv"):
        existing_file.unlink()

    header_written: Dict[Path, bool] = {}

    for csv_file in sorted(input_dir.glob("*.csv")):
        for asset_id, asset_df in _iter_asset_frames(csv_file, chunksize=chunksize):
            _append_asset_frames(asset_id, asset_df, output_dir, header_written)


def _parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Split asset datasets into train and prediction files")
    parser.add_argument(
        "input_dir",
        type=Path,
        help="Path to the directory containing the source CSV files",
    )
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=None,
        help="Optional output directory. Defaults to the input directory.",
    )
    return parser.parse_args()


def main() -> None:
    args = _parse_args()
    split_asset_datasets(args.input_dir, args.output_dir)


if __name__ == "__main__":
    main()
9 changes: 8 additions & 1 deletion docs/config_example.yaml
@@ -63,4 +63,11 @@ root_cause_analysis: # (optional) if not specified, no root_cause_analysis (ARC
  alpha: 0.8
  init_x_bias: recon
  num_iter: 200

  ignore_features:
    # Patterns apply to anomaly scoring and ARCANA.
    # List exact column names or use wildcards such as "windspeed*".
    - wind_speed_59_avg
    - power_58_avg
    - wind_speed_60_avg
    - wind_speed_61_avg
    - power_62_avg
36 changes: 36 additions & 0 deletions docs/usage_examples.rst
@@ -98,6 +98,42 @@ algorithm. An example:
.. include:: config_example.yaml
   :literal:

To keep specific sensors out of both anomaly scoring *and* ARCANA, edit the configuration YAML that you pass to
:py:obj:`Config <energy_fault_detector.config.Config>` (for example ``configs/base_config.yaml`` or a copy of it). Inside
the ``root_cause_analysis`` section, add or update the optional ``ignore_features`` list. Columns listed here are
zeroed before the anomaly score is calculated and remain fixed during ARCANA optimisation, so they never trigger
anomalies and are never suggested as a root cause.

Column names must match the data frame headers used during training or prediction. You can either provide the exact
column name or use Unix shell-style wildcards to target multiple columns at once (for example, ``windspeed*`` matches
``windspeed_avg`` and ``windspeed_peak``). If a configured pattern does not match any column, a warning is logged so you
can correct the entry.
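
The wildcard behaviour described above can be illustrated with Python's standard ``fnmatch`` module. This is only a
sketch of shell-style matching, not the package's own implementation. The helper name ``resolve_ignore_patterns`` is
hypothetical, and case-insensitive matching is an assumption made for this example.

.. code-block:: python

    from fnmatch import fnmatchcase


    def resolve_ignore_patterns(columns, patterns):
        """Return (ignored_columns, unmatched_patterns) for shell-style patterns."""
        ignored = []
        unmatched = []
        for pattern in patterns:
            # Lower-case both sides so the comparison ignores case (assumption).
            hits = [col for col in columns if fnmatchcase(col.lower(), pattern.lower())]
            if not hits:
                # A pattern without any match would be worth a warning.
                unmatched.append(pattern)
            for col in hits:
                if col not in ignored:
                    ignored.append(col)
        return ignored, unmatched


    columns = ["windspeed_avg", "windspeed_peak", "output_power", "rotor_speed"]
    ignored, unmatched = resolve_ignore_patterns(columns, ["windspeed*", "output_power", "gearbox*"])
    print(ignored)    # ['windspeed_avg', 'windspeed_peak', 'output_power']
    print(unmatched)  # ['gearbox*']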

.. code-block:: yaml

    root_cause_analysis:
      ignore_features:
        - windspeed*
        - output_power

When the configuration above is used, wind speed signals such as ``windspeed_avg`` are kept for model training, but their
reconstruction error is ignored in anomaly scores and in ARCANA runs. The same YAML file can be supplied when calling
:py:meth:`FaultDetector.fit <energy_fault_detector.fault_detector.FaultDetector.fit>` for training or
:py:meth:`FaultDetector.predict <energy_fault_detector.fault_detector.FaultDetector.predict>` for inference, so the same
set of exclusions is applied in both stages.

To update the configuration 'on the fly' (for example, for hyperparameter optimization), provide a new
configuration dictionary via the ``update_config`` method:
