
Fix/anomaly scores normalization consistency#156

Merged
michele-milesi merged 4 commits into main from fix/anomaly-scores-normalization-consistency on Feb 23, 2026

Conversation


@michele-milesi michele-milesi commented Feb 20, 2026

Summary

This pull request makes the following changes:

Updated

  • Anomalib-orobix to v0.7.0.dev151 in order to make optimal threshold selection more robust with respect to floating point operations.

Fixed

  • normalize_anomaly_score now accepts an optional eval_threshold (EvalThreshold) parameter. When provided, consistency enforcement uses the actual evaluation boundary instead of always using the training threshold at 100.0, preventing misclassification of samples whose raw score falls close to the evaluation threshold.
  • Consistency enforcement in anomaly score normalization now uses np.nextafter/torch.nextafter (dtype-aware) instead of hardcoded epsilon values, eliminating ULP-gap misclassifications, especially at low-precision (fp16) boundaries.
  • AnomalibEvaluation now builds an EvalThreshold from the optimal evaluation threshold and passes it to normalize_anomaly_score, ensuring consistent predictions between raw and normalized anomaly scores and anomaly maps.
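To make the first two bullets concrete, here is a minimal, hedged sketch (in NumPy, not the actual quadra implementation) of boundary-consistency enforcement: samples flagged anomalous on raw scores are clamped to at least the normalized boundary, and the rest are clamped strictly below it, using a dtype-aware np.nextafter bound rather than a hardcoded epsilon. The function name and signature are illustrative only.

```python
import numpy as np

def enforce_boundary_consistency(normalized, is_anomaly, boundary=100.0):
    """Illustrative sketch: keep the raw-score decision intact after
    normalization.  Anomalous samples must land at or above `boundary`;
    normal samples must land strictly below it, at most one ULP away
    in the score's own dtype."""
    normalized = np.asarray(normalized)
    dtype = normalized.dtype
    b = np.asarray(boundary, dtype=dtype)
    # Largest representable value strictly below the boundary in this dtype.
    below = np.nextafter(b, np.array(-np.inf, dtype=dtype))
    out = normalized.copy()
    out[is_anomaly] = np.maximum(out[is_anomaly], b)
    out[~is_anomaly] = np.minimum(out[~is_anomaly], below)
    return out

scores = np.array([100.0, 99.999, 100.2], dtype=np.float32)
mask = np.array([True, False, True])
result = enforce_boundary_consistency(scores, mask)
```

Because `below` is computed in the scores' dtype, the gap between it and the boundary is exactly one ULP for that precision, which is what makes the fp16 and fp32 cases behave consistently.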

Type of Change

Please select the one relevant option below:

  • Bug fix (non-breaking change that solves an issue)

Checklist

Please confirm that the following tasks have been completed:

  • I have tested my changes locally and they work as expected. (Please describe the tests you performed.)
  • I have added unit tests for my changes, or updated existing tests if necessary.
  • I have updated the documentation, if applicable.
  • I have installed pre-commit and run locally for my code changes.

Copilot AI left a comment

Pull request overview

This PR improves anomaly score normalization so that classification decisions remain consistent between raw scores and normalized scores/maps, especially around floating-point boundary conditions and when using a distinct evaluation threshold.

Changes:

  • Added EvalThreshold plus ensure_scores_consistency, and extended normalize_anomaly_score to optionally enforce consistency against the evaluation boundary.
  • Updated anomaly evaluation/report generation to pass an EvalThreshold derived from the optimal evaluation threshold.
  • Expanded unit tests to cover fp16/fp32 and numpy/torch consistency edge cases; bumped anomalib-orobix to 0.7.0.dev151 and released 2.8.1.
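For readers unfamiliar with the new type, a hypothetical sketch of what an EvalThreshold pairing could look like; the field names `raw` and `normalized` are inferred from the code excerpts later in this review, while the dataclass shape and the numbers are illustrative assumptions, not the actual quadra definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalThreshold:
    """Illustrative pairing of the same decision boundary in two spaces
    (field names inferred from the diff: .raw and .normalized)."""
    raw: float         # threshold in raw anomaly-score space
    normalized: float  # the same boundary after score normalization

# Example: a hypothetical optimal evaluation threshold of 2.37 in raw
# space that maps to 100.0 on the normalized scale.
t = EvalThreshold(raw=2.37, normalized=100.0)
```

Carrying both values together lets normalization code compare raw scores against `t.raw` while clamping normalized outputs around `t.normalized`, which is the consistency this PR enforces.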

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Reviewed files:
  • tests/utilities/test_anomaly_utils.py: Adds extensive tests for consistency enforcement and eval-threshold-aware normalization across numpy/torch and fp16/fp32.
  • quadra/utils/anomaly.py: Introduces EvalThreshold, ensure_scores_consistency, and updates normalize_anomaly_score to enforce prediction consistency using dtype-aware nextafter logic.
  • quadra/tasks/anomaly.py: Builds and passes EvalThreshold during report generation so normalized outputs align with the evaluation decision boundary.
  • quadra/__init__.py: Bumps package version to 2.8.1.
  • pyproject.toml: Bumps package version to 2.8.1 and updates the anomalib-orobix dependency to 0.7.0.dev151.
  • poetry.lock: Updates the lockfile for anomalib-orobix 0.7.0.dev151 and the content hash.
  • CHANGELOG.md: Adds 2.8.1 release notes describing the fixes and dependency update.
Comments suppressed due to low confidence (2)

quadra/utils/anomaly.py:88

  • The docstring says no hard-coded epsilon is required, but the implementation still uses a fixed epsilon = 1e-3 and boundary - epsilon to define below_boundary. Either remove the fixed epsilon and rely purely on nextafter for a dtype-aware strict-inequality bound, or document why the extra 1e-3 margin is required (e.g., if downstream logic rounds to 3 decimals) and make the rounding/precision assumption explicit/parameterized.
    epsilon = 1e-3
    if isinstance(normalized_score, torch.Tensor):
        device = normalized_score.device
        # Work in the scores' dtype, casting boundaries to the same dtype so the casts take effect
        _inf = torch.tensor(float("inf"), dtype=normalized_score.dtype, device=device)
        boundary_tensor = torch.tensor(boundary, dtype=normalized_score.dtype, device=device)
        anomaly_boundary = boundary_tensor.clone()
        # If the dtype cast makes anomaly_boundary smaller than the normalized boundary (float),
        # bump it up to the next representable value
        if float(anomaly_boundary) < boundary:
            anomaly_boundary = torch.nextafter(anomaly_boundary, _inf)
        # Ensure consistency after rounding to 3 decimal places
        below_boundary = torch.min(torch.nextafter(boundary_tensor, -_inf), boundary_tensor - epsilon)

quadra/utils/anomaly.py:110

  • In the NumPy/scalar branch, dtype is taken from normalized_score.dtype only when it's an np.ndarray; NumPy scalar types (e.g. np.float16) will fall back to np.float64, which defeats the stated dtype-aware behavior and can change the output type. Consider deriving dtype via np.asarray(normalized_score).dtype (and similarly for raw_score) so scalar NumPy inputs stay dtype-consistent.
    elif isinstance(normalized_score, np.ndarray) or np.isscalar(normalized_score):
        # Work in the scores' dtype, casting boundaries to the same dtype so the casts take effect
        dtype = normalized_score.dtype if isinstance(normalized_score, np.ndarray) else np.float64
        _inf = np.array(np.inf, dtype=dtype)
        boundary_array = np.array(boundary, dtype=dtype)
        anomaly_boundary = boundary_array.copy()
        # If the dtype cast makes anomaly_boundary smaller than the normalized boundary (float),
        # bump it up to the next representable value
        if float(anomaly_boundary) < boundary:
            anomaly_boundary = np.nextafter(anomaly_boundary, _inf)
        # Ensure consistency after rounding to 3 decimal places
        below_boundary = np.minimum(np.nextafter(boundary_array, -_inf), boundary_array - epsilon)
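The reviewer's suggested fix can be sketched as follows: `np.asarray(...).dtype` preserves the dtype of NumPy scalar types such as `np.float16`, which the `isinstance(..., np.ndarray)` check misses (a bare `np.float16` is not an ndarray, so the code above falls back to `np.float64` for it). The helper name below is illustrative, not from the quadra codebase.

```python
import numpy as np

def score_dtype(normalized_score):
    """Illustrative helper: derive the working dtype in a way that
    keeps NumPy scalars (e.g. np.float16) dtype-consistent, instead
    of silently promoting them to float64."""
    return np.asarray(normalized_score).dtype

half_scalar = score_dtype(np.float16(99.9))          # float16, not float64
array_dtype = score_dtype(np.array([1.0], dtype=np.float32))  # float32
py_float = score_dtype(2.5)                          # Python float -> float64
```

Plain Python floats still resolve to float64, so the behavior for non-NumPy inputs is unchanged; only NumPy scalars stop being promoted.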



Comment on lines 64 to 70
score = raw_score
if isinstance(score, torch.Tensor):
    score = score.cpu().numpy()
# Anomalib classifies as anomaly if anomaly_score >= threshold
is_anomaly_mask = score >= threshold

boundary = eval_threshold.normalized
is_anomaly_mask = score >= eval_threshold.raw
is_not_anomaly_mask = np.bitwise_not(is_anomaly_mask)

Copilot AI Feb 23, 2026


ensure_scores_consistency always converts raw_score to a NumPy array and builds is_anomaly_mask/is_not_anomaly_mask as NumPy booleans. In the torch.Tensor branch those NumPy masks are then used to index a Torch tensor, which will break for CUDA tensors (and can also be problematic on CPU). Build the masks as Torch boolean tensors on the same device when normalized_score is a Torch tensor (e.g., compare raw_score in torch or convert the mask to torch.bool on normalized_score.device).
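A hedged sketch of the fix this comment suggests: compute the anomaly mask in the same framework as the tensor it will index, so a torch.Tensor input yields a `torch.bool` mask on the tensor's own device rather than a NumPy mask. The function name is illustrative, not the actual quadra API.

```python
import numpy as np
import torch

def anomaly_mask(raw_score, threshold):
    """Illustrative sketch: build the mask in the input's own framework
    so it can safely index the matching normalized tensor/array."""
    if isinstance(raw_score, torch.Tensor):
        # Comparison stays on raw_score's device (CPU or CUDA) and
        # produces a torch.bool tensor, avoiding NumPy-mask indexing.
        return raw_score >= threshold
    return np.asarray(raw_score) >= threshold

raw = torch.tensor([1.0, 5.0, 3.0])
mask = anomaly_mask(raw, 3.0)          # tensor([False, True, True])
normalized = torch.tensor([90.0, 120.0, 100.0])
flagged = normalized[mask]             # same-device boolean indexing
```

Keeping the mask in torch also means `~mask` replaces `np.bitwise_not`, and no CPU round-trip is needed for CUDA tensors.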

Contributor

@lorenzomammana lorenzomammana left a comment


Crazy

@michele-milesi michele-milesi merged commit 411dc4d into main Feb 23, 2026
5 checks passed
@lorenzomammana lorenzomammana deleted the fix/anomaly-scores-normalization-consistency branch February 23, 2026 10:53