
Fix/anomaly scores normalization consistency#156

Merged
michele-milesi merged 4 commits into main from fix/anomaly-scores-normalization-consistency on Feb 23, 2026

Conversation


@michele-milesi michele-milesi commented Feb 20, 2026

Summary

This pull request makes the following changes:

Updated

  • Anomalib-orobix to v0.7.0.dev151 in order to make optimal threshold selection more robust with respect to floating point operations.

Fixed

  • normalize_anomaly_score now accepts an optional eval_threshold (EvalThreshold) parameter. When provided, consistency enforcement uses the actual evaluation boundary instead of always using the training threshold at 100.0, preventing misclassification of samples whose raw score falls close to the evaluation threshold.
  • Consistency enforcement in anomaly score normalization now uses np.nextafter/torch.nextafter (dtype-aware) instead of hardcoded epsilon values, eliminating ULP-gap misclassifications, especially at low-precision (fp16) boundaries.
  • AnomalibEvaluation now builds an EvalThreshold from the optimal evaluation threshold and passes it to normalize_anomaly_score, ensuring consistent predictions between raw and normalized anomaly scores and anomaly maps.
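To make the first two bullets concrete, here is a minimal, hedged sketch (in NumPy, not the actual quadra implementation) of boundary-consistency enforcement: samples flagged anomalous on raw scores are clamped to at least the normalized boundary, and the rest are clamped strictly below it, using a dtype-aware np.nextafter bound rather than a hardcoded epsilon. The function name and signature are illustrative only.

```python
import numpy as np

def enforce_boundary_consistency(normalized, is_anomaly, boundary=100.0):
    """Illustrative sketch: keep the raw-score decision intact after
    normalization.  Anomalous samples must land at or above `boundary`;
    normal samples must land strictly below it, at most one ULP away
    in the score's own dtype."""
    normalized = np.asarray(normalized)
    dtype = normalized.dtype
    b = np.asarray(boundary, dtype=dtype)
    # Largest representable value strictly below the boundary in this dtype.
    below = np.nextafter(b, np.array(-np.inf, dtype=dtype))
    out = normalized.copy()
    out[is_anomaly] = np.maximum(out[is_anomaly], b)
    out[~is_anomaly] = np.minimum(out[~is_anomaly], below)
    return out

scores = np.array([100.0, 99.999, 100.2], dtype=np.float32)
mask = np.array([True, False, True])
result = enforce_boundary_consistency(scores, mask)
```

Because `below` is computed in the scores' dtype, the gap between it and the boundary is exactly one ULP for that precision, which is what makes the fp16 and fp32 cases behave consistently.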

Type of Change

Please select the one relevant option below:

  • Bug fix (non-breaking change that solves an issue)

Checklist

Please confirm that the following tasks have been completed:

  • I have tested my changes locally and they work as expected. (Please describe the tests you performed.)
  • I have added unit tests for my changes, or updated existing tests if necessary.
  • I have updated the documentation, if applicable.
  • I have installed pre-commit and run locally for my code changes.

Copilot AI left a comment

Pull request overview

This PR improves anomaly score normalization so that classification decisions remain consistent between raw scores and normalized scores/maps, especially around floating-point boundary conditions and when using a distinct evaluation threshold.

Changes:

  • Added EvalThreshold plus ensure_scores_consistency, and extended normalize_anomaly_score to optionally enforce consistency against the evaluation boundary.
  • Updated anomaly evaluation/report generation to pass an EvalThreshold derived from the optimal evaluation threshold.
  • Expanded unit tests to cover fp16/fp32 and numpy/torch consistency edge cases; bumped anomalib-orobix to 0.7.0.dev151 and released 2.8.1.
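For readers unfamiliar with the new type, a hypothetical sketch of what an EvalThreshold pairing could look like; the field names `raw` and `normalized` are inferred from the code excerpts later in this review, while the dataclass shape and the numbers are illustrative assumptions, not the actual quadra definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalThreshold:
    """Illustrative pairing of the same decision boundary in two spaces
    (field names inferred from the diff: .raw and .normalized)."""
    raw: float         # threshold in raw anomaly-score space
    normalized: float  # the same boundary after score normalization

# Example: a hypothetical optimal evaluation threshold of 2.37 in raw
# space that maps to 100.0 on the normalized scale.
t = EvalThreshold(raw=2.37, normalized=100.0)
```

Carrying both values together lets normalization code compare raw scores against `t.raw` while clamping normalized outputs around `t.normalized`, which is the consistency this PR enforces.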

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

Reviewed files:
  • tests/utilities/test_anomaly_utils.py: Adds extensive tests for consistency enforcement and eval-threshold-aware normalization across numpy/torch and fp16/fp32.
  • quadra/utils/anomaly.py: Introduces EvalThreshold, ensure_scores_consistency, and updates normalize_anomaly_score to enforce prediction consistency using dtype-aware nextafter logic.
  • quadra/tasks/anomaly.py: Builds and passes EvalThreshold during report generation so normalized outputs align with the evaluation decision boundary.
  • quadra/__init__.py: Bumps package version to 2.8.1.
  • pyproject.toml: Bumps package version to 2.8.1 and updates the anomalib-orobix dependency to 0.7.0.dev151.
  • poetry.lock: Updates the lockfile for anomalib-orobix 0.7.0.dev151 and the content hash.
  • CHANGELOG.md: Adds 2.8.1 release notes describing the fixes and dependency update.
Comments suppressed due to low confidence (2)

quadra/utils/anomaly.py:88

  • The docstring says no hard-coded epsilon is required, but the implementation still uses a fixed epsilon = 1e-3 and boundary - epsilon to define below_boundary. Either remove the fixed epsilon and rely purely on nextafter for a dtype-aware strict-inequality bound, or document why the extra 1e-3 margin is required (e.g., if downstream logic rounds to 3 decimals) and make the rounding/precision assumption explicit/parameterized.
    epsilon = 1e-3
    if isinstance(normalized_score, torch.Tensor):
        device = normalized_score.device
        # Work in the scores' dtype, casting boundaries to the same dtype so the casts take effect
        _inf = torch.tensor(float("inf"), dtype=normalized_score.dtype, device=device)
        boundary_tensor = torch.tensor(boundary, dtype=normalized_score.dtype, device=device)
        anomaly_boundary = boundary_tensor.clone()
        # If the dtype cast makes anomaly_boundary smaller than the normalized boundary (float),
        # bump it up to the next representable value
        if float(anomaly_boundary) < boundary:
            anomaly_boundary = torch.nextafter(anomaly_boundary, _inf)
        # Ensure consistency after rounding to 3 decimal places
        below_boundary = torch.min(torch.nextafter(boundary_tensor, -_inf), boundary_tensor - epsilon)

quadra/utils/anomaly.py:110

  • In the NumPy/scalar branch, dtype is taken from normalized_score.dtype only when it's an np.ndarray; NumPy scalar types (e.g. np.float16) will fall back to np.float64, which defeats the stated dtype-aware behavior and can change the output type. Consider deriving dtype via np.asarray(normalized_score).dtype (and similarly for raw_score) so scalar NumPy inputs stay dtype-consistent.
    elif isinstance(normalized_score, np.ndarray) or np.isscalar(normalized_score):
        # Work in the scores' dtype, casting boundaries to the same dtype so the casts take effect
        dtype = normalized_score.dtype if isinstance(normalized_score, np.ndarray) else np.float64
        _inf = np.array(np.inf, dtype=dtype)
        boundary_array = np.array(boundary, dtype=dtype)
        anomaly_boundary = boundary_array.copy()
        # If the dtype cast makes anomaly_boundary smaller than the normalized boundary (float),
        # bump it up to the next representable value
        if float(anomaly_boundary) < boundary:
            anomaly_boundary = np.nextafter(anomaly_boundary, _inf)
        # Ensure consistency after rounding to 3 decimal places
        below_boundary = np.minimum(np.nextafter(boundary_array, -_inf), boundary_array - epsilon)
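The reviewer's suggested fix can be sketched as follows: `np.asarray(...).dtype` preserves the dtype of NumPy scalar types such as `np.float16`, which the `isinstance(..., np.ndarray)` check misses (a bare `np.float16` is not an ndarray, so the code above falls back to `np.float64` for it). The helper name below is illustrative, not from the quadra codebase.

```python
import numpy as np

def score_dtype(normalized_score):
    """Illustrative helper: derive the working dtype in a way that
    keeps NumPy scalars (e.g. np.float16) dtype-consistent, instead
    of silently promoting them to float64."""
    return np.asarray(normalized_score).dtype

half_scalar = score_dtype(np.float16(99.9))          # float16, not float64
array_dtype = score_dtype(np.array([1.0], dtype=np.float32))  # float32
py_float = score_dtype(2.5)                          # Python float -> float64
```

Plain Python floats still resolve to float64, so the behavior for non-NumPy inputs is unchanged; only NumPy scalars stop being promoted.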



Comment on lines 64 to 70
score = raw_score
if isinstance(score, torch.Tensor):
    score = score.cpu().numpy()
# Anomalib classifies as anomaly if anomaly_score >= threshold
is_anomaly_mask = score >= threshold

boundary = eval_threshold.normalized
is_anomaly_mask = score >= eval_threshold.raw
is_not_anomaly_mask = np.bitwise_not(is_anomaly_mask)

Copilot AI Feb 23, 2026


ensure_scores_consistency always converts raw_score to a NumPy array and builds is_anomaly_mask/is_not_anomaly_mask as NumPy booleans. In the torch.Tensor branch those NumPy masks are then used to index a Torch tensor, which will break for CUDA tensors (and can also be problematic on CPU). Build the masks as Torch boolean tensors on the same device when normalized_score is a Torch tensor (e.g., compare raw_score in torch or convert the mask to torch.bool on normalized_score.device).
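A hedged sketch of the fix this comment suggests: compute the anomaly mask in the same framework as the tensor it will index, so a torch.Tensor input yields a `torch.bool` mask on the tensor's own device rather than a NumPy mask. The function name is illustrative, not the actual quadra API.

```python
import numpy as np
import torch

def anomaly_mask(raw_score, threshold):
    """Illustrative sketch: build the mask in the input's own framework
    so it can safely index the matching normalized tensor/array."""
    if isinstance(raw_score, torch.Tensor):
        # Comparison stays on raw_score's device (CPU or CUDA) and
        # produces a torch.bool tensor, avoiding NumPy-mask indexing.
        return raw_score >= threshold
    return np.asarray(raw_score) >= threshold

raw = torch.tensor([1.0, 5.0, 3.0])
mask = anomaly_mask(raw, 3.0)          # tensor([False, True, True])
normalized = torch.tensor([90.0, 120.0, 100.0])
flagged = normalized[mask]             # same-device boolean indexing
```

Keeping the mask in torch also means `~mask` replaces `np.bitwise_not`, and no CPU round-trip is needed for CUDA tensors.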

Contributor

@lorenzomammana lorenzomammana left a comment


Crazy

@michele-milesi michele-milesi merged commit 411dc4d into main Feb 23, 2026
5 checks passed
@lorenzomammana lorenzomammana deleted the fix/anomaly-scores-normalization-consistency branch February 23, 2026 10:53