Fix/anomaly scores normalization consistency #156
Pull request overview
This PR improves anomaly score normalization so that classification decisions remain consistent between raw scores and normalized scores/maps, especially around floating-point boundary conditions and when using a distinct evaluation threshold.
Changes:
- Added `EvalThreshold` plus `ensure_scores_consistency`, and extended `normalize_anomaly_score` to optionally enforce consistency against the evaluation boundary.
- Updated anomaly evaluation/report generation to pass an `EvalThreshold` derived from the optimal evaluation threshold.
- Expanded unit tests to cover fp16/fp32 and numpy/torch consistency edge cases; bumped anomalib-orobix to `0.7.0.dev151` and released `2.8.1`.
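`EvalThreshold` itself is not shown in this review; from the diff below (which reads `eval_threshold.raw` and `eval_threshold.normalized`), a minimal sketch of its likely shape — an assumption for illustration, not the actual implementation — is:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalThreshold:
    """Hypothetical sketch: pairs the evaluation decision boundary in both spaces."""

    raw: float         # decision boundary in raw anomaly-score space
    normalized: float  # the same boundary after score normalization (e.g. around 100.0)
```

Keeping both values in one container lets the normalization code classify against `raw` while clamping normalized outputs relative to `normalized`, so the two decisions cannot diverge.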
Reviewed changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/utilities/test_anomaly_utils.py | Adds extensive tests for consistency enforcement and eval-threshold-aware normalization across numpy/torch and fp16/fp32. |
| quadra/utils/anomaly.py | Introduces EvalThreshold, ensure_scores_consistency, and updates normalize_anomaly_score to enforce prediction consistency using dtype-aware nextafter logic. |
| quadra/tasks/anomaly.py | Builds and passes EvalThreshold during report generation so normalized outputs align with the evaluation decision boundary. |
| quadra/__init__.py | Bumps package version to 2.8.1. |
| pyproject.toml | Bumps package version to 2.8.1 and updates anomalib-orobix dependency to 0.7.0.dev151. |
| poetry.lock | Updates lockfile for anomalib-orobix 0.7.0.dev151 and content hash. |
| CHANGELOG.md | Adds 2.8.1 release notes describing the fixes and dependency update. |
Comments suppressed due to low confidence (2)
quadra/utils/anomaly.py:88
- The docstring says no hard-coded epsilon is required, but the implementation still uses a fixed `epsilon = 1e-3` and `boundary - epsilon` to define `below_boundary`. Either remove the fixed epsilon and rely purely on `nextafter` for a dtype-aware strict-inequality bound, or document why the extra 1e-3 margin is required (e.g., if downstream logic rounds to 3 decimals) and make the rounding/precision assumption explicit/parameterized.
```python
epsilon = 1e-3
if isinstance(normalized_score, torch.Tensor):
    device = normalized_score.device
    # Work in the scores' dtype, cast boundaries to the same dtype to ensure that casts take effect
    _inf = torch.tensor(float("inf"), dtype=normalized_score.dtype, device=device)
    boundary_tensor = torch.tensor(boundary, dtype=normalized_score.dtype, device=device)
    anomaly_boundary = boundary_tensor.clone()
    # If the dtype cast causes anomaly_boundary to be smaller than the normalized boundary (float),
    # increase it up to the next representable value
    if float(anomaly_boundary) < boundary:
        anomaly_boundary = torch.nextafter(anomaly_boundary, _inf)
    # Ensure consistency after rounding to 3 decimal places
    below_boundary = torch.min(torch.nextafter(boundary_tensor, -_inf), boundary_tensor - epsilon)
```
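For intuition on the reviewer's point: one `nextafter` step below an fp16 boundary is far smaller than the fixed `1e-3` margin, and rounding to 3 decimals collapses that step back onto the boundary — which is the only scenario that actually needs the extra epsilon. A standalone sketch (not part of the PR):

```python
import numpy as np

boundary = np.float16(0.5)
# Largest fp16 value strictly below the boundary (one ULP down)
strict_below = np.nextafter(boundary, np.float16(-np.inf))

assert strict_below < boundary
# The fp16 ULP gap around 0.5 is 2**-12 ~= 0.000244, well under 1e-3...
assert float(boundary) - float(strict_below) < 1e-3
# ...so rounding to 3 decimal places lands back on the boundary, losing strictness:
assert round(float(strict_below), 3) == 0.5
```

So the `1e-3` margin only makes sense if some downstream consumer rounds to 3 decimals; documenting (or parameterizing) that assumption is exactly what the comment asks for.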
quadra/utils/anomaly.py:110
- In the NumPy/scalar branch, dtype is taken from `normalized_score.dtype` only when it's an `np.ndarray`; NumPy scalar types (e.g. `np.float16`) will fall back to `np.float64`, which defeats the stated dtype-aware behavior and can change the output type. Consider deriving dtype via `np.asarray(normalized_score).dtype` (and similarly for `raw_score`) so scalar NumPy inputs stay dtype-consistent.
```python
elif isinstance(normalized_score, np.ndarray) or np.isscalar(normalized_score):
    # Work in the scores' dtype, cast boundaries to the same dtype to ensure that casts take effect
    dtype = normalized_score.dtype if isinstance(normalized_score, np.ndarray) else np.float64
    _inf = np.array(np.inf, dtype=dtype)
    boundary_array = np.array(boundary, dtype=dtype)
    anomaly_boundary = boundary_array.copy()
    # If the dtype cast causes anomaly_boundary to be smaller than the normalized boundary (float),
    # increase it up to the next representable value
    if float(anomaly_boundary) < boundary:
        anomaly_boundary = np.nextafter(anomaly_boundary, _inf)
    # Ensure consistency after rounding to 3 decimal places
    below_boundary = np.minimum(np.nextafter(boundary_array, -_inf), boundary_array - epsilon)
```
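A quick standalone illustration of the scalar fallback the comment describes (not from the PR):

```python
import numpy as np

s = np.float16(0.7)  # a NumPy scalar, not an ndarray

# The isinstance check in the branch above misses NumPy scalars...
assert not isinstance(s, np.ndarray)
# ...so dtype would silently fall back to float64, while
# np.asarray preserves the input precision:
assert np.asarray(s).dtype == np.float16
```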
```python
score = raw_score
if isinstance(score, torch.Tensor):
    score = score.cpu().numpy()
# Anomalib classifies a sample as anomalous if anomaly_score >= threshold
is_anomaly_mask = score >= threshold
...
boundary = eval_threshold.normalized
is_anomaly_mask = score >= eval_threshold.raw
is_not_anomaly_mask = np.bitwise_not(is_anomaly_mask)
```
`ensure_scores_consistency` always converts `raw_score` to a NumPy array and builds `is_anomaly_mask`/`is_not_anomaly_mask` as NumPy booleans. In the `torch.Tensor` branch those NumPy masks are then used to index a Torch tensor, which will break for CUDA tensors (and can also be problematic on CPU). Build the masks as Torch boolean tensors on the same device when `normalized_score` is a Torch tensor (e.g., compare `raw_score` in torch or convert the mask to `torch.bool` on `normalized_score.device`).
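One way to address this, sketched here as a hypothetical helper (name and signature are illustrative, not from the PR), is to build the masks in torch whenever the score is a tensor:

```python
import numpy as np
import torch


def build_anomaly_masks(raw_score, raw_threshold):
    """Build boolean masks in the same framework (and device) as raw_score."""
    if isinstance(raw_score, torch.Tensor):
        # Compare in torch: the masks are torch.bool on raw_score.device,
        # so they can index CUDA tensors directly, with no host round trip.
        is_anomaly = raw_score >= raw_threshold
        return is_anomaly, ~is_anomaly
    score = np.asarray(raw_score)
    is_anomaly = score >= raw_threshold
    return is_anomaly, np.bitwise_not(is_anomaly)
```

For a CUDA input, both masks then live on the same device as the scores, avoiding the NumPy-mask-on-Torch-tensor indexing the comment flags.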
Summary
Describe the purpose of the pull request, including:
Updated
Fixed
- `normalize_anomaly_score` now accepts an optional `eval_threshold` (`EvalThreshold`) parameter. When provided, consistency enforcement uses the actual evaluation boundary instead of always using the training threshold at 100.0, preventing misclassification of samples whose raw score falls close to the evaluation threshold.
- Consistency enforcement now relies on `np.nextafter`/`torch.nextafter` (dtype-aware) instead of hardcoded epsilon values, eliminating ULP-gap misclassifications, especially at low-precision (fp16) boundaries.
- `AnomalibEvaluation` now builds an `EvalThreshold` from the optimal evaluation threshold and passes it to `normalize_anomaly_score`, ensuring consistent predictions between raw and normalized anomaly scores and anomaly maps.

Type of Change
Please select the one relevant option below:
Checklist
Please confirm that the following tasks have been completed: