
Correlate two profiles of striation marks #98

Open

laurensWe wants to merge 46 commits into main from feature/migrate_ProfileCorrelateSingle.m

Conversation


@laurensWe laurensWe commented Jan 23, 2026

Migrates ProfileCorrelateSingle.m from MATLAB to Python, implementing a profile correlation algorithm for comparing striated marks on bullets.

Key changes:

  • New profile_correlator module with clean separation of concerns:
    • correlator.py - Main correlate_profiles() function using brute-force alignment
    • data_types.py - Profile, AlignmentParameters, ComparisonResults dataclasses
    • statistics.py - Roughness metrics (Sa, Sq), signature differences, overlap ratio
    • transforms.py - Pixel equalization and scaling using scipy interpolation
  • Algorithm: Uses brute-force grid search over shifts and scales (instead of MATLAB's fminsearchbnd 2D optimization). Tries all shift positions with 7 scale factors (±5% range), selecting the maximum correlation.
  • Minimum overlap: 350 μm (enforced with np.ceil to prevent rounding issues)
  • Simplified data model: Profile contains only depth_data (1D array) and pixel_size
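
A rough sketch of that brute-force search (illustrative only: the function name, signature, and use of np.interp as the resampler are assumptions, not the module's actual API):

```python
import numpy as np

def correlate_profiles_sketch(
    heights_ref: np.ndarray,
    heights_comp: np.ndarray,
    pixel_size: float,
    min_overlap_um: float = 350.0,
    max_scaling: float = 0.05,
    n_scales: int = 7,
) -> tuple[int, float, float]:
    """Brute-force grid search over integer shifts and discrete scales."""
    # Minimum overlap in samples, with np.ceil to avoid rounding issues.
    min_overlap = int(np.ceil(min_overlap_um / pixel_size))
    best_shift, best_scale, best_r = 0, 1.0, -np.inf
    for scale in np.linspace(1.0 - max_scaling, 1.0 + max_scaling, n_scales):
        # Resample the comparison profile by the trial scale factor.
        n = max(2, round(len(heights_comp) * scale))
        comp = np.interp(
            np.linspace(0.0, len(heights_comp) - 1.0, n),
            np.arange(len(heights_comp)),
            heights_comp,
        )
        # Try every integer shift that leaves at least the minimum overlap.
        for shift in range(-(len(comp) - min_overlap),
                           len(heights_ref) - min_overlap + 1):
            lo_ref, lo_comp = max(0, shift), max(0, -shift)
            overlap = min(len(heights_ref) - lo_ref, len(comp) - lo_comp)
            if overlap < min_overlap:
                continue
            a = heights_ref[lo_ref:lo_ref + overlap]
            b = comp[lo_comp:lo_comp + overlap]
            r = np.corrcoef(a, b)[0, 1]
            if r > best_r:
                best_shift, best_scale, best_r = shift, float(scale), r
    return best_shift, best_scale, best_r
```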

Test plan

  • Unit tests for correlator, statistics, and transforms modules
  • Synthetic profile tests (shifted, scaled, partial, flipped)
  • Regression tests with 17 real-like profile pairs

laurensWe and others added 25 commits January 20, 2026 13:44
…zation

The brute-force correlator now uses the same logic for all profile lengths,
so there's no need to distinguish between partial and full profile matching.

- Remove is_partial_profile and partial_start_position from ComparisonResults
- Update visualization to always use position_shift for alignment
- Simplify test assertions to not check partial profile flag
- Delete old correlate_profiles_old output directory
- Update test images with new visualization format

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add generate_matlab_visualizations.py script to generate images from
  MATLAB test data in resources/profile_correlator/
- Fix NaN handling in plot_correlation_result using nanmin/nanmax
- Generate 20 test case visualizations in outputs/matlab_test_cases/

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use each profile's own pixel size for x-axis calculation instead of
  assuming both profiles have the same pixel size
- Only apply partial profile offset logic when pixel sizes match

This fixes the different_sampling visualization where ref (3.5 μm/pixel)
and comp (5.0 μm/pixel) have different physical lengths.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@laurensWe

Comparison with MATLAB

This implementation deliberately simplifies the MATLAB ProfileCorrelateSingle.m
algorithm for maintainability (~300 lines vs 3000+ lines).

Intentional Simplifications (divergence from MATLAB):

  1. Global search: MATLAB uses multi-scale coarse-to-fine search with bounded
    ranges at each level. This implementation searches all positions globally,
    which can find different (sometimes better) alignments for repetitive patterns.

  2. No Nelder-Mead optimization: MATLAB uses fminsearch for sub-sample
    precision. This implementation uses discrete sample shifts only.

  3. No low-pass filtering: MATLAB filters profiles at each scale level.
    This implementation operates on the original profiles.

  4. Discrete scale factors: Instead of continuous optimization, we try
    a fixed set of scale factors (e.g., 0.95, 0.97, ..., 1.05).
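
The fixed scale grid of point 4 might look like this (a minimal sketch; the helper name is hypothetical):

```python
import numpy as np

def discrete_scale_factors(max_scaling: float = 0.05,
                           n_steps: int = 7) -> np.ndarray:
    """Evenly spaced trial scale factors around 1.0 (e.g. 0.95 ... 1.05)."""
    scales = np.linspace(1.0 - max_scaling, 1.0 + max_scaling, n_steps)
    # Including reciprocals keeps the search symmetric in which profile
    # is treated as reference: scaling comp by s ~ scaling ref by 1/s.
    return np.unique(np.concatenate((scales, 1.0 / scales)))
```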



def correlate_profiles(
profile_ref: Profile,
Collaborator

Please write out full names

Collaborator

Laurens and Peter think that names are self-explanatory enough


github-actions bot commented Feb 6, 2026

Diff Coverage

Diff: origin/main..HEAD, staged and unstaged changes

  • packages/scratch-core/src/conversion/preprocess_impression/preprocess_impression.py (100%)
  • packages/scratch-core/src/conversion/profile_correlator/__init__.py (100%)
  • packages/scratch-core/src/conversion/profile_correlator/correlator.py (96.2%): Missing lines 112,174,189
  • packages/scratch-core/src/conversion/profile_correlator/data_types.py (98.0%): Missing lines 42
  • packages/scratch-core/src/conversion/profile_correlator/statistics.py (96.9%): Missing lines 108
  • packages/scratch-core/src/conversion/profile_correlator/transforms.py (87.5%): Missing lines 45,56
  • packages/scratch-core/src/conversion/resample.py (100%)

Summary

  • Total: 196 lines
  • Missing: 7 lines
  • Coverage: 96%

packages/scratch-core/src/conversion/profile_correlator/correlator.py

Lines 108-116

  108     alignment = _find_best_alignment(
  109         heights_ref, heights_comp, scale_factors, min_overlap_samples
  110     )
  111     if alignment is None:
! 112         return None
  113 
  114     # Step 3: Compute and return metrics
  115     return _compute_metrics(alignment, pixel_size, len_ref, len_comp)

Lines 170-178

  170                 shift, len_comp, len_ref
  171             )  # Calculate overlap region for this shift
  172 
  173             if overlap_length < min_overlap_samples:
! 174                 continue
  175 
  176             partial_ref = heights_ref[idx_ref_start : idx_ref_start + overlap_length]
  177             partial_comp = heights_comp_scaled[
  178                 idx_comp_start : idx_comp_start + overlap_length

Lines 185-193

  185                 best_shift = shift
  186                 best_scale = scale
  187 
  188     if best_correlation is None:
! 189         return None
  190 
  191     # Redo computations for best_scale and best_shift (instead of copying partial_ref and partial_comp above multiple times; this saves time.)
  192     heights_comp_scaled = resample_array_1d(heights_comp, best_scale)
  193     len_comp = len(heights_comp_scaled)

packages/scratch-core/src/conversion/profile_correlator/data_types.py

Lines 38-46

  38         Get the number of samples in the profile.
  39 
  40         :returns: Number of samples in heights.
  41         """
! 42         return len(self.heights)
  43 
  44 
  45 @dataclass(frozen=True)
  46 class AlignmentParameters:

packages/scratch-core/src/conversion/profile_correlator/statistics.py

Lines 104-112

  104         or if overlap_length exceeds shorter_length (invalid input).
  105     """
  106     shorter_length = min(ref_length, comp_length)
  107     if np.isclose(shorter_length, 0.0):
! 108         return np.nan
  109     if overlap_length > shorter_length:
  110         return np.nan
  111     return 0.5 * (overlap_length / ref_length + overlap_length / comp_length)

packages/scratch-core/src/conversion/profile_correlator/transforms.py

Lines 41-49

  41     # Downsample the higher-resolution profile to match the lower-resolution one
  42     if pixel_1 > pixel_2:
  43         to_downsample, target_pixel_size = profile_2, profile_1.pixel_size
  44     else:
! 45         to_downsample, target_pixel_size = profile_1, profile_2.pixel_size
  46 
  47     factor = target_pixel_size / to_downsample.pixel_size
  48     downsampled = Profile(
  49         heights=resample_array_1d(to_downsample.heights, factor),

Lines 52-57

  52 
  53     if pixel_1 > pixel_2:
  54         return profile_1, downsampled
  55     else:
! 56         return downsampled, profile_2


github-actions bot commented Feb 6, 2026

Code Coverage

| Package | Line Rate | Branch Rate |
| --- | --- | --- |
| . | 96% | 88% |
| comparators | 100% | 100% |
| computations | 100% | 100% |
| container_models | 99% | 100% |
| conversion | 97% | 86% |
| conversion.export | 100% | 100% |
| conversion.filter | 92% | 83% |
| conversion.leveling | 100% | 100% |
| conversion.leveling.solver | 100% | 75% |
| conversion.plots | 98% | 85% |
| conversion.preprocess_impression | 99% | 91% |
| conversion.preprocess_striation | 89% | 58% |
| conversion.profile_correlator | 96% | 80% |
| extractors | 98% | 75% |
| mutations | 100% | 100% |
| parsers | 98% | 80% |
| parsers.patches | 89% | 60% |
| preprocessors | 95% | 75% |
| processors | 100% | 100% |
| renders | 98% | 50% |
| utils | 91% | 75% |
| Summary | 96% (2201 / 2281) | 82% (237 / 290) |

Minimum allowed line rate is 50%

shift: int, len_small: int, len_large: int
) -> tuple[int, int, int]:
"""
Find starting idx for both striations, and compute overlap length
Collaborator

Suggested change:
- Find starting idx for both striations, and compute overlap length
+ Find starting index for both striations, and compute overlap length

idx_small_start = -shift
overlap_length = min(len_large, len_small + shift)

return idx_small_start, idx_large_start, overlap_length

@cfs-data cfs-data Feb 6, 2026


I understand it was like this in MATLAB, so this is not a suggestion for this PR, but: shouldn't the overlap length be a variable as well (ranging from min_overlap_samples to the maximum possible overlap)? Now the comparison profile is cut either from the beginning or from the end, which allows for "small" overlapping regions with high correlation at the ends of the profiles, but not anywhere else. This seems arbitrary?

Collaborator

@laurensWe @vergep What do you think?

)

# make scaling symmetric with respect to which profile you choose as ref or comp
scale_factors = np.unique(np.concatenate((scale_factors, 1 / scale_factors)))

@cfs-data cfs-data Feb 6, 2026


Comments in code usually suggest you should perhaps use more functions:

def _get_scale_factors(max_scaling: float, n_steps: int) -> NDArray:
    """Generate scale factors."""
    scales = np.linspace(1.0 - max_scaling, 1.0 + max_scaling, n_steps)
    scales = np.unique(np.concatenate((scales, 1 / scales)))
    return scales


# Redo computations for best_scale and best_shift (instead of copying partial_ref and partial_comp above multiple times; this saves time.)
heights_comp_scaled = resample_array_1d(heights_comp, best_scale)
len_comp = len(heights_comp_scaled)
Collaborator

This line can be removed


if best_correlation is None:
return None


@cfs-data cfs-data Feb 6, 2026


You could consider splitting the function here into two separate functions by returning the parameters found here, since now your function is doing more than one thing (which is generally undesirable)


@cfs-data cfs-data left a comment


Nice!

return result


def resample_array_2d(

@cfs-data cfs-data Feb 6, 2026


I think you should update every usage of this function in the codebase?
