Skip to content

Median filter: incorrect border handling and slow u16/int16 path #16

@Marek55S

Description

@Marek55S

Summary

The median filter implementation in api/src/algorithms.rs computes the median using a fixed window size (2*r+1)^2 for every pixel and clears a large histogram (65,536 bins) per pixel for u16/int16 data. This leads to incorrect medians at image borders and severe performance problems on u16/int16 images.

Symptoms / Reproduction

  1. Apply the median filter with radius r > 0 to an image and inspect pixels near the image edges. Medians are frequently wrong (often 0).
  2. Run the filter on a large u16 image (e.g. 1024×1024) and observe very slow runtime and high CPU usage due to per-pixel clearing of a 65,536-bin histogram.

Actual behavior

  • The algorithm always uses window_size = (2*r+1)^2 to compute the median index, even at borders where fewer pixels are present in the kernel.
  • For u16/int16 images, the code clears a 65,536-bin histogram for every pixel (e.g., hist.fill(0)), which is extremely expensive.

Expected behavior

  • Median computation must use the actual number of pixels present in the kernel at image edges.
  • The u16/int16 path must avoid clearing the full 65,536-bin histogram per pixel and be much faster on large images.

Root cause

  • Border bug: median helpers receive a constant window_size rather than the actual number of pixels added to the histogram (total_count) for kernels overlapping image boundaries.
  • Performance: clearing the full u16 histogram per pixel is O(65536) work per pixel and dominates runtime.

Required changes

  1. Fix border correctness

    • While iterating the kernel, track the actual number of pixels added to the histogram (total_count).
    • Pass total_count into median_from_histogram_* instead of the fixed window_size.
    • Update median helper(s) to compute mid = (total_count + 1) / 2 and use that to select the median.
  2. Optimize u16/int16 performance

    • Stop calling hist.fill(0) for all 65,536 bins per pixel.
    • Suggested approaches (choose based on complexity & expected kernel sizes):
      • (a) Sliding histogram: Maintain a rolling histogram when moving the kernel horizontally so you only add/remove column pixels instead of rebuilding the whole histogram.
      • (b) Small-kernel fast path: For small kernels (area <= threshold), collect neighborhood values into a small Vec/array and use select_nth_unstable to find the median — avoids the huge histogram entirely.
      • (c) Hybrid: Use (b) for small kernels and (a) for large kernels.
  3. Optional: Parallelize per-row processing for native builds (e.g., rayon) behind a feature flag to avoid impacting wasm builds.

Implementation notes & contract

  • Keep the public function signature and behavior stable (input image buffer and radius, output same-type buffer).
  • For i16, convert to indices via an offset (e.g., index = (value as i32 + 32768) as usize) only inside the histogram logic.
  • For even total_count decide and document whether you pick the lower or upper median; existing behavior used mid = (window_size + 1) / 2 (upper median for even counts).

Acceptance criteria

  • Correct medians at image edges (unit tests pass for border cases).
  • u16/int16 median filtering on 1024×1024 images shows substantial speedup vs the current per-pixel full-histogram clear

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions