Fix(preprocessing): duplicated labels#67

Merged
hollandjg merged 30 commits into main from fix--duplicated-labels
May 22, 2025

@hollandjg
Member

@hollandjg hollandjg commented Jan 28, 2025

Ensure that floes detected at different scales don't have the same label.

Changes:

  • Avoid integer overflow by using cv2.connectedComponentsWithStats, which has a 32-bit integer type for tracking labels
  • Ensure that floes from iteration n+1 are labelled starting from highest_label_so_far + 1, i.e. one more than the highest label assigned in iterations 1 to n
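
A minimal sketch of the label-offset idea in the second bullet, not the code from this PR. It assumes each iteration has already produced an integer label image (0 for background, 1..k for floes), for example from cv2.connectedComponentsWithStats. The helper name merge_iteration_labels and the rule that pixels already claimed by an earlier iteration are left untouched are assumptions made for illustration. Working in int32 throughout keeps the running label count well away from the overflow a narrower integer type would hit.

```python
import numpy as np


def merge_iteration_labels(label_images):
    """Combine per-iteration label images so that no label value is reused.

    Hypothetical helper for illustration only. Each input is an integer
    array with 0 as background and 1..k as the floes of that iteration.
    """
    output = np.zeros_like(label_images[0], dtype=np.int32)
    highest_label_so_far = 0
    for labels in label_images:
        labels = labels.astype(np.int32)
        # Assumption: pixels already assigned by an earlier iteration are kept.
        foreground = (labels > 0) & (output == 0)
        # Shift this iteration's labels past everything assigned so far.
        output[foreground] = labels[foreground] + highest_label_so_far
        highest_label_so_far += int(labels.max())
    return output


first = np.array([[1, 1, 0, 0], [0, 0, 0, 2]])
second = np.array([[0, 0, 1, 1], [2, 0, 0, 0]])
merged = merge_iteration_labels([first, second])
```

Here the two floes from the second iteration end up as labels 3 and 4 rather than colliding with labels 1 and 2 from the first.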

Resolutions:

Member

@cpaniaguam cpaniaguam left a comment

A few observations.

logger.debug("output after this iteration\n %s" % count_blobs_per_label(output).query("count > 1"))

# saving the props table
output = opening(output)

@hollandjg
Member Author

hollandjg commented Jan 29, 2025

We still get some cases, like this one, where there's a disconnected part of the floe (yellow) which has nothing to do with the other floes (greenish) nearby:

download-1

... or this one:

download-3

(where you can see a disconnected component near the concave bit on the western side of the floe)

@hollandjg
Member Author

hollandjg commented Feb 12, 2025

I've some new examples with the updated code. It turns out that the opening operation itself causes some of these cases to arise.

Example: a floe which has multiple "lobes" straight out of the FSD algorithm prior to cleaning

In this case, we need some kind of cleaning.

The input image with the floe marked (I can't see anything in the true-color image):
download-8
download-9

How it appears after the FSD algorithm:
download

With opening:
download-1

With opening then cleaning:
download-2

With cleaning alone:
download-3

Example: a floe which has an extra "lobe" once opening is applied

The input image with the floe marked (I can't see anything in the true-color image):
download-10
download-11

How it appears after the FSD algorithm:
download-4

With opening:
download-5

With opening then cleaning:
download-6

With cleaning alone:
download-7

Proposal: include "new" cleaning, check in paper whether opening is a core part of the algorithm and decide what to do with it
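
To make the idea of "cleaning" concrete, here is a rough sketch of one way to drop disconnected parts: keep only the largest connected piece of each label. This is an assumption about what such a step could look like, not the implementation in this PR. The function names, the use of 4-connectivity, and the "largest piece wins" rule are all hypothetical.

```python
from collections import deque

import numpy as np


def connected_pieces(mask):
    """Return one list of (row, col) pixels per 4-connected piece of a boolean mask."""
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    pieces = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        # Breadth-first flood fill starting from an unvisited foreground pixel.
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        piece = []
        while queue:
            r, c = queue.popleft()
            piece.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        pieces.append(piece)
    return pieces


def keep_largest_piece_per_label(labels):
    """Zero out every pixel that is not in the largest piece of its label."""
    cleaned = np.zeros_like(labels)
    for label in np.unique(labels):
        if label == 0:  # background
            continue
        largest = max(connected_pieces(labels == label), key=len)
        piece_rows, piece_cols = np.array(largest).T
        cleaned[piece_rows, piece_cols] = label
    return cleaned


labels = np.array([[1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 2, 2]])
cleaned = keep_largest_piece_per_label(labels)
```

In this toy image the stray pixel of label 1 in the top-right corner is removed, while the main 2x2 block of label 1 and all of label 2 are kept.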

@hollandjg
Member Author

From Buckley et al. https://doi.org/10.5194/tc-18-5031-2024, Figure 2
Screenshot 2025-02-13 at 11 22 33

I think the question is whether we want that smoothing step to be applied to all the floes at the end. Given Carlos' comment #66 (comment) it might be inadvisable to leave as-is.

If the smoothing is necessary from a scientific point of view, we could:

  • Apply opening to each floe individually, but then we may have single pixels where the opening makes it look like they're part of two or more floes. We could apply the opening at each floe scale, so that floes detected at smaller scales wouldn't be allowed to chomp bits out of larger floes.
  • Get the contour surrounding each floe from the raw (unsmoothed) pixels, and then smooth that contour. There would be some "collisions" between those contours for sure, but the scale of those would hopefully be smaller than the precision we have on the floe edge position.

@danielmwatkins @cpaniaguam Any other ideas? Or preferences?

@hollandjg
Member Author

hollandjg commented Feb 13, 2025

Note that the failing test is unrelated to this PR, and is fixed in

@hollandjg hollandjg marked this pull request as ready for review March 7, 2025 14:57
@hollandjg hollandjg requested a review from ellenbuckley March 7, 2025 15:04
@hollandjg
Member Author

I'd like to suggest keeping the "opening" step at the end of the algorithm, and just using the new "cleaning" to remove the extra disconnected parts. That's the smallest change we can make to the algorithm whilst still fixing the problem.
@danielmwatkins @cpaniaguam @ellenbuckley

@danielmwatkins
Member

This is an interesting case because it is one, like you note, where there really isn't anything there that it should be identifying. I like the idea of adding a minimal cleaning step at the end.

@hollandjg hollandjg requested a review from cpaniaguam March 25, 2025 11:04
Member

@cpaniaguam cpaniaguam left a comment

Thanks for this!

Because the number of different labels in the images might be high, it's probably better to avoid the accumulator pattern in favor of a comprehension. Left a suggestion regarding this.

@hollandjg hollandjg requested a review from cpaniaguam March 26, 2025 15:09
@hollandjg hollandjg dismissed cpaniaguam’s stale review May 7, 2025 15:11

Looked into this. Decided against the requested changes.

@hollandjg hollandjg merged commit 7d2cc7e into main May 22, 2025
2 checks passed

Development

Successfully merging this pull request may close these issues.

bug: floes at different scales can obliterate each other, leaving disconnected chunks
fix: add min- and max-floe size limits to preprocessing

3 participants