wip: refactor evaluation for other geometries #1297
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main    #1297      +/-   ##
==========================================
- Coverage   87.89%   87.33%   -0.56%
==========================================
  Files          20       21       +1
  Lines        2776     2835      +59
==========================================
+ Hits         2440     2476      +36
- Misses        336      359      +23
```
Pull request overview
Refactors evaluation to support multiple geometry types (box/polygon/point) via a unified matching + metric-aggregation pipeline, and updates tests/callers accordingly.
Changes:
- Replace box-specific evaluation with a generalized `evaluate_geometry` flow and a geometry-agnostic matching helper.
- Update IoU utilities (rename the polygon matcher, add point matching) and adjust call sites/tests.
- Improve class-wise precision/recall computation and add/adjust tests for label conversion and point evaluation.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/deepforest/evaluate.py | Introduces `evaluate_geometry`, refactors matching and class metric computation, and updates wrapper behavior for label normalization. |
| src/deepforest/IoU.py | Renames the polygon IoU matcher and adds Hungarian point matching utilities. |
| src/deepforest/main.py | Updates the internal evaluation call to explicitly pass `geometry_type="box"`. |
| tests/test_evaluate.py | Updates tests to call `evaluate_geometry` and adds point and label-conversion coverage. |
| tests/test_IoU.py | Updates the test to use the renamed polygon matching function. |
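For orientation, a hypothetical call sketch of the new unified entry point. Only `geometry_type="box"` is confirmed above; the other argument names (`predictions`, `ground_df`, `iou_threshold`) and the result keys are assumptions based on the description, not the PR's actual signature.

```python
# Hypothetical usage sketch; geometry_type is the only argument confirmed
# by this review, the other names are assumptions.
from deepforest import evaluate

results = evaluate.evaluate_geometry(
    predictions=predictions_df,   # model output as a (Geo)DataFrame
    ground_df=ground_truth_df,    # annotations as a (Geo)DataFrame
    geometry_type="box",          # per the description: "box", "polygon", or "point"
    iou_threshold=0.4,            # match threshold for boxes/polygons
)
# The output reportedly exposes <task>_precision / <task>_recall fields.
print(results["box_precision"], results["box_recall"])
```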
Comments suppressed due to low confidence (1)
src/deepforest/IoU.py:75
Renaming `compute_IoU` to `match_polygons` removes an established function name from the `IoU` module. If users call `IoU.compute_IoU(...)` externally, this is a breaking API change; consider keeping `compute_IoU` as a thin deprecated alias to `match_polygons` for backwards compatibility.
```python
def match_polygons(ground_truth: "gpd.GeoDataFrame", submission: "gpd.GeoDataFrame"):
    """Find area of overlap among all sets of ground truth and prediction.

    This function performs matching between a ground truth dataset and a
    submission or prediction dataset, typically the output from a validation or
    test run. In order to compute IoU, we must know which boxes correspond
    between the datasets. This is performed by Hungarian matching, or linear
    sum assignment.
    """
```
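A minimal sketch of the suggested shim, assuming `match_polygons` keeps the signature shown above:

```python
import warnings


def compute_IoU(ground_truth, submission):
    """Deprecated alias for match_polygons, kept for backwards compatibility."""
    warnings.warn(
        "compute_IoU has been renamed to match_polygons and will be removed "
        "in a future release.",
        DeprecationWarning,
        stacklevel=2,
    )
    return match_polygons(ground_truth, submission)
```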
Co-authored-by: Martí Bosch martibosch@users.noreply.github.com
Description
This PR cleans up the evaluation code and makes it more agnostic to different scoring tasks. The main idea is that we perform the same steps regardless of the underlying geometry. For the matching step we use Hungarian matching (i.e. `linear_sum_assignment`), which is fast for the scale of inputs that we work with and is agnostic to geometry, provided we can define a pairwise matching cost. For boxes and polygons this cost is IoU; for points I've implemented a similar method that uses either the L1 or L2 norm (see the sketch below).
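A minimal sketch of the point-matching idea described above (not the PR's actual code; the function name `match_points` and the `max_distance` cutoff are illustrative assumptions):

```python
# Geometry-agnostic matching sketch: build a pairwise cost matrix, then solve
# the assignment with Hungarian matching. For points the cost is an L1 or L2
# distance; for boxes/polygons it would be 1 - IoU instead.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_points(ground_truth_xy, predicted_xy, norm="l2", max_distance=5.0):
    """Match ground truth points to predictions by minimum total distance."""
    metric = "cityblock" if norm == "l1" else "euclidean"
    cost = cdist(ground_truth_xy, predicted_xy, metric=metric)
    gt_idx, pred_idx = linear_sum_assignment(cost)
    # Discard assigned pairs that are too far apart to count as matches.
    keep = cost[gt_idx, pred_idx] < max_distance
    return gt_idx[keep], pred_idx[keep]


# Two ground truth points, three predictions: gt 0 -> pred 0, gt 1 -> pred 2.
gt = np.array([[0.0, 0.0], [10.0, 10.0]])
pred = np.array([[0.5, 0.0], [50.0, 50.0], [9.0, 10.0]])
print(match_points(gt, pred))
```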
Changes:

- I removed some duplication in evaluate.py and used some library methods for dataframe checks and conversion for consistency.
- The code for `evaluate_x` is a single function which deals with different geometries as requested (`point`, `box`, `polygon`). The output returns `<task>_precision` and `<task>_recall`; other fields are the same regardless of geometry.
- `evaluate_boxes` -> `evaluate_geometry`. I don't see this low-level function being called anywhere outside tests, and it's usually called by the wrapper function anyway.
- `compute_class_recall` doesn't check for mixed label types and will do weird things if you pass in `Tree`/`0` (for example, predictions with a numeric ID and ground truth with a string). The current test suite misses this as it only looks at the box_precision/recall values, but I did some debugging and I think this issue affects main. This PR coerces all labels to be numeric (in `__evaluate_wrapper__`) and then processes based on the union of labels, as in Feat torchmetrics eval #1071. Do we want this behavior, or do we want to only report classes present in ground truth?
- Fixed a bug in the class-wise precision calculation: the numerator should count matches among predictions with the target label, whereas the existing code counts matches among ground truth (see the sketch after this list).
- Changes point evaluation to be point vs point, not point vs box. We can add this back, but I'm not sure what we need here.
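A hedged sketch of the class-wise metric computation described above: per class, true positives are matched pairs where the prediction carries the target label, precision divides by predictions of that class, recall by ground truth of that class, iterating over the union of labels. The names (`class_precision_recall`, `predicted_label`, `true_label`, `label`) are illustrative, not the PR's actual schema.

```python
import pandas as pd


def class_precision_recall(matches, predictions, ground_truth):
    """Per-class precision/recall over the union of prediction and truth labels.

    `matches` holds one row per matched pair with `predicted_label` and
    `true_label` columns; `predictions` and `ground_truth` each have `label`.
    """
    rows = []
    for c in sorted(set(predictions["label"]) | set(ground_truth["label"])):
        # True positives: matched pairs where both sides carry label c.
        tp = ((matches["predicted_label"] == c) & (matches["true_label"] == c)).sum()
        n_pred = (predictions["label"] == c).sum()   # precision denominator
        n_true = (ground_truth["label"] == c).sum()  # recall denominator
        rows.append({
            "label": c,
            "precision": tp / n_pred if n_pred else 0.0,
            "recall": tp / n_true if n_true else 0.0,
        })
    return pd.DataFrame(rows)
```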
Draft until we confirm that eval results are unchanged and include improved test cases for all three geometries.
AI-Assisted Development
Mainly hand-coded, but I used Copilot for autocomplete, for identifying the cause of a test failure at the end, and for review.
AI tools used (if applicable):
Copilot