@henrykironde (Contributor) commented Feb 1, 2026

This changes the prediction input pipeline to be less memory-hungry on large rasters.

  • Add helper to coerce inputs to RGB CHW float32 in [0, 1]
    • Validate ndim/channel placement and reject grayscale early
    • Normalize based on dtype (uint8 -> /255), not max/min heuristics
    • Ensure contiguous arrays before torch conversion
  • Refactor SingleImage to keep the full image in CHW without forcing a full float32 copy; convert/normalize per-window crops in get_crop
  • Improve TiledRaster/window strategy
    • Reuse a single rasterio dataset handle instead of reopening per window
    • Warn (don’t error) when the raster is untiled
    • Provide close() and ensure datasets are closed after predict
- `image = np.array(image)` copies its input by default; prefer `np.asarray(...)`, which returns an existing ndarray unchanged.
- `if image.dtype != "float32": ...` (fragile)
	- `image.dtype` is a `numpy.dtype`, not the string `"float32"`; the comparison works only because NumPy coerces the string, so comparing against `np.float32` is clearer.
- `if image.max() > 1 or image.min() < 0: image /= 255` (fragile heuristic)
	- If it’s `uint8`, it *definitely* means 0–255 → convert to float32 and divide by 255.
	- If it’s already float, accept only known conventions and raise an error on surprising ranges.
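
The copy behavior these notes rely on can be checked directly. The snippet below is a minimal sketch using only NumPy; the array shape is arbitrary and chosen for illustration.

```python
import numpy as np

image = np.zeros((3, 4, 4), dtype=np.uint8)

# np.array copies by default, so the result owns new memory.
copied = np.array(image)
# np.asarray returns the same ndarray when no conversion is needed.
aliased = np.asarray(image)

print(np.shares_memory(copied, image))   # False
print(np.shares_memory(aliased, image))  # True

# Compare dtypes against NumPy types rather than string literals.
needs_cast = image.dtype != np.float32
print(needs_cast)  # True
```

On a large raster the difference is one full-size allocation per call, which is the cost this PR is trying to avoid.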

Description

Related Issue(s)

AI-Assisted Development

  • I used AI tools (e.g., GitHub Copilot, ChatGPT, etc.) in developing this PR
  • I understand all the code I'm submitting
  • I have reviewed and validated all AI-generated code

AI tools used (if applicable):


Note

Medium Risk
Moderate risk because it changes image preprocessing/normalization behavior and rasterio handle lifecycle during inference, which can affect prediction inputs and resource cleanup on large rasters.

Overview
Reduces memory pressure during predict_tile by standardizing image coercion via _ensure_rgb_chw_float32 (strict RGB/shape validation, dtype-based normalization, contiguous arrays) and switching np.array(...) to np.asarray(...) to avoid unnecessary copies.

Improves dataloader_strategy='single' and 'window' paths: SingleImage keeps the full image in CHW without forcing a full float32 conversion, normalizing per-window in get_crop, while TiledRaster reuses a single rasterio dataset handle across windows, warns (instead of erroring) on untiled rasters, and adds close()/best-effort cleanup; predict_tile now calls ds.close() when available.
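
The handle-reuse pattern described for the window strategy might look roughly like the sketch below. `WindowReader` and its `open_fn` parameter are hypothetical names used for illustration, not the PR's actual `TiledRaster` code; with rasterio, `open_fn` would typically be `rasterio.open`, whose dataset objects provide `read(window=...)` and `close()`.

```python
class WindowReader:
    """Hypothetical reader: opens the dataset lazily once, then reuses it."""

    def __init__(self, path, open_fn):
        # open_fn is an assumption; with rasterio it would be rasterio.open.
        self.path = path
        self._open_fn = open_fn
        self._ds = None

    def _dataset(self):
        # Open on first use only; later calls return the cached handle.
        if self._ds is None:
            self._ds = self._open_fn(self.path)
        return self._ds

    def read(self, window):
        # Every window read goes through the same open handle.
        return self._dataset().read(window=window)

    def close(self):
        # Safe to call repeatedly; releases the handle if one was opened.
        if self._ds is not None:
            self._ds.close()
            self._ds = None
```

A caller would wrap prediction in `try`/`finally` and call `close()` in the `finally` branch, so the handle is released even when a window read fails.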

Written by Cursor Bugbot for commit bb6e8e0.


codecov bot commented Feb 1, 2026

Codecov Report

❌ Patch coverage is 69.11765% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.73%. Comparing base (3146b96) to head (bb6e8e0).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/deepforest/datasets/prediction.py | 68.18% | 21 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1293      +/-   ##
==========================================
- Coverage   87.89%   86.73%   -1.17%     
==========================================
  Files          20       20              
  Lines        2776     2827      +51     
==========================================
+ Hits         2440     2452      +12     
- Misses        336      375      +39     

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | 86.73% <69.11%> (-1.17%) | ⬇️ |

@henrykironde henrykironde marked this pull request as ready for review February 2, 2026 04:46
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

    if crop.dtype != "float32":
        crop = crop.astype("float32")
    if crop.max() > 1 or crop.min() < 0:
        crop /= 255.0

Fragile normalization heuristic used in crop processing

Medium Severity

SingleImage.get_crop uses a fragile normalization heuristic that the PR description explicitly identifies as problematic. When crop.min() < 0, it incorrectly divides by 255 instead of raising an error for invalid input. When crop.max() > 255, dividing by 255 produces values still greater than 1. The new _ensure_rgb_chw_float32 helper function handles these cases correctly by raising ValueError for out-of-range values, but get_crop uses the old logic, creating inconsistent behavior between code paths.
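
A dtype-based alternative for the per-crop path could look like the sketch below. `normalize_crop` is a hypothetical name, and the choice of exception types and accepted float range is an assumption for illustration; the actual behavior of `_ensure_rgb_chw_float32` may differ.

```python
import numpy as np


def normalize_crop(crop):
    """Return a new float32 array in [0, 1]; never modifies the input."""
    crop = np.asarray(crop)
    if crop.dtype == np.uint8:
        # uint8 always means 0-255, so the scale factor is unambiguous.
        return crop.astype(np.float32) / 255.0
    if np.issubdtype(crop.dtype, np.floating):
        out = crop.astype(np.float32)  # astype copies by default
        # Assumption: float input must already be in [0, 1].
        if out.size and (out.min() < 0.0 or out.max() > 1.0):
            raise ValueError(
                f"Float image outside [0, 1]: min={out.min()}, max={out.max()}"
            )
        return out
    raise TypeError(f"Unsupported dtype {crop.dtype}")
```

Because the function decides from the dtype rather than from the observed range, a dark `uint8` crop whose maximum happens to be 1 is still scaled correctly, and a negative float value raises instead of being silently divided.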

        crop = crop.astype("float32")
    if crop.max() > 1 or crop.min() < 0:
        crop /= 255.0
    return torch.from_numpy(crop)

In-place normalization corrupts overlapping window crops

High Severity

When input is a float32 image in [0, 255] range, get_crop skips the astype call (since dtype is already float32), leaving crop as a view of self.image. The subsequent crop /= 255.0 modifies the underlying image array in-place. This corrupts overlapping window regions, causing them to be normalized multiple times (divided by 255 repeatedly), producing incorrect prediction results. The old code avoided this by always making a copy via np.array() and normalizing the full image once upfront.
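
The aliasing problem can be reproduced in isolation. The sketch below uses two hypothetical helper functions and a small synthetic array rather than the real `get_crop`; it only illustrates how basic slicing plus in-place division affects overlapping windows.

```python
import numpy as np


def crop_in_place(image, x0, x1):
    crop = image[:, :, x0:x1]  # basic slicing returns a view
    crop /= 255.0              # writes through to image
    return crop


def crop_with_copy(image, x0, x1):
    # Division allocates a new array, so image is left untouched.
    return image[:, :, x0:x1] / 255.0


# float32 image in [0, 255], two windows overlapping on columns 1-2.
image = np.full((3, 2, 4), 255.0, dtype=np.float32)
crop_in_place(image, 0, 3)
second = crop_in_place(image, 1, 4)
# The overlapping columns have now been divided by 255 twice.
print(second[0, 0])

image = np.full((3, 2, 4), 255.0, dtype=np.float32)
crop_with_copy(image, 0, 3)
safe = crop_with_copy(image, 1, 4)
# Every column is normalized exactly once.
print(safe[0, 0])
```

Using an out-of-place operation per crop keeps the memory saving (only one window is converted at a time) without mutating the shared full-size array.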

    raise ValueError(
        f"Expected 3 channel image, got image shape {image.shape}"
    )
    image = np.moveaxis(image, -1, 0)

Duplicated image loading and channel detection logic

Medium Severity

SingleImage.prepare_items duplicates image loading logic (lines 211-220) that exists in load_and_preprocess_image (lines 107-112), and duplicates channel detection/HWC-to-CHW conversion logic (lines 222-227) that exists in _ensure_rgb_chw_float32 (lines 36-41). Consider extracting a helper for just CHW conversion without normalization to share this logic while preserving the memory optimization intent.
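
One possible shape for such a shared helper is sketched below. `to_chw` is a hypothetical name, not a function from the PR; it only validates and reorders axes, leaving dtype and value range alone so that normalization can still happen per window.

```python
import numpy as np


def to_chw(image):
    """Return a 3-channel image in CHW layout without changing its dtype."""
    image = np.asarray(image)
    if image.ndim != 3:
        raise ValueError(f"Expected a 3D image, got shape {image.shape}")
    if image.shape[0] == 3:
        # Already CHW. Note that a (3, H, 3) array is treated as CHW here.
        return image
    if image.shape[-1] == 3:
        # HWC to CHW; moveaxis returns a view, so no pixel data is copied.
        return np.moveaxis(image, -1, 0)
    raise ValueError(f"Expected 3 channel image, got image shape {image.shape}")
```

Both the full normalizing helper and `prepare_items` could then call this one function for layout handling, so the shape checks live in a single place.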
