
Conversation

@CuriousDolphin (Member) commented Dec 10, 2025

  • Updated the `im_size` attribute in `ModelInfo` to support both int and tuple formats for image dimensions.
  • Introduced a `parse_im_size` function in the CLI to handle various input formats for image size, including string representations.
  • Modified the training, validation, and export commands to accept and process image sizes as either an int (square) or a tuple (height, width).
  • Ensured backward compatibility by keeping the int representation for existing functionality.
  • Adjusted the relevant processors and augmentations to accommodate the new image size format.

Note

Adds full support for non-square image sizes with CLI parsing, data/augmentation/mappers/processors updates, inference/export benchmarking changes, and docs/examples; also updates AutoDataset.get_split to accept DatasetAugmentations.

  • Image size handling (core):
    • Make ModelInfo.im_size accept int | (H, W); propagate through TrainerArgs, processors, mappers, and ports.
    • Update processors (DETR, RTMO, MaskFormer, BisenetFormer, Classification) to accept tuple sizes and resize accordingly.
    • Adjust inference runtime (ONNX, TorchScript, base model) warmup/benchmark paths to support non-square sizes and log the size correctly; set LatencyMetrics.im_size from the height.
  • CLI:
    • Add parse_im_size to accept "640", "640,480", or "640x480" (see the sketch after this list).
    • Change train, val, and export --im-size option to string, parse to int | (H, W) and pass through.
  • Data pipeline:
    • DatasetAugmentations.resolution now int | (H, W); non-square uses direct Resize, square keeps ResizeShortestEdge; adjust crop to use absolute size.
    • get_default_by_task and all call sites now pass/return DatasetAugmentations (not raw list).
    • AutoDataset.get_split(augs=...) now expects DatasetAugmentations and forwards both augs and resolution; remove .get_augmentations() usage across code/tests/tutorials.
    • Mappers carry resolution; MapDataset exposes resolution property.
  • Export/Training:
    • export_command and model export accept tuple sizes; set model_info.im_size to the provided resolution.
    • After training, always reload best model/info and set model/processor to eval.
  • Docs:
    • Update README and docs (CLI, concepts, inference, training) with non-square examples and usage notes.
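
A minimal sketch of the parsing rule described above, for illustration only (the actual `parse_im_size` in `focoos/cli` may differ in naming and error handling):

```python
import re
from typing import Tuple, Union

def parse_im_size(value: str) -> Union[int, Tuple[int, int]]:
    """Parse "640" -> 640 (square); "640,480" or "640x480" -> (640, 480) as (H, W)."""
    parts = re.split(r"[x,]", value.strip())
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) == 2:
        return (int(parts[0]), int(parts[1]))
    raise ValueError(f"Invalid im_size {value!r}; expected '640', '640,480', or '640x480'")
```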

Written by Cursor Bugbot for commit f535d24. This will update automatically on new commits.

github-actions bot commented Dec 10, 2025

Coverage

| Tests | Skipped | Failures | Errors | Time |
| ----- | ------- | -------- | ------ | ---- |
| 228 | 0 💤 | 0 ❌ | 0 🔥 | 43.305s ⏱️ |

@CuriousDolphin (Member, Author) commented:

bugbot run

cursor bot commented Dec 10, 2025

🚨 Bugbot couldn't run

Something went wrong. Try again by commenting "Cursor review" or "bugbot run", or contact support (requestId: serverGenReqId_7ec461e1-234c-4911-b428-7920d04a168a).

- Updated `orjson` version from `3.10.18` to `3.11.5`.
- Added additional `tensorrt` dependencies: `tensorrt-cu12`, `tensorrt-cu12-bindings`, and `tensorrt-cu12-libs`.
- Updated `onnxruntime-gpu` and `onnxruntime` versions from `1.22.0` to `1.23.2`.
- Updated `optimum[onnxruntime]` version from `1.27.0` to `2.0.0`.

@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot



Bug: Method fails to handle tuple image size format

The `end2end_benchmark` method wasn't updated to handle the new tuple format for `im_size`. When `self.model_info.im_size` is a tuple (like `(640, 480)`), the call to `torch.randn(1, 3, size, size)` at line 737 fails with a `TypeError` because a tuple cannot be used as a dimension size. Additionally, line 735 produces a malformed log message, and line 757 passes a tuple to `LatencyMetrics.im_size`, which expects an int. The nearby `benchmark` method was correctly updated with the tuple-handling pattern, but this method was missed.

focoos/models/focoos_model.py#L710-L761

```python
def end2end_benchmark(self, iterations: int = 50, size: Optional[int] = None) -> LatencyMetrics:
    """Benchmark the complete end-to-end inference pipeline.

    This method measures the full inference latency including preprocessing,
    model forward pass, and postprocessing steps.

    Args:
        iterations: Number of iterations to run for benchmarking.
        size: Input image size. If None, uses model's default size.
        device: Device to run benchmarking on ("cuda" or "cpu").

    Returns:
        LatencyMetrics containing end-to-end performance statistics.
    """
    if size is None:
        size = self.model_info.im_size
    if self.model.device.type == "cpu":
        device_name = get_cpu_name()
    else:
        device_name = get_device_name()
    try:
        model = self.model.cuda()
    except Exception:
        logger.warning("Unable to use CUDA")
    logger.info(f"⏱️ Benchmarking End-to-End latency on {device_name} ({self.model.device}), size: {size}x{size}..")
    # warmup
    data = 128 * torch.randn(1, 3, size, size).to(model.device)
    durations = []
    for _ in range(iterations):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record(stream=torch.cuda.Stream())
        _ = self(data)
        end.record(stream=torch.cuda.Stream())
        torch.cuda.synchronize()
        durations.append(start.elapsed_time(end))
    durations = np.array(durations)
    metrics = LatencyMetrics(
        fps=int(1000 / durations.mean()),
        engine=f"torch.{self.model.device}",
        mean=round(durations.mean().astype(float), 3),
        max=round(durations.max().astype(float), 3),
        min=round(durations.min().astype(float), 3),
        std=round(durations.std().astype(float), 3),
        im_size=size,
        device=str(self.model.device),
    )
    logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ")
    return metrics
```

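A minimal sketch of the tuple-aware normalization the report says the sibling `benchmark` method already applies; the helper name here is hypothetical, not the merged fix:

```python
from typing import Tuple, Union

def _as_hw(size: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    # Normalize an int (square) or (H, W) tuple to an explicit (height, width) pair.
    return (size, size) if isinstance(size, int) else (size[0], size[1])

# Applied inside end2end_benchmark, this would replace the failing lines:
#   h, w = _as_hw(size if size is not None else self.model_info.im_size)
#   logger.info(f"... size: {h}x{w}..")
#   data = 128 * torch.randn(1, 3, h, w).to(model.device)
#   ...
#   im_size=h,  # LatencyMetrics.im_size expects an int; the PR sets it from the height
```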


- Introduced an optional `resolution` parameter across various dataset mappers including `AutoDataset`, `ClassificationDatasetMapper`, `DetectionDatasetMapper`, `KeypointDatasetMapper`, and `SemanticDatasetMapper`.
- Updated the `MapDataset` class to expose the `resolution` property.
- Enhanced the handling of image resolution in the `FocoosModel` to utilize the new `resolution` parameter from augmentations if available.
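
A schematic sketch of the square vs. non-square resize selection described here and in the summary above; the `Resize` / `ResizeShortestEdge` constructor signatures are assumptions (detectron2-style), not the exact focoos API:

```python
from typing import Tuple, Union

from detectron2.data.transforms import Resize, ResizeShortestEdge  # assumed transform classes

def build_resize(resolution: Union[int, Tuple[int, int]]):
    if isinstance(resolution, int):
        # Square: keep the existing aspect-preserving shortest-edge resize.
        return ResizeShortestEdge(short_edge_length=resolution)
    # Non-square: resize directly to the exact (H, W) target.
    height, width = resolution
    return Resize((height, width))
```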

@cursor cursor bot left a comment


Bug: Resolution parameter not passed to `get_split`, so `im_size` is never set

The `resolution` parameter is not being passed to the `auto_dataset.get_split()` calls, despite having been added to the function signature. The code in `focoos_model.py` now tries to read the resolution from the dataset mapper via `getattr(data_train, "resolution", None)`, but since `resolution` is never passed to `get_split()`, the mapper's `resolution` attribute will always be `None`. As a result, the user-specified `im_size` won't be stored in `model_info.im_size` for trained models. The resolution value (from `im_size` or `train_augs.resolution`) needs to be passed through to `get_split()`.

focoos/cli/commands/train.py#L345-L347

```python
train_augs, val_augs = get_default_by_task(model.task, resolution=im_size or model.model_info.im_size)
train_dataset = auto_dataset.get_split(augs=train_augs.get_augmentations(), split=DatasetSplitType.TRAIN)
valid_dataset = auto_dataset.get_split(augs=val_augs.get_augmentations(), split=DatasetSplitType.VAL)
```

focoos/cli/commands/val.py#L320-L321

```python
_, val_augs = get_default_by_task(task=model.model_info.task, resolution=im_size or model.model_info.im_size)
valid_dataset = auto_dataset.get_split(augs=val_augs.get_augmentations(), split=DatasetSplitType.VAL)
```

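Given the follow-up commit described below, a sketch of the intended call pattern: pass the `DatasetAugmentations` objects themselves so `get_split` can forward both the transforms and the resolution to the mapper:

```python
# Instead of unwrapping with .get_augmentations(), pass the DatasetAugmentations
# object directly; get_split forwards augs and resolution to the dataset mapper.
train_augs, val_augs = get_default_by_task(model.task, resolution=im_size or model.model_info.im_size)
train_dataset = auto_dataset.get_split(augs=train_augs, split=DatasetSplitType.TRAIN)
valid_dataset = auto_dataset.get_split(augs=val_augs, split=DatasetSplitType.VAL)
```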


- Modified the dataset split methods across various files to directly use the `DatasetAugmentations` object instead of calling `get_augmentations()`.
- Updated relevant documentation and examples to reflect the new usage pattern for augmentations.
- Ensured consistency in how augmentations are applied in training and validation processes.
- Included `shapely` version `2.1.2` in the project dependencies to support geometric operations.
- Changed the `im_size` attribute in `RemoteModelInfo` to accept both `int` and `Tuple[int, int]` for more flexible image size handling.
- Updated related documentation to reflect the new type definition.
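
The corresponding field change, sketched here assuming a pydantic-style model (the surrounding class definition is abbreviated):

```python
from typing import Tuple, Union
from pydantic import BaseModel

class RemoteModelInfo(BaseModel):
    im_size: Union[int, Tuple[int, int]]  # 640 (square) or (640, 480) interpreted as (H, W)
```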
@CuriousDolphin CuriousDolphin marked this pull request as ready for review December 11, 2025 15:46
@CuriousDolphin CuriousDolphin self-assigned this Dec 11, 2025

@cursor cursor bot left a comment


Bug: `end2end_benchmark` fails when model has tuple `im_size`

The `FocoosModel.end2end_benchmark` method accepts `size: Optional[int]` and falls back to `self.model_info.im_size` when `None`. However, `im_size` can now be a tuple (e.g., `(640, 480)`). The method then uses `size` directly in `torch.randn(1, 3, size, size)`, which raises a `TypeError` because tuples cannot be used as dimension arguments this way. Additionally, the log message `f"size: {size}x{size}"` produces malformed output for tuple sizes. This method was not updated to handle tuple sizes like the other benchmark methods in the codebase, which now properly normalize size to a tuple format.

focoos/models/focoos_model.py#L718-L745

```python
def end2end_benchmark(self, iterations: int = 50, size: Optional[int] = None) -> LatencyMetrics:
    """Benchmark the complete end-to-end inference pipeline.

    This method measures the full inference latency including preprocessing,
    model forward pass, and postprocessing steps.

    Args:
        iterations: Number of iterations to run for benchmarking.
        size: Input image size. If None, uses model's default size.
        device: Device to run benchmarking on ("cuda" or "cpu").

    Returns:
        LatencyMetrics containing end-to-end performance statistics.
    """
    if size is None:
        size = self.model_info.im_size
    if self.model.device.type == "cpu":
        device_name = get_cpu_name()
    else:
        device_name = get_device_name()
    try:
        model = self.model.cuda()
    except Exception:
        logger.warning("Unable to use CUDA")
    logger.info(f"⏱️ Benchmarking End-to-End latency on {device_name} ({self.model.device}), size: {size}x{size}..")
    # warmup
    data = 128 * torch.randn(1, 3, size, size).to(model.device)
```



Co-authored-by: ivan.murabito <ivan.murabito@focoos.ai>
cursor bot commented Dec 11, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

Co-authored-by: ivan.murabito <ivan.murabito@focoos.ai>

@CuriousDolphin CuriousDolphin merged commit 912a74b into main Dec 11, 2025
10 checks passed
@CuriousDolphin CuriousDolphin deleted the feat/non-square-resolution branch December 11, 2025 16:33