-
Notifications
You must be signed in to change notification settings - Fork 2
feat: enhance image size handling across the codebase #167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Updated `im_size` attribute in `ModelInfo` to support both int and tuple formats for image dimensions. - Introduced `parse_im_size` function in CLI to handle various input formats for image size, including string representations. - Modified training, validation, and export commands to accept and process image sizes as either int (square) or tuple (height, width). - Ensured backward compatibility by maintaining int representation for existing functionality. - Adjusted relevant processors and augmentations to accommodate the new image size format.
|
bugbot run |
🚨 Bugbot couldn't runSomething went wrong. Try again by commenting "Cursor review" or "bugbot run", or contact support (requestId: serverGenReqId_7ec461e1-234c-4911-b428-7920d04a168a). |
- Updated `orjson` version from `3.10.18` to `3.11.5`. - Added additional `tensorrt` dependencies: `tensorrt-cu12`, `tensorrt-cu12-bindings`, and `tensorrt-cu12-libs`. - Updated `onnxruntime-gpu` and `onnxruntime` versions from `1.22.0` to `1.23.2`. - Updated `optimum[onnxruntime]` version from `1.27.0` to `2.0.0`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is being reviewed by Cursor Bugbot
Details
Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.
To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.
Bug: Method fails to handle tuple image size format
The end2end_benchmark method wasn't updated to handle the new tuple format for im_size. When self.model_info.im_size is a tuple (like (640, 480)), the code at line 737 calling torch.randn(1, 3, size, size) will fail with a TypeError because a tuple cannot be used as a dimension size. Additionally, line 735 creates a malformed log message and line 757 passes a tuple to LatencyMetrics.im_size which expects an int. The nearby benchmark method was correctly updated with the tuple handling pattern, but this method was missed.
focoos/models/focoos_model.py#L710-L761
focoos/focoos/models/focoos_model.py
Lines 710 to 761 in 9a719da
| def end2end_benchmark(self, iterations: int = 50, size: Optional[int] = None) -> LatencyMetrics: | |
| """Benchmark the complete end-to-end inference pipeline. | |
| This method measures the full inference latency including preprocessing, | |
| model forward pass, and postprocessing steps. | |
| Args: | |
| iterations: Number of iterations to run for benchmarking. | |
| size: Input image size. If None, uses model's default size. | |
| device: Device to run benchmarking on ("cuda" or "cpu"). | |
| Returns: | |
| LatencyMetrics containing end-to-end performance statistics. | |
| """ | |
| if size is None: | |
| size = self.model_info.im_size | |
| if self.model.device.type == "cpu": | |
| device_name = get_cpu_name() | |
| else: | |
| device_name = get_device_name() | |
| try: | |
| model = self.model.cuda() | |
| except Exception: | |
| logger.warning("Unable to use CUDA") | |
| logger.info(f"⏱️ Benchmarking End-to-End latency on {device_name} ({self.model.device}), size: {size}x{size}..") | |
| # warmup | |
| data = 128 * torch.randn(1, 3, size, size).to(model.device) | |
| durations = [] | |
| for _ in range(iterations): | |
| start = torch.cuda.Event(enable_timing=True) | |
| end = torch.cuda.Event(enable_timing=True) | |
| start.record(stream=torch.cuda.Stream()) | |
| _ = self(data) | |
| end.record(stream=torch.cuda.Stream()) | |
| torch.cuda.synchronize() | |
| durations.append(start.elapsed_time(end)) | |
| durations = np.array(durations) | |
| metrics = LatencyMetrics( | |
| fps=int(1000 / durations.mean()), | |
| engine=f"torch.{self.model.device}", | |
| mean=round(durations.mean().astype(float), 3), | |
| max=round(durations.max().astype(float), 3), | |
| min=round(durations.min().astype(float), 3), | |
| std=round(durations.std().astype(float), 3), | |
| im_size=size, | |
| device=str(self.model.device), | |
| ) | |
| logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ") | |
| return metrics |
- Introduced an optional `resolution` parameter across various dataset mappers including `AutoDataset`, `ClassificationDatasetMapper`, `DetectionDatasetMapper`, `KeypointDatasetMapper`, and `SemanticDatasetMapper`. - Updated the `MapDataset` class to expose the `resolution` property. - Enhanced the handling of image resolution in the `FocoosModel` to utilize the new `resolution` parameter from augmentations if available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Resolution parameter not passed to get_split causing im_size not set
The resolution parameter is not being passed to auto_dataset.get_split() calls, despite the new parameter being added to the function signature. The code in focoos_model.py now tries to read resolution from the dataset mapper via getattr(data_train, "resolution", None), but since resolution is never passed to get_split(), the mapper's resolution attribute will always be None. This means the user-specified im_size won't be stored in model_info.im_size for trained models. The resolution value (from im_size or train_augs.resolution) needs to be passed to get_split().
focoos/cli/commands/train.py#L345-L347
focoos/focoos/cli/commands/train.py
Lines 345 to 347 in ea5fb05
| train_augs, val_augs = get_default_by_task(model.task, resolution=im_size or model.model_info.im_size) | |
| train_dataset = auto_dataset.get_split(augs=train_augs.get_augmentations(), split=DatasetSplitType.TRAIN) | |
| valid_dataset = auto_dataset.get_split(augs=val_augs.get_augmentations(), split=DatasetSplitType.VAL) |
focoos/cli/commands/val.py#L320-L321
focoos/focoos/cli/commands/val.py
Lines 320 to 321 in ea5fb05
| _, val_augs = get_default_by_task(task=model.model_info.task, resolution=im_size or model.model_info.im_size) | |
| valid_dataset = auto_dataset.get_split(augs=val_augs.get_augmentations(), split=DatasetSplitType.VAL) |
- Modified the dataset split methods across various files to directly use the `DatasetAugmentations` object instead of calling `get_augmentations()`. - Updated relevant documentation and examples to reflect the new usage pattern for augmentations. - Ensured consistency in how augmentations are applied in training and validation processes.
- Included `shapely` version `2.1.2` in the project dependencies to support geometric operations.
- Changed the `im_size` attribute in `RemoteModelInfo` to accept both `int` and `Tuple[int, int]` for more flexible image size handling. - Updated related documentation to reflect the new type definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: `end2end_benchmark` fails when model has tuple `im_size`
The FocoosModel.end2end_benchmark method accepts size: Optional[int] and falls back to self.model_info.im_size when None. However, im_size can now be a tuple (e.g., (640, 480)). The method then uses size directly in torch.randn(1, 3, size, size) which will raise a TypeError because tuples cannot be used as dimension arguments this way. Additionally, the log message f"size: {size}x{size}" will produce malformed output for tuple sizes. This method was not updated to handle tuple sizes like the other benchmark methods in the codebase that now properly normalize size to a tuple format.
focoos/models/focoos_model.py#L718-L745
focoos/focoos/models/focoos_model.py
Lines 718 to 745 in f535d24
| def end2end_benchmark(self, iterations: int = 50, size: Optional[int] = None) -> LatencyMetrics: | |
| """Benchmark the complete end-to-end inference pipeline. | |
| This method measures the full inference latency including preprocessing, | |
| model forward pass, and postprocessing steps. | |
| Args: | |
| iterations: Number of iterations to run for benchmarking. | |
| size: Input image size. If None, uses model's default size. | |
| device: Device to run benchmarking on ("cuda" or "cpu"). | |
| Returns: | |
| LatencyMetrics containing end-to-end performance statistics. | |
| """ | |
| if size is None: | |
| size = self.model_info.im_size | |
| if self.model.device.type == "cpu": | |
| device_name = get_cpu_name() | |
| else: | |
| device_name = get_device_name() | |
| try: | |
| model = self.model.cuda() | |
| except Exception: | |
| logger.warning("Unable to use CUDA") | |
| logger.info(f"⏱️ Benchmarking End-to-End latency on {device_name} ({self.model.device}), size: {size}x{size}..") | |
| # warmup | |
| data = 128 * torch.randn(1, 3, size, size).to(model.device) |
Co-authored-by: ivan.murabito <ivan.murabito@focoos.ai>
|
Cursor Agent can help with this pull request. Just |
Co-authored-by: ivan.murabito <ivan.murabito@focoos.ai>
|
Cursor Agent can help with this pull request. Just |
im_sizeattribute inModelInfoto support both int and tuple formats for image dimensions.parse_im_sizefunction in CLI to handle various input formats for image size, including string representations.Note
Adds full support for non-square image sizes with CLI parsing, data/augmentation/mappers/processors updates, inference/export benchmarking changes, and docs/examples; also updates AutoDataset.get_split to accept DatasetAugmentations.
ModelInfo.im_sizeacceptint | (H, W); propagate throughTrainerArgs, processors, mappers, and ports.DETR,RTMO,MaskFormer,BisenetFormer,Classification) to accept tuple sizes and resize accordingly.im_sizeusing height.parse_im_sizeto accept"640","640,480", or"640x480".train,val, andexport--im-sizeoption to string, parse toint | (H, W)and pass through.DatasetAugmentations.resolutionnowint | (H, W); non-square uses directResize, square keepsResizeShortestEdge; adjust crop to use absolute size.get_default_by_taskand all call sites now pass/returnDatasetAugmentations(not raw list).AutoDataset.get_split(augs=...)now expectsDatasetAugmentationsand forwards both augs andresolution; remove.get_augmentations()usage across code/tests/tutorials.resolution;MapDatasetexposesresolutionproperty.export_commandand model export accept tuple sizes; setmodel_info.im_sizeto the provided resolution.Written by Cursor Bugbot for commit f535d24. This will update automatically on new commits. Configure here.