
Conversation

@CuriousDolphin (Member) commented Dec 10, 2025

  • Updated the `im_size` attribute in `ModelInfo` to support both int and tuple formats for image dimensions.
  • Introduced a `parse_im_size` function in the CLI to handle various input formats for image size, including string representations.
  • Modified the training, validation, and export commands to accept and process image sizes as either an int (square) or a tuple (height, width).
  • Ensured backward compatibility by keeping the int representation for existing functionality.
  • Adjusted the relevant processors and augmentations to accommodate the new image size format.

Note

Adds full support for non-square image sizes with CLI parsing, data/augmentation/mappers/processors updates, inference/export benchmarking changes, and docs/examples; also updates AutoDataset.get_split to accept DatasetAugmentations.

  • Image size handling (core):
    • Make ModelInfo.im_size accept int | (H, W); propagate through TrainerArgs, processors, mappers, and ports.
    • Update processors (DETR, RTMO, MaskFormer, BisenetFormer, Classification) to accept tuple sizes and resize accordingly.
    • Adjust inference runtime (ONNX, TorchScript, base model) warmup/benchmark paths to support non-square sizes and log the size correctly; set LatencyMetrics.im_size from the height.
  • CLI:
    • Add parse_im_size to accept "640", "640,480", or "640x480" (see the sketch after this list).
    • Change train, val, and export --im-size option to string, parse to int | (H, W) and pass through.
  • Data pipeline:
    • DatasetAugmentations.resolution now int | (H, W); non-square uses direct Resize, square keeps ResizeShortestEdge; adjust crop to use absolute size.
    • get_default_by_task and all call sites now pass/return DatasetAugmentations (not raw list).
    • AutoDataset.get_split(augs=...) now expects DatasetAugmentations and forwards both augs and resolution; remove .get_augmentations() usage across code/tests/tutorials.
    • Mappers carry resolution; MapDataset exposes resolution property.
  • Export/Training:
    • export_command and model export accept tuple sizes; set model_info.im_size to the provided resolution.
    • After training, always reload best model/info and set model/processor to eval.
  • Docs:
    • Update README and docs (CLI, concepts, inference, training) with non-square examples and usage notes.
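
A minimal sketch of the parsing rule described above, for illustration only (the actual `parse_im_size` in `focoos/cli` may differ in naming and error handling):

```python
import re
from typing import Tuple, Union

def parse_im_size(value: str) -> Union[int, Tuple[int, int]]:
    """Parse "640" -> 640 (square); "640,480" or "640x480" -> (640, 480) as (H, W)."""
    parts = re.split(r"[x,]", value.strip())
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) == 2:
        return (int(parts[0]), int(parts[1]))
    raise ValueError(f"Invalid im_size {value!r}; expected '640', '640,480', or '640x480'")
```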

Written by Cursor Bugbot for commit f535d24. This will update automatically on new commits.

github-actions bot commented Dec 10, 2025

Coverage

| Tests | Skipped | Failures | Errors | Time |
| ----- | ------- | -------- | ------ | ---- |
| 228 | 0 💤 | 0 ❌ | 0 🔥 | 43.305s ⏱️ |

@CuriousDolphin (Member, Author) commented:

bugbot run

cursor bot commented Dec 10, 2025

🚨 Bugbot couldn't run

Something went wrong. Try again by commenting "Cursor review" or "bugbot run", or contact support (requestId: serverGenReqId_7ec461e1-234c-4911-b428-7920d04a168a).

- Updated `orjson` version from `3.10.18` to `3.11.5`.
- Added additional `tensorrt` dependencies: `tensorrt-cu12`, `tensorrt-cu12-bindings`, and `tensorrt-cu12-libs`.
- Updated `onnxruntime-gpu` and `onnxruntime` versions from `1.22.0` to `1.23.2`.
- Updated `optimum[onnxruntime]` version from `1.27.0` to `2.0.0`.

@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot



Bug: Method fails to handle tuple image size format

The `end2end_benchmark` method wasn't updated to handle the new tuple format for `im_size`. When `self.model_info.im_size` is a tuple (like `(640, 480)`), the call to `torch.randn(1, 3, size, size)` at line 737 fails with a `TypeError` because a tuple cannot be used as a dimension size. Additionally, line 735 produces a malformed log message, and line 757 passes a tuple to `LatencyMetrics.im_size`, which expects an int. The nearby `benchmark` method was correctly updated with the tuple-handling pattern, but this method was missed.

focoos/models/focoos_model.py#L710-L761

```python
def end2end_benchmark(self, iterations: int = 50, size: Optional[int] = None) -> LatencyMetrics:
    """Benchmark the complete end-to-end inference pipeline.

    This method measures the full inference latency including preprocessing,
    model forward pass, and postprocessing steps.

    Args:
        iterations: Number of iterations to run for benchmarking.
        size: Input image size. If None, uses model's default size.
        device: Device to run benchmarking on ("cuda" or "cpu").

    Returns:
        LatencyMetrics containing end-to-end performance statistics.
    """
    if size is None:
        size = self.model_info.im_size
    if self.model.device.type == "cpu":
        device_name = get_cpu_name()
    else:
        device_name = get_device_name()
    try:
        model = self.model.cuda()
    except Exception:
        logger.warning("Unable to use CUDA")
    logger.info(f"⏱️ Benchmarking End-to-End latency on {device_name} ({self.model.device}), size: {size}x{size}..")
    # warmup
    data = 128 * torch.randn(1, 3, size, size).to(model.device)
    durations = []
    for _ in range(iterations):
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record(stream=torch.cuda.Stream())
        _ = self(data)
        end.record(stream=torch.cuda.Stream())
        torch.cuda.synchronize()
        durations.append(start.elapsed_time(end))
    durations = np.array(durations)
    metrics = LatencyMetrics(
        fps=int(1000 / durations.mean()),
        engine=f"torch.{self.model.device}",
        mean=round(durations.mean().astype(float), 3),
        max=round(durations.max().astype(float), 3),
        min=round(durations.min().astype(float), 3),
        std=round(durations.std().astype(float), 3),
        im_size=size,
        device=str(self.model.device),
    )
    logger.info(f"🔥 FPS: {metrics.fps} Mean latency: {metrics.mean} ms ")
    return metrics
```

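A minimal sketch of the tuple-aware normalization the report says the sibling `benchmark` method already applies; the helper name here is hypothetical, not the merged fix:

```python
from typing import Tuple, Union

def _as_hw(size: Union[int, Tuple[int, int]]) -> Tuple[int, int]:
    # Normalize an int (square) or (H, W) tuple to an explicit (height, width) pair.
    return (size, size) if isinstance(size, int) else (size[0], size[1])

# Applied inside end2end_benchmark, this would replace the failing lines:
#   h, w = _as_hw(size if size is not None else self.model_info.im_size)
#   logger.info(f"... size: {h}x{w}..")
#   data = 128 * torch.randn(1, 3, h, w).to(model.device)
#   ...
#   im_size=h,  # LatencyMetrics.im_size expects an int; the PR sets it from the height
```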


- Introduced an optional `resolution` parameter across various dataset mappers including `AutoDataset`, `ClassificationDatasetMapper`, `DetectionDatasetMapper`, `KeypointDatasetMapper`, and `SemanticDatasetMapper`.
- Updated the `MapDataset` class to expose the `resolution` property.
- Enhanced the handling of image resolution in the `FocoosModel` to utilize the new `resolution` parameter from augmentations if available.
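
A schematic sketch of the square vs. non-square resize selection described here and in the summary above; the `Resize` / `ResizeShortestEdge` constructor signatures are assumptions (detectron2-style), not the exact focoos API:

```python
from typing import Tuple, Union

from detectron2.data.transforms import Resize, ResizeShortestEdge  # assumed transform classes

def build_resize(resolution: Union[int, Tuple[int, int]]):
    if isinstance(resolution, int):
        # Square: keep the existing aspect-preserving shortest-edge resize.
        return ResizeShortestEdge(short_edge_length=resolution)
    # Non-square: resize directly to the exact (H, W) target.
    height, width = resolution
    return Resize((height, width))
```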

@cursor cursor bot left a comment


Bug: Resolution parameter not passed to `get_split`, so `im_size` is never set

The `resolution` parameter is not being passed to the `auto_dataset.get_split()` calls, despite having been added to the function signature. The code in `focoos_model.py` now tries to read the resolution from the dataset mapper via `getattr(data_train, "resolution", None)`, but since `resolution` is never passed to `get_split()`, the mapper's `resolution` attribute will always be `None`. As a result, the user-specified `im_size` won't be stored in `model_info.im_size` for trained models. The resolution value (from `im_size` or `train_augs.resolution`) needs to be passed through to `get_split()`.

focoos/cli/commands/train.py#L345-L347

```python
train_augs, val_augs = get_default_by_task(model.task, resolution=im_size or model.model_info.im_size)
train_dataset = auto_dataset.get_split(augs=train_augs.get_augmentations(), split=DatasetSplitType.TRAIN)
valid_dataset = auto_dataset.get_split(augs=val_augs.get_augmentations(), split=DatasetSplitType.VAL)
```

focoos/cli/commands/val.py#L320-L321

```python
_, val_augs = get_default_by_task(task=model.model_info.task, resolution=im_size or model.model_info.im_size)
valid_dataset = auto_dataset.get_split(augs=val_augs.get_augmentations(), split=DatasetSplitType.VAL)
```

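Given the follow-up commit described below, a sketch of the intended call pattern: pass the `DatasetAugmentations` objects themselves so `get_split` can forward both the transforms and the resolution to the mapper:

```python
# Instead of unwrapping with .get_augmentations(), pass the DatasetAugmentations
# object directly; get_split forwards augs and resolution to the dataset mapper.
train_augs, val_augs = get_default_by_task(model.task, resolution=im_size or model.model_info.im_size)
train_dataset = auto_dataset.get_split(augs=train_augs, split=DatasetSplitType.TRAIN)
valid_dataset = auto_dataset.get_split(augs=val_augs, split=DatasetSplitType.VAL)
```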


- Modified the dataset split methods across various files to directly use the `DatasetAugmentations` object instead of calling `get_augmentations()`.
- Updated relevant documentation and examples to reflect the new usage pattern for augmentations.
- Ensured consistency in how augmentations are applied in training and validation processes.
- Included `shapely` version `2.1.2` in the project dependencies to support geometric operations.
- Changed the `im_size` attribute in `RemoteModelInfo` to accept both `int` and `Tuple[int, int]` for more flexible image size handling.
- Updated related documentation to reflect the new type definition.
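
The corresponding field change, sketched here assuming a pydantic-style model (the surrounding class definition is abbreviated):

```python
from typing import Tuple, Union
from pydantic import BaseModel

class RemoteModelInfo(BaseModel):
    im_size: Union[int, Tuple[int, int]]  # 640 (square) or (640, 480) interpreted as (H, W)
```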
@CuriousDolphin CuriousDolphin marked this pull request as ready for review December 11, 2025 15:46
@CuriousDolphin CuriousDolphin self-assigned this Dec 11, 2025

@cursor cursor bot left a comment


Bug: `end2end_benchmark` fails when model has tuple `im_size`

The `FocoosModel.end2end_benchmark` method accepts `size: Optional[int]` and falls back to `self.model_info.im_size` when `None`. However, `im_size` can now be a tuple (e.g., `(640, 480)`). The method then uses `size` directly in `torch.randn(1, 3, size, size)`, which raises a `TypeError` because tuples cannot be used as dimension arguments this way. Additionally, the log message `f"size: {size}x{size}"` produces malformed output for tuple sizes. This method was not updated to handle tuple sizes like the other benchmark methods in the codebase, which now properly normalize size to a tuple format.

focoos/models/focoos_model.py#L718-L745

```python
def end2end_benchmark(self, iterations: int = 50, size: Optional[int] = None) -> LatencyMetrics:
    """Benchmark the complete end-to-end inference pipeline.

    This method measures the full inference latency including preprocessing,
    model forward pass, and postprocessing steps.

    Args:
        iterations: Number of iterations to run for benchmarking.
        size: Input image size. If None, uses model's default size.
        device: Device to run benchmarking on ("cuda" or "cpu").

    Returns:
        LatencyMetrics containing end-to-end performance statistics.
    """
    if size is None:
        size = self.model_info.im_size
    if self.model.device.type == "cpu":
        device_name = get_cpu_name()
    else:
        device_name = get_device_name()
    try:
        model = self.model.cuda()
    except Exception:
        logger.warning("Unable to use CUDA")
    logger.info(f"⏱️ Benchmarking End-to-End latency on {device_name} ({self.model.device}), size: {size}x{size}..")
    # warmup
    data = 128 * torch.randn(1, 3, size, size).to(model.device)
```



Co-authored-by: ivan.murabito <ivan.murabito@focoos.ai>
cursor bot commented Dec 11, 2025

Cursor Agent can help with this pull request. Just @cursor in comments and I'll start working on changes in this branch.
Learn more about Cursor Agents

Co-authored-by: ivan.murabito <ivan.murabito@focoos.ai>

@CuriousDolphin CuriousDolphin merged commit 912a74b into main Dec 11, 2025
10 checks passed
@CuriousDolphin CuriousDolphin deleted the feat/non-square-resolution branch December 11, 2025 16:33