main.evaluate() gives incorrect results when validation images are not all the same size. 

```
# Bug Report: `evaluate()` ignores `config.validation.size`

## Description

The `evaluate()` method ignores `config.validation.size` even when it is set, so evaluating images with different dimensions raises a `RuntimeError`. The size must instead be passed explicitly to `evaluate()`, which is inconsistent with how other methods respect config settings.

## Expected Behavior

When `config.validation.size` is set and no `size` argument is passed, `evaluate()` should use the config value by default, as the training and validation dataloaders do.

## Actual Behavior

`evaluate()` ignores `config.validation.size` and requires the `size` parameter to be passed explicitly. This causes the following error when evaluating on images with different dimensions:

```
RuntimeError: Images in batch have different dimensions. Set validation.size in config.yaml to resize all images to a common size.
```

## Steps to Reproduce

```python
from deepforest import main

m = main.deepforest()
m.load_model("weecology/deepforest-tree")

# Set validation size in config (as documented)
m.config["validation"]["size"] = 800

# This fails even though config.validation.size is set
results = m.evaluate(csv_file="test.csv", root_dir="./data")
# RuntimeError: Images in batch have different dimensions...
```

## Current Workaround

Pass `size` explicitly to `evaluate()`:

```python
# Must pass size explicitly (redundant with config setting)
results = m.evaluate(csv_file="test.csv", root_dir="./data", size=800)
```

## Minimal Reproducible Example

See `reproduce_evaluate_size_issue.py` for a complete working example.

## Suggested Fix

Modify the `evaluate()` method in `main.py` to use `config.validation.size` as the default when the `size` parameter is `None`:

```python
def evaluate(self, csv_file, iou_threshold=None, root_dir=None, size=None, batch_size=None, predictions=None):
    # ...
    # Fall back to the configured validation size when no size is passed
    if size is None:
        size = self.config.validation.size

    if predictions is None:
        predictions = self.predict_file(csv_file, root_dir, size=size, batch_size=batch_size)
    # ...
```

## Environment

- DeepForest version: [current main branch]
- Python version: 3.12
- OS: Linux

## Additional Context

This issue affects training scripts that set `config.validation.size` and then call `evaluate()` for pre- and post-training evaluation. Because other methods read the size from the config, having to pass it again to `evaluate()` makes the config system less intuitive.

```
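Until a fix lands, the fallback described under "Suggested Fix" can be approximated outside the library. The following is a minimal sketch, not DeepForest code: `resolve_eval_size` and `evaluate_with_config_size` are hypothetical helper names, and the sketch assumes the model's config supports mapping-style `.get()` access and that `evaluate()` accepts `csv_file`, `root_dir`, and `size` as shown in the workaround above.

```python
from typing import Any, Mapping, Optional


def resolve_eval_size(size: Optional[int], config: Mapping[str, Any]) -> Optional[int]:
    """Return the explicit size if given, else fall back to config["validation"]["size"].

    Hypothetical helper: mirrors the `if size is None` fallback proposed in the issue.
    """
    if size is not None:
        return size
    # Assumption: the config behaves like a nested mapping with .get()
    return config.get("validation", {}).get("size")


def evaluate_with_config_size(
    model: Any, csv_file: str, root_dir: str, size: Optional[int] = None
):
    """Hypothetical wrapper: call model.evaluate() with the size resolved from the config."""
    resolved = resolve_eval_size(size, model.config)
    return model.evaluate(csv_file=csv_file, root_dir=root_dir, size=resolved)
```

With a wrapper like this, a training script sets `config["validation"]["size"]` once and calls `evaluate_with_config_size(m, "test.csv", "./data")`; an explicit `size` argument still takes precedence over the config value.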

main.evaluate() gives incorrect results when validation images are not all the same size. #1238

Current Workaround

Minimal Reproducible Example

Suggested Fix

Environment

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

main.evaluate() gives incorrect results when validation images are not all the same size. #1238

Description

Current Workaround

Minimal Reproducible Example

Suggested Fix

Environment

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions