Evaluation reports spuriously low recall if limit_batches is set #1232

@jveitchmichaelis

Description

I was doing some test runs with limit_val_batches set, and the predictions look much better than the reported box recall suggests (< 0.03, which is low even for a bad model). I suspect the cause is that evaluate loads the ground truth CSV directly and doesn't use the dataloader, so it scores against the whole val dataset and counts every box in a skipped image as a missing prediction. That caps recall at roughly the evaluated fraction of the data, regardless of model quality.
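
To make the dilution concrete, here is a toy calculation (the numbers are invented, not from my run):

        # Toy numbers: suppose the full ground truth CSV has 1000 boxes, but the
        # limited run only produced predictions for images covering 50 of them
        total_gt_boxes = 1000     # boxes in the full val CSV
        evaluated_boxes = 50      # boxes in images the model actually saw
        matched = 45              # true positives among those

        reported_recall = matched / total_gt_boxes   # 0.045 -- spuriously low
        actual_recall = matched / evaluated_boxes    # 0.90  -- the real recall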

This would probably fix it:

        import itertools

        val_loader = self.val_dataloader()
        if self.trainer.limit_val_batches != 1.0 and val_loader:
            # limit_val_batches is either a fraction of the loader (float)
            # or an absolute batch count (int); Lightning runs only those batches
            limit = self.trainer.limit_val_batches
            num_batches = limit if isinstance(limit, int) else int(limit * len(val_loader))
            # keep only the ground truth for images the limited run actually saw
            valid_image_paths = set()
            for _, _, image_names in itertools.islice(val_loader, num_batches):
                valid_image_paths.update(image_names)
            ground_df = ground_df[ground_df.image_path.isin(valid_image_paths)]

but even truncated to the evaluated batches, it seems wasteful to iterate the dataloader, and load every image, just to recover the file names.
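
If the val loader isn't shuffled, a cheaper option might be to read the image names off the dataset directly instead of materializing batches. This is only a sketch: the `image_names` attribute on the dataset is an assumption, not something I've checked against the DeepForest dataset class:

        # Sketch: derive the evaluated images without iterating the loader.
        # Assumes an unshuffled val loader and a (hypothetical) image_names
        # attribute on the dataset holding one entry per sample.
        val_loader = self.val_dataloader()
        limit = self.trainer.limit_val_batches
        num_batches = limit if isinstance(limit, int) else int(limit * len(val_loader))
        num_images = num_batches * val_loader.batch_size
        valid_image_paths = set(val_loader.dataset.image_names[:num_images])
        ground_df = ground_df[ground_df.image_path.isin(valid_image_paths)]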

This is useful for quickly testing the effect of dataset size on model performance, but to be rigorous we would probably want to generate dedicated subset train and val annotation files.
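
For the rigorous version, something like this would generate a fixed subset CSV up front, so prediction and evaluation see exactly the same images (a sketch; the file names are placeholders, but image_path is the column name used in the annotation CSV):

        import pandas as pd

        # Sample a fixed set of images from the full val annotations and write
        # a subset CSV that both the trainer and evaluate can be pointed at
        ground_df = pd.read_csv("val.csv")
        subset = ground_df.image_path.drop_duplicates().sample(n=50, random_state=0)
        ground_df[ground_df.image_path.isin(subset)].to_csv("val_subset.csv", index=False)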
