Evaluation reports spuriously low recall if limit_batches is set #1232

@jveitchmichaelis

Description

I was doing some test runs with limit_val_batches set, and the predictions look much better than the reported box recall suggests (< 0.03, which is low even for a bad model). I suspect the cause is that evaluate loads the ground truth CSV directly and doesn't use the dataloader, so it scores against the whole val dataset and counts every box in a skipped image as a missing prediction. That caps recall at roughly the evaluated fraction of the data, regardless of model quality.
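
To make the dilution concrete, here is a toy calculation (the numbers are invented, not from my run):

        # Toy numbers: suppose the full ground truth CSV has 1000 boxes, but the
        # limited run only produced predictions for images covering 50 of them
        total_gt_boxes = 1000     # boxes in the full val CSV
        evaluated_boxes = 50      # boxes in images the model actually saw
        matched = 45              # true positives among those

        reported_recall = matched / total_gt_boxes   # 0.045 -- spuriously low
        actual_recall = matched / evaluated_boxes    # 0.90  -- the real recall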

This would probably fix it:

        import itertools

        val_loader = self.val_dataloader()
        if self.trainer.limit_val_batches != 1.0 and val_loader:
            # limit_val_batches is either a fraction of the loader (float)
            # or an absolute batch count (int); Lightning runs only those batches
            limit = self.trainer.limit_val_batches
            num_batches = limit if isinstance(limit, int) else int(limit * len(val_loader))
            # keep only the ground truth for images the limited run actually saw
            valid_image_paths = set()
            for _, _, image_names in itertools.islice(val_loader, num_batches):
                valid_image_paths.update(image_names)
            ground_df = ground_df[ground_df.image_path.isin(valid_image_paths)]

but even truncated to the evaluated batches, it seems wasteful to iterate the dataloader, and load every image, just to recover the file names.
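
If the val loader isn't shuffled, a cheaper option might be to read the image names off the dataset directly instead of materializing batches. This is only a sketch: the `image_names` attribute on the dataset is an assumption, not something I've checked against the DeepForest dataset class:

        # Sketch: derive the evaluated images without iterating the loader.
        # Assumes an unshuffled val loader and a (hypothetical) image_names
        # attribute on the dataset holding one entry per sample.
        val_loader = self.val_dataloader()
        limit = self.trainer.limit_val_batches
        num_batches = limit if isinstance(limit, int) else int(limit * len(val_loader))
        num_images = num_batches * val_loader.batch_size
        valid_image_paths = set(val_loader.dataset.image_names[:num_images])
        ground_df = ground_df[ground_df.image_path.isin(valid_image_paths)]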

This is useful for quickly testing the effect of dataset size on model performance, but to be rigorous we would probably want to generate dedicated subset train and val annotation files.
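
For the rigorous version, something like this would generate a fixed subset CSV up front, so prediction and evaluation see exactly the same images (a sketch; the file names are placeholders, but image_path is the column name used in the annotation CSV):

        import pandas as pd

        # Sample a fixed set of images from the full val annotations and write
        # a subset CSV that both the trainer and evaluate can be pointed at
        ground_df = pd.read_csv("val.csv")
        subset = ground_df.image_path.drop_duplicates().sample(n=50, random_state=0)
        ground_df[ground_df.image_path.isin(subset)].to_csv("val_subset.csv", index=False)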
