A problem when the sample_size is not divisible by the batch_size #5

@MounirB

Description
I adapted DataGenerator to my Deep Learning pipeline.
When the sample size is not divisible by the batch_size, the DataGenerator seems to wrap around to the first batch instead of yielding the last (smaller) batch.

Example
Let A be an array of train samples, and batch_size = 4.
A = [4,7,8,7,9,78,8,4,78,51,6,5,1,0]. Here A.size = 14.
Clearly, in this situation, A.size is not divisible by batch_size.

The batches the DataGenerator yields during the training process are the following:

  • Batch_0 = [4,7,8,7]
  • Batch_1 = [9,78,8,4]
  • Batch_2 = [78,51,6,5]
  • Batch_3 = [4,7,8,7]. This is where the problem lies: instead of yielding Batch_3 = [1,0], the generator goes back to the first batch.

Here is a situation where another generator behaves correctly when the sample_size is not divisible by the batch_size: https://stackoverflow.com/questions/54159034/what-if-the-sample-size-is-not-divisible-by-batch-size-in-keras-model

For your information, I kept the following instruction as is:
int(np.floor(len(self.list_IDs) / self.batch_size))
If I simply change np.floor to np.ceil, training/validation breaks.
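For what it's worth, here is a minimal sketch (independent of the actual DataGenerator; the class and variable names are illustrative) of the indexing logic that makes the trailing partial batch work: use np.ceil in __len__ so the last batch is counted, and rely on slice clipping in __getitem__ so that batch simply comes out shorter. Note that switching to np.ceil alone can break a generator whose __getitem__ preallocates arrays of exactly batch_size rows; the slicing below avoids that assumption.

```python
import numpy as np

class PartialBatchGenerator:
    """Sequence-style generator sketch that yields the final, smaller batch."""

    def __init__(self, list_IDs, batch_size):
        self.list_IDs = list_IDs
        self.batch_size = batch_size

    def __len__(self):
        # ceil, not floor, so the trailing partial batch is counted
        return int(np.ceil(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        # Python slicing clips at the end of the list, so the last
        # batch is simply shorter instead of wrapping around
        return self.list_IDs[index * self.batch_size:(index + 1) * self.batch_size]

A = [4, 7, 8, 7, 9, 78, 8, 4, 78, 51, 6, 5, 1, 0]
gen = PartialBatchGenerator(A, batch_size=4)
batches = [gen[i] for i in range(len(gen))]
# batches[3] is [1, 0], not a repeat of batch 0
```
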
