TFRecordsImageDataset #3

Merged
kmkolasinski merged 20 commits into main from feature-classification-dataset on Nov 1, 2025

@kmkolasinski (Owner) commented Nov 1, 2025

For efficient batch processing of image classification datasets, use the TFRecordsImageDataset class, which provides:

  • Multi-threaded data loading
  • Batch processing with configurable batch size
  • Image preprocessing (resizing)
  • Shuffling and prefetching
  • File interleaving for better data distribution

from tfr_reader.datasets import TFRecordsImageDataset
from tqdm import tqdm

tfrecord_paths = ["/path/to/train.tfrecord"]
batch_size = 128  # defined once so the dataset and the progress bar stay in sync

dataset = TFRecordsImageDataset(
    tfrecord_paths=tfrecord_paths,
    input_size=(320, 320),  # (height, width)
    batch_size=batch_size,
    num_threads=6,
    shuffle=True,
    interleave_files=True,
    repeat=-1,
    prefetch=2,
)

# Iterate through the dataset
for images, labels in tqdm(dataset, total=len(dataset) // batch_size):
    # images: numpy array of shape (batch_size, height, width, channels)
    # labels: numpy array of shape (batch_size,)
    pass
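
The description does not say whether repeat=-1 makes the dataset repeat indefinitely, or whether len(dataset) counts examples or batches. Assuming it repeats and that len counts examples, the loop above would need an explicit stopping point. The sketch below shows one way to bound iteration with the standard library only. The helpers steps_per_epoch and take_batches are hypothetical names and not part of tfr_reader, and fake_dataset is a stand-in for a real TFRecordsImageDataset instance.

```python
import itertools
import math
from typing import Iterable, Iterator, TypeVar

Batch = TypeVar("Batch")


def steps_per_epoch(num_examples: int, batch_size: int, drop_remainder: bool = False) -> int:
    """Return how many batches are needed to see every example once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        # Ignore the final, partially filled batch.
        return num_examples // batch_size
    # Count the final, partially filled batch as one extra step.
    return math.ceil(num_examples / batch_size)


def take_batches(dataset: Iterable[Batch], num_batches: int) -> Iterator[Batch]:
    """Yield at most num_batches batches, even if the dataset never ends."""
    return itertools.islice(dataset, num_batches)


# Stand-in for a repeating dataset: yields the same (images, labels) pair forever.
fake_dataset = itertools.cycle([("images", "labels")])

steps = steps_per_epoch(num_examples=1000, batch_size=128)
seen = 0
for images, labels in take_batches(fake_dataset, steps):
    seen += 1  # a real loop would run a training or evaluation step here
```

With a real dataset, the same pattern would wrap the dataset object in take_batches and pass steps as the tqdm total, provided the assumptions above about repeat and len hold.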

@kmkolasinski marked this pull request as ready for review November 1, 2025 10:37
@kmkolasinski merged commit 9bbf079 into main Nov 1, 2025
1 check passed