
Why is batch_size set so large in dataset.py? When it is set to a smaller value, the data won't be read in #2

@LaJarjan

Description


processed_dataset['train'] = dataset['train'].map(
    preprocess_function_test,
    batched=True,
    batch_size=100000,
    num_proc=1,
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
    keep_in_memory=True,
)
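For context, here is a minimal pure-Python sketch (not the Hugging Face datasets library itself) of what `batched=True` mapping does: the dataset's columns are split into chunks of at most `batch_size` rows, and the preprocessing function is called once per chunk with a dict of lists. With `batch_size=100000` the whole split often arrives in a single call; with a small `batch_size` the function is called many times and must return a valid dict of lists for every partial batch, which is a common reason a smaller value appears to "not read in" the data. All names below are illustrative assumptions.

```python
def batched_map(columns, fn, batch_size):
    """Sketch of batched mapping.

    columns: dict of equal-length lists (column-oriented data).
    fn: takes a dict of lists (one batch), returns a dict of lists.
    """
    n = len(next(iter(columns.values())))
    out = {}
    for start in range(0, n, batch_size):
        # Slice every column to form one batch of at most batch_size rows.
        batch = {k: v[start:start + batch_size] for k, v in columns.items()}
        result = fn(batch)
        for k, v in result.items():
            out.setdefault(k, []).extend(v)
    return out


# Illustrative data and preprocessing function (stand-ins for the real
# dataset and preprocess_function_test from the issue).
data = {"text": [f"row {i}" for i in range(10)]}
tokenized = batched_map(
    data,
    lambda b: {"ids": [len(t) for t in b["text"]]},
    batch_size=3,  # fn is called 4 times: batches of 3, 3, 3, 1
)
```

If the real `preprocess_function_test` assumes it always sees the full split (for example, building vocabulary or grouping across all rows at once), it would only behave correctly when `batch_size` is large enough to cover the entire dataset in one call.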
