Hi Team,
I tried to train a BrainLM model by running:
python train.py \
  --output_dir /data/users2/ytao/BrainLM_YeTao/output_fake \
  --train_dataset_path /data/users2/ytao/BrainLM_YeTao/datasets_fake/train_ukbiobank \
  --val_dataset_path /data/users2/ytao/BrainLM_YeTao/datasets_fake/val_ukbiobank \
  --coords_dataset_path /data/users2/ytao/BrainLM_YeTao/datasets_fake/Brain_Region_Coordinates \
  --moving_window_len 20 \
  --num_last_timepoints_masked 4 \
  --hidden_size 128 \
  --num_hidden_layers 4 \
  --num_attention_heads 4 \
  --intermediate_size 512 \
  --decoder_hidden_size 128 \
  --decoder_num_hidden_layers 2 \
  --decoder_num_attention_heads 4 \
  --decoder_intermediate_size 512 \
  --attention_probs_dropout_prob 0.1 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --gradient_accumulation_steps 8 \
  --save_total_limit 50 \
  --dataloader_num_workers 3 \
  --dataloader_pin_memory True \
  --wandb_logging True \
  --dataloader_drop_last True
This fails with the error below. If I set both per_device_train_batch_size and per_device_eval_batch_size to 1, the code runs successfully. May I know the reason?
File "/data/users2/ytao/BrainLM_YeTao/train.py", line 702, in <module>
main()
File "/data/users2/ytao/BrainLM_YeTao/train.py", line 672, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
return inner_training_loop(
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/transformers/trainer.py", line 1899, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
data = self._next_data()
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1465, in _next_data
return self._process_data(data)
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
data.reraise()
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/torch/_utils.py", line 715, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
data = fetcher.fetch(index) # type: ignore[possibly-undefined]
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 50, in fetch
data = self.dataset.__getitems__(possibly_batched_index)
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2809, in __getitems__
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2809, in <listcomp>
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
File "/data/users2/ytao/bin/miniconda3/envs/brainlm/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2809, in <dictcomp>
return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]
IndexError: list index out of range
Thanks,
Ye
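
A possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing expression indexes every column of the fetched batch with the same `i`, so it raises `IndexError` when some column holds fewer entries than the loop expects. The snippet below reproduces that expression in plain Python. It assumes the number of examples is taken from the first column (the traceback does not show this), and the column names and the `split_batch` / `column_lengths` helpers are hypothetical, not part of BrainLM or `datasets`.

```python
def split_batch(batch):
    """Mimic the comprehension on the last traceback frame."""
    # Assumption: n_examples comes from the length of the first column.
    n_examples = len(batch[next(iter(batch))])
    return [{col: array[i] for col, array in batch.items()} for i in range(n_examples)]


def column_lengths(batch):
    """Report how many entries each column of a fetched batch holds."""
    return {col: len(values) for col, values in batch.items()}


# Hypothetical batches: every column has one entry per example...
consistent = {"signal_vectors": [[0.1, 0.2], [0.3, 0.4]], "label": [0, 1]}
# ...versus a batch where one column came back shorter than the others.
ragged = {"signal_vectors": [[0.1, 0.2], [0.3, 0.4]], "label": [0]}

print(column_lengths(consistent))
print(len(split_batch(consistent)))

print(column_lengths(ragged))
try:
    split_batch(ragged)
except IndexError as exc:
    # Same exception type as in the traceback above.
    print("IndexError:", exc)
```

If this reading applies, printing the per-column lengths of `train_dataset[[0, 1]]` (a two-row batch fetched outside the DataLoader) would show whether any column, or any transform applied to the dataset, returns fewer entries than the number of rows requested. A batch size of 1 would hide such a mismatch as long as each column still has at least one entry.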