Description
I am currently experimenting with training. I have had two successful training runs on small numbers of frames (3000+3000 from the L4 mouse and 3000+2982 from the L6 mouse). This worked and gave me models that I could use for correction. My next attempt, training on all 4 files from both layers, however, keeps failing with the following error:
$ python train.py --datasets_folder L4_L6 --datasets_path /gpfs/soma_fs/home/voit/bbo/projects/2gramfiberscope/experiments/denoising_training_data/ --n_epochs 40 --GPU 0,1,2,3 --batch_size 4 --train_datasets_size 300 --select_img_num 11400
srun: job 31197 queued and waiting for resources
srun: job 31197 has been allocated resources
Training parameters ----->
Namespace(GPU='0,1,2,3', b1=0.5, b2=0.999, batch_size=4, datasets_folder='L4_L6', datasets_path='/gpfs/soma_fs/home/voit/bbo/projects/2gramfiberscope/experiments/denoising_training_data/', fmap=16, gap_h=75, gap_s=75, gap_w=75, img_h=150, img_s=150, img_w=150, lr=5e-05, n_epochs=40, ngpu=4, normalize_factor=1, output_dir='./results', pth_path='pth', select_img_num=11400, train_datasets_size=300)
Image list for training ----->
Total number -----> 4
M210601JKL_20210702_D4_00002_xscancorr_rigidXCorr_export.tif
M210602JKL_20210702_D8_00006_xscancorr_rigidXCorr_export.tif
M210602JKL_20210702_D8_00003_xscancorr_rigidXCorr_export.tif
M210601JKL_20210702_D4_00005_xscancorr_rigidXCorr_export.tif
Using 4 GPU for training ----->
Traceback (most recent call last):
File "train.py", line 96, in <module>
for iteration, (input, target) in enumerate(trainloader):
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1179, in _next_data
return self._process_data(data)
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 83, in default_collate
return [default_collate(samples) for samples in transposed]
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 83, in <listcomp>
return [default_collate(samples) for samples in transposed]
File "/gpfs/soma_fs/home/voit/anaconda3/envs/deepcad/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [1, 150, 150, 150] at entry 0 and [1, 149, 150, 150] at entry 2
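As far as I can tell, the final error comes from the default DataLoader collate step, which stacks the samples of a batch with torch.stack and therefore requires every sample to have exactly the same shape, so a single 149-frame patch is enough to abort the run. A standalone snippet that reproduces just this error, independent of the DeepCAD dataset code:
import torch

# torch.stack (used by the default collate_fn) refuses tensors of unequal
# size, which is exactly the RuntimeError reported above.
a = torch.zeros(1, 150, 150, 150)
b = torch.zeros(1, 149, 150, 150)
torch.stack([a, b], 0)  # RuntimeError: stack expects each tensor to be equal size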
I have checked that all files have the same spatial dimensions, in case that is important:
$ for i in *.tif; do echo $i;tiffinfo $i|head -7; done
M210601JKL_20210702_D4_00002_xscancorr_rigidXCorr_export.tif
TIFF Directory at offset 0x241f6 (147958)
Image Width: 269 Image Length: 275
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: None
Photometric Interpretation: min-is-black
Samples/Pixel: 1
M210601JKL_20210702_D4_00005_xscancorr_rigidXCorr_export.tif
TIFF Directory at offset 0x241f6 (147958)
Image Width: 269 Image Length: 275
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: None
Photometric Interpretation: min-is-black
Samples/Pixel: 1
M210602JKL_20210702_D8_00003_xscancorr_rigidXCorr_export.tif
TIFF Directory at offset 0x241f6 (147958)
Image Width: 269 Image Length: 275
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: None
Photometric Interpretation: min-is-black
Samples/Pixel: 1
M210602JKL_20210702_D8_00006_xscancorr_rigidXCorr_export.tif
TIFF Directory at offset 0x241f6 (147958)
Image Width: 269 Image Length: 275
Bits/Sample: 16
Sample Format: unsigned integer
Compression Scheme: None
Photometric Interpretation: min-is-black
Samples/Pixel: 1
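Since the mismatch is in the first (frame) dimension rather than the spatial ones, I also want to check the number of frames per file. A minimal sketch for that, assuming the tifffile package is available (it is not part of DeepCAD itself):
import glob
import tifffile

# Print the frame count of every stack in the current folder; a stack whose
# frame count does not line up with the 150-frame patch depth might be the
# source of the truncated 149-frame patch in the error above.
for path in sorted(glob.glob("*.tif")):
    with tifffile.TiffFile(path) as tif:
        print(path, len(tif.pages))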
Despite having read issue #2, I am not yet sure whether I interpret --train_datasets_size correctly. Is it the number of sub-stacks or their size? Do I have to make sure that --select_img_num is divisible by that number? I also find that the choice of --train_datasets_size massively influences the computation time. What would be an expected sweet spot here?
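For what it is worth, my current assumption (not verified against the code) is that full-depth sub-stacks of img_s=150 frames are cut with a stride of gap_s=75 frames, which is how I have been reasoning about --select_img_num and the frame counts above:
# Back-of-the-envelope sketch of my assumed patch tiling (hypothetical, not
# taken from the DeepCAD code): full-depth sub-stacks of img_s frames cut
# with a stride of gap_s frames.
img_s, gap_s = 150, 75

def full_substacks(n_frames):
    return (n_frames - img_s) // gap_s + 1

for n_frames in (2982, 3000, 11400):
    print(n_frames, full_substacks(n_frames))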