Docker failed to train #25

@fgdfgfthgr-fox

Description

I installed the Docker version and tried to run it locally. The example test/rundemo.sh works, but runtraining.sh does not.

# PreprocessTrainingData.py ../UroCell/Train ../UroCell/Train_label ../UroCell/Train_augmented
Starting Training Data Preprocessing
num_training_sets: 1
augmentation_level: ['-1']
thrid_augmentation_level: ['0']
Training Image Path: ../UroCell/Train
Training Label Path: ../UroCell/Train_label
Secondary Augmentation level: -1
Tertiary Augmentation level: 0
Output Path: ../UroCell/Train_augmented
Loading:
../UroCell/Train_label
Image importer loading ...
../UroCell/Train_label
Reading file: ../UroCell/Train_label/fib1-1-0-3.tif
Reading file: ../UroCell/Train_label/fib1-3-2-1.tif
Reading file: ../UroCell/Train_label/fib1-3-3-0.tif
Reading file: ../UroCell/Train_label/fib1-4-3-0.tif
(4, 256, 256, 1)
(4, 256, 256, 1)
Verifying labels
Running image enhancement
Running image enhancements
Processing 4 images
Running 32 parallel threads
Loading: ../UroCell/Train/fib1-1-0-3.tif -> Type: uint16
Loading: ../UroCell/Train/fib1-3-2-1.tif -> Type: uint16
Loading: ../UroCell/Train/fib1-3-3-0.tif -> Type: uint16
Saving: ../UroCell/Train_augmented/enhanced_v1/fib1-1-0-3.png
Loading: ../UroCell/Train/fib1-4-3-0.tif -> Type: uint16
Saving: ../UroCell/Train_augmented/enhanced_v1/fib1-3-2-1.png
Saving: ../UroCell/Train_augmented/enhanced_v1/fib1-3-3-0.png
Saving: ../UroCell/Train_augmented/enhanced_v1/fib1-4-3-0.png
Image enhancements completed
Enhanced images are stored in../UroCell/Train_augmented/enhanced_v1
Loading:
../UroCell/Train_augmented/enhanced_v1
Image importer loading ...
../UroCell/Train_augmented/enhanced_v1
Reading file: ../UroCell/Train_augmented/enhanced_v1/fib1-1-0-3.png
Reading file: ../UroCell/Train_augmented/enhanced_v1/fib1-3-2-1.png
Reading file: ../UroCell/Train_augmented/enhanced_v1/fib1-3-3-0.png
Reading file: ../UroCell/Train_augmented/enhanced_v1/fib1-4-3-0.png
(4, 256, 256, 1)
(4, 256, 256, 1)
Verifying images
Checking image dimensions
Augmenting training data 1-8 and 9-16

Applying tertiary augmentation to stack 1
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_1.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 9
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_9.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 2
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_2.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 10
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_10.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 3
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_3.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 11
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_11.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 4
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_4.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 12
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_12.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 5
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_5.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 13
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_13.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 6
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_6.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 14
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_14.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 7
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_7.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 15
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_15.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 8
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_8.h5
(4, 325, 325, 1)

Applying tertiary augmentation to stack 16
Saving:  /home/UroCell/Train_augmented/training_full_stacks_v1_16.h5
(4, 325, 325, 1)

-> Training data augmentation completed
Training data stored in  ../UroCell/Train_augmented
For training your model please run runtraining.sh  ../UroCell/Train_augmented <desired output directory>

# nvidia-smi
Wed Jan  8 03:17:22 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.02              Driver Version: 566.03         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080        On  |   00000000:43:00.0  On |                  N/A |
| 30%   31C    P8              6W /  320W |     670MiB /  10240MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        42      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
# runtraining.sh  ../UroCell/Train_augmented  ../UroCell/TrainedNet
--
Running CDeep3M Version 2.1.5
../UroCell/Train_augmented
base_dir is:  /home/cdeep3m
Verifying input training data is valid ...
success
Copying over model files and creating run scripts ...
success


A new directory has been created: ../UroCell/TrainedNet
In this directory are 3 directories 1fm,3fm,5fm which
correspond to 3 caffe models that need to be trained
/home/cdeep3m/trainworker.sh: unrecognized option '--models 1fm,3fm,5fm '

Single GPU detected.
ERROR: caffe had a non zero exit code: 1
/home/cdeep3m/caffetrain.sh: line 166: ../UroCell/TrainedNet/1fm/log/out.log: No such file or directory
ERROR: caffe had a non zero exit code: 1
/home/cdeep3m/caffetrain.sh: line 166: ../UroCell/TrainedNet/3fm/log/out.log: No such file or directory
ERROR: caffe had a non zero exit code: 1
/home/cdeep3m/caffetrain.sh: line 166: ../UroCell/TrainedNet/5fm/log/out.log: No such file or directory
Non zero exit code from caffe for train of model. Exiting.
ERROR, a non-zero exit code (1) was received from: trainworker.sh --numiterations 30000
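
One possible reading of the "No such file or directory" lines is that the per-model log directories were never created, perhaps because trainworker.sh rejected its options before setting them up. A minimal sketch to check this, assuming the 1fm/3fm/5fm layout named in the output above; `missing_log_dirs` is a hypothetical helper for illustration, not part of CDeep3M:

```python
from pathlib import Path


def missing_log_dirs(train_dir, models=("1fm", "3fm", "5fm")):
    """Return the <train_dir>/<model>/log paths that are not existing directories."""
    root = Path(train_dir)
    # The model names are taken from the runtraining.sh output; the "log"
    # subdirectory name is taken from the caffetrain.sh error message.
    return [str(root / m / "log") for m in models if not (root / m / "log").is_dir()]


if __name__ == "__main__":
    import sys

    # Default to the output directory used in this report.
    target = sys.argv[1] if len(sys.argv) > 1 else "../UroCell/TrainedNet"
    for path in missing_log_dirs(target):
        print("missing:", path)
```

Run it from the same working directory as runtraining.sh, since the path in the report is relative. An empty result would suggest the directories exist and the cause lies elsewhere.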

Also, it doesn't seem to recognise the 3D nature of the dataset and treats it as 2D images.
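
The (4, 256, 256, 1) shape in the log is consistent with one 256 x 256 plane being read per file. To tell whether the .tif files themselves hold more than one plane, the pages in each file can be counted. A minimal sketch using only the standard library, assuming classic (non-BigTIFF) files; `count_tiff_pages` is a hypothetical helper, not part of CDeep3M:

```python
import struct


def count_tiff_pages(path):
    """Count the image file directories (pages) in a classic TIFF file."""
    with open(path, "rb") as f:
        header = f.read(8)
        if len(header) < 8 or header[:2] not in (b"II", b"MM"):
            raise ValueError("not a TIFF file")
        # "II" marks little-endian byte order, "MM" big-endian.
        endian = "<" if header[:2] == b"II" else ">"
        magic, offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF (BigTIFF is not handled)")
        pages = 0
        seen = set()  # guards against a cyclic chain of directories
        while offset != 0 and offset not in seen:
            seen.add(offset)
            f.seek(offset)
            (n_entries,) = struct.unpack(endian + "H", f.read(2))
            # Skip the 12-byte entries to reach the next-directory offset.
            f.seek(offset + 2 + 12 * n_entries)
            (offset,) = struct.unpack(endian + "I", f.read(4))
            pages += 1
        return pages


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, count_tiff_pages(name))
```

If every file under ../UroCell/Train reports a single page, the volume is already stored as separate 2D slices on disk; a count above one would mean each file is a stack and only part of it appears in the shapes above.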
