
PreprocessTrainingData.py and runtraining.sh

haberlmatt edited this page May 1, 2020 · 5 revisions

Task:

Two main scripts to generate a trained model

Scripts:

PreprocessTrainingData.py /images /labels /augmented

Generates augmented data which is used for training

runtraining.sh --numiterations 50000 /augmented /trainednet

Performs the actual training (~1h per 1000 iterations of each model on a modern GPU).

Input arguments:

PreprocessTrainingData.py

  • a folder with sequential training images (.png or .tif files; they can be 8-bit or 16-bit)
  • a folder with corresponding labels (8-bit images with 0 for background and 1 for object labels)
  • the folder where to write the augmented data
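
Before preprocessing, it can save a failed run to confirm that the labels really contain only the values 0 and 1. A minimal, self-contained sketch (assuming `python3` with NumPy and Pillow is available; the demo folder and file names are made up, not part of CDeep3M):

```shell
# Sanity-check a labels folder before preprocessing.
# LABELS here points at a synthetic demo folder so the sketch runs anywhere;
# point it at your real labels folder instead.
LABELS=/tmp/labels_demo
mkdir -p "$LABELS"
# Create a tiny demo label image (0 = background, 1 = one object).
python3 - "$LABELS" <<'EOF'
import sys
import numpy as np
from PIL import Image
arr = np.zeros((32, 32), dtype=np.uint8)
arr[8:16, 8:16] = 1  # one object label
Image.fromarray(arr).save(sys.argv[1] + "/label0000.png")
EOF
# Check that every label image holds only 0 (background) and 1 (object).
python3 - "$LABELS" <<'EOF'
import sys, glob
import numpy as np
from PIL import Image
for f in sorted(glob.glob(sys.argv[1] + "/*.png")):
    vals = np.unique(np.array(Image.open(f)))
    assert set(vals.tolist()) <= {0, 1}, f"{f}: unexpected values {vals}"
print("labels OK")
EOF
```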

runtraining.sh

  • optional: number of iterations for training (e.g. --numiterations 50000)
  • the folder with the augmented data (from PreprocessTrainingData.py)
  • output folder where the trained model is written
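
Put together, a full run is just the two commands in sequence. The sketch below only echoes the command lines, so it can be inspected without the CDeep3M scripts or a GPU; all folder names are example placeholders:

```shell
# Build and display the two command lines of a training run.
# All paths are example placeholders; adapt them to your data.
IMAGES=/images
LABELS=/labels
AUG=/augmented
MODEL=/trainednet
ITER=50000
echo PreprocessTrainingData.py "$IMAGES" "$LABELS" "$AUG"
echo runtraining.sh --numiterations "$ITER" "$AUG" "$MODEL"
# prints:
#   PreprocessTrainingData.py /images /labels /augmented
#   runtraining.sh --numiterations 50000 /augmented /trainednet
```

To actually run the pipeline, drop the `echo`s and make sure the scripts are on your PATH.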

Optional Settings:

Augmentations: We group the augmentations into three types:

  • Primary augmentations (flipping, rotations) do not alter the image content and are always performed.
  • Secondary augmentations alter the noise, contrast, and brightness of images.
  • Tertiary augmentations alter image and object sizes.

Secondary and tertiary augmentations can be controlled in their strength (secondary: -1 to 10; tertiary: 0 to 10). Example: PreprocessTrainingData.py /images /labels -1 3 /augmented

Here, secondary augmentation = -1 selects a standard denoising built into CDeep3M2, and tertiary augmentation = 3 means moderate resizing operations are performed.

Training with more than one dataset:

To use multiple training datasets, pass the folders sequentially into PreprocessTrainingData.py, and the last argument is the output path for the augmented dataset.

Example Command:

PreprocessTrainingData.py /images1 /labels1 /images2 /labels2 /augmented

You can define how secondary and tertiary augmentations are performed for each training dataset individually.

E.g.:

PreprocessTrainingData.py /images1 /labels1 -1 0 /images2 /labels2 2 5 /augmented

This will use augmentation settings (secondary: -1, tertiary: 0) for image set 1 and (secondary: 2, tertiary: 5) for image set 2.

After the preprocessing, runtraining.sh is used as usual, directly on the augmented folder: runtraining.sh --numiterations 50000 /augmented /trainednet

Output directory:

ls /trainednet

1fm 3fm 5fm VERSION parallel.jobs readme.txt train_file.txt valid_file.txt

ls /trainednet/1fm

deploy.prototxt label_class_selection.prototxt log solver.prototxt train_val.prototxt trainedmodel

Content

trainedmodel

1fm_classifer_iter_60000.caffemodel 1fm_classifer_iter_60000.solverstate

Snapshots (usually every 2000 iterations) of the trained model, to perform predictions and to continue training from this solverstate.
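
When several snapshots have accumulated, the newest one is usually the one you want for prediction or for resuming. A small sketch that picks the snapshot with the highest iteration number (demo files are created on the fly so it runs anywhere; a real folder would be e.g. /trainednet/1fm/trainedmodel):

```shell
# Pick the most recent snapshot in a trainedmodel folder.
# SNAPDIR and the touched files are synthetic demo data; replace with
# your real trainedmodel folder.
SNAPDIR=/tmp/snapdemo
mkdir -p "$SNAPDIR"
touch "$SNAPDIR/1fm_classifer_iter_2000.caffemodel" \
      "$SNAPDIR/1fm_classifer_iter_4000.caffemodel"
# Version sort (-V) orders the embedded iteration numbers numerically,
# so iter_10000 sorts after iter_2000 rather than before it.
latest=$(ls "$SNAPDIR"/*.caffemodel | sort -V | tail -1)
echo "$latest"   # prints /tmp/snapdemo/1fm_classifer_iter_4000.caffemodel
```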

log

accuracy.pdf loss.pdf out.log out.log.test out.log.train

Log files tracking the loss and accuracy over the training iterations. To generate accuracy.pdf and loss.pdf, run PlotValidation.py with the log folder as input argument.
