The configuration file must be defined by the user, and its path is the only argument passed for training. An example can be found under ./configs/example_config.ini.
The file is required to run the various model classes, including autoencoder, ViT, ConvNet, and ConvNeXt.
The configuration file contains 12 subsections, which are listed below, with a general description.
| Subsection | Description |
|---|---|
| MODEL | Main model parameters (architecture, batch size, etc.) |
| AUGMENTATION | Adjustable parameters for data augmentation |
| CHECKPOINT | Criteria for saving model checkpoints (see pl.callbacks.ModelCheckpoint) |
| CRITERIA | WSI selection criteria (see Dataloader.Dataloader.WSIQuery) |
| DATA | Paths to access data, relative and absolute size of datasets |
| OPTIMIZER | Selection of optimizer and relevant parameters |
| NORMALIZATION | Perform Macenko colour normalisation |
| REGULARIZATION | Includes various regularization strategies |
| SCHEDULER | Selection of learning rate scheduler and adjustable parameters |
| VERBOSE | Printed details in the Python console (see utils.GetInfo) |
| OMERO | Server credentials for accessing contours from OMERO |
| CONTOURS | OPTIONAL: Contour-specific parameters to generate patches |
Each of the 12 subsections contains one or more parameters whose values must be set. Details on the parameters can be found in the comprehensive tables below.
The Options/restrictions column, if not empty, lists the currently available values for the parameter. If the Valid column is empty, the parameter is used for all architectures; otherwise, it lists the architectures that use the parameter.
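As an illustration of how such a file might be consumed, the sketch below parses a few of the parameters described in the tables that follow with Python's standard library. The parsing code is illustrative only, not the pipeline's actual loader; section and key names follow this document.

```python
import ast
import configparser

# Illustrative sketch: parse a minimal config with stdlib configparser.
# This is NOT the pipeline's actual loader, just a demonstration of the
# value types used by the parameters documented below.
cfg = configparser.ConfigParser()
cfg.read_string("""
[MODEL]
Base_Model = ConvNet
Batch_Size = 32

[DATA]
Dim = [[256, 256]]
Train_Size = 0.7
""")

base_model = cfg["MODEL"]["Base_Model"].lower()  # Base_Model is lower-cased
batch_size = cfg["MODEL"].getint("Batch_Size")
dim = ast.literal_eval(cfg["DATA"]["Dim"])       # list-valued parameter
train_size = cfg["DATA"].getfloat("Train_Size")
```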
| MODEL parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Activation | Activation function for the classification head. | ||
| Backbone | Basic architecture for the ConvNet class. | Restricted to options in torchvision.models. | ConvNet |
| Base_Model | Class to use for training/inference. Will automatically set to lower case. | ||
| Batch_Size | Batch size for training and inference. | ||
| Depth_ViT | Depth of the network (number of recursive blocks). | | autoencoder, ViT |
| Drop_Rate | Probability of all Dropout layers in architecture. | If 0, no Dropout layers are used. | |
| Emb_size | Size of the transformer patch embeddings. | Suggested to match Sub_Patch_Size_ViT² × n_channels. | ViT |
| Inference | Boolean for training or inference mode. | ||
| Layer_Scale | LayerScale initial value, as implemented in [1]. | | ConvNeXt |
| Loss_Function | Model loss function. | Restricted to options in torch.nn. | |
| Max_Epochs | Maximum number of epochs. | | |
| Model_Save_Path | Export directory of the model, based on the rationale used in CHECKPOINT. | ||
| N_Heads_ViT | Number of heads for multihead self-attention. | | ViT |
| Precision | Precision for training. Try reducing if out of memory. | ||
| Pretrained | Boolean to use pre-trained Backbones. | | ConvNet |
| Random_Seed | For reproducibility, implemented with pl.seed_everything. See source code for which modules are seeded. | | |
| wf | Network parameter in the autoencoder. | | autoencoder |
| AUGMENTATION parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Rand_Magnitude | Integer setting magnitude in transforms.RandAugment. | | |
| Rand_Operations | Integer setting num_ops in transforms.RandAugment. | | |
At each epoch, the model is saved if the quantity tracked by "Monitor" has reached a new best, as defined by "Mode", compared to previous epochs. The quantity defined in "Monitor" must be logged during training/validation/testing using the pl.LightningModule.log() method. For instance, to export the model when it reaches a new high in validation accuracy, set Mode to "max" and Monitor to "val_acc_epoch".
| CHECKPOINT parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Mode | Whether the monitored metric should be minimised or maximised. | 'min' or 'max' | |
| Monitor | Metric to monitor. | | |
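The decision rule behind Monitor/Mode can be sketched in plain Python. This mirrors the behaviour of pl.callbacks.ModelCheckpoint described above; the metric values are invented for illustration.

```python
# Sketch of the checkpoint decision rule: a checkpoint is kept whenever
# the monitored metric improves in the direction given by Mode.
mode = "max"                           # Mode parameter ('min' or 'max')
monitored = [0.71, 0.78, 0.75, 0.82]   # e.g. val_acc_epoch per epoch (made up)

best = None
saved_epochs = []
for epoch, value in enumerate(monitored):
    improved = (best is None
                or (mode == "max" and value > best)
                or (mode == "min" and value < best))
    if improved:
        best = value
        saved_epochs.append(epoch)  # a checkpoint would be written here
```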
For details on how the CRITERIA parameters work, see Dataloader.Dataloader.WSIQuery. Each CRITERIA parameter corresponds to a column in the MasterSheet (defined in the DATA section). Selection of WSIs for training/validation/testing is done on the subset of WSIs meeting the criteria. If you do not want to use a specific CRITERIA parameter, comment out its line in the config file. Each CRITERIA parameter should be encoded as a list of one or more strings.
| CRITERIA parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Diagnosis | List of acceptable diagnoses. | ||
| id | List of acceptable WSI IDs. | | |
| Type | List of acceptable WSI types. | | |
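The selection logic can be illustrated with a toy mastersheet (all rows and values below are invented): a WSI is kept only if, for every active CRITERIA parameter, its column value appears in the corresponding list.

```python
# Toy illustration of CRITERIA-based WSI selection (data is made up;
# the real selection is done by Dataloader.Dataloader.WSIQuery).
mastersheet = [
    {"id": "wsi_001", "Diagnosis": "sarcoma",   "Type": "frozen"},
    {"id": "wsi_002", "Diagnosis": "carcinoma", "Type": "FFPE"},
    {"id": "wsi_003", "Diagnosis": "sarcoma",   "Type": "FFPE"},
]
# Commented-out parameters in the config simply do not constrain the
# selection; here only Diagnosis and Type are active.
criteria = {"Diagnosis": ["sarcoma"], "Type": ["FFPE"]}

selected = [row["id"] for row in mastersheet
            if all(row[col] in allowed for col, allowed in criteria.items())]
```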
| DATA parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Dim | Dimension of image patches (H, W). | Must be a list of one or more dimensions, e.g. [[256, 256]] | |
| MasterSheet | Path of the .csv sheet used to select WSIs based on CRITERIA parameters. See Dataloader.Dataloader.WSIQuery. | | |
| N_Classes | Number of classes in the classification head. | ||
| N_Per_Sample | Number of tiles to use per WSI for data sampling. See the Sampling_Scheme option to know how N_Per_Sample is used. | ||
| Patches_Folder | Path of the folder containing, for each WSI, a .csv file listing all tile locations and classifications. See TileDataset.sh to generate such files. | | |
| Sampling_Scheme | Sampling scheme used to gather patches in each WSI. See Dataloader.py and utils/sampling_schemes.py for details on the implemented methods. | Current options: wsi, patch, or a custom string pointing to a custom function defined in the utils.sampling_schemes module. The first two options sample N_Per_Sample patches per WSI; data is then assigned to training/validation/test sets by splitting over WSIs or patches, respectively. The latter can result in patches of the same WSI being used in both training and validation sets. | |
| Sub_Patch_Size_ViT | Dimension of sub-tiles for the transformer. Each tile is divided into sub-tiles of size Sub_Patch_Size_ViT for the attention mechanism. | | ViT |
| SVS_Folder | Path of the folder containing all original WSIs (.svs files). | | |
| Label_Name | Name of the column inside the .csv files in Patches_Folder that is used as a label for training. | ||
| Train_Size | Fraction of dataset to be used for training. Splitting is done over WSIs, not individual tiles. | ||
| Val_Size | Fraction of dataset to be used for validation. Splitting is done over WSIs, not individual tiles. Fraction of test dataset is 1 - Train_Size - Val_Size. | ||
| Vis | Visibility level used. Can be a list of multiple visibility levels. | Must be a list of one or more scalars, e.g. [0]. |
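Since Train_Size and Val_Size split over WSIs rather than individual tiles, the resulting fractions can be sketched as below. The WSI list is invented, and the split shown is a simplification of whatever the pipeline actually does.

```python
import random

# Sketch of the WSI-level split implied by Train_Size / Val_Size
# (illustrative only; WSI ids are invented).
wsis = [f"wsi_{i:03d}" for i in range(10)]
train_size, val_size = 0.7, 0.2   # test fraction = 1 - 0.7 - 0.2 = 0.1

random.seed(42)  # cf. Random_Seed / pl.seed_everything
random.shuffle(wsis)

n_train = round(train_size * len(wsis))
n_val = round(val_size * len(wsis))
train = wsis[:n_train]
val = wsis[n_train:n_train + n_val]
test = wsis[n_train + n_val:]
```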
| OPTIMIZER parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Algorithm | Optimization algorithm. | Restricted to options in torch.optim. | |
| eps | Optimisation parameter in Adam and AdamW. | | |
| lr | Learning rate. | | |
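Because Algorithm is restricted to names in torch.optim, the string is presumably resolved by attribute lookup. The sketch below only parses the section with the standard library; the getattr dispatch shown in the comment is an assumption about how the pipeline resolves the name, not a confirmed detail.

```python
import configparser

# Sketch: parse the OPTIMIZER section. The Algorithm string would
# typically be resolved against torch.optim by name, e.g.
#   optimizer_cls = getattr(torch.optim, algorithm)
#   optimizer = optimizer_cls(model.parameters(), lr=lr, eps=eps)
# (the exact resolution used by the pipeline is an assumption here).
cfg = configparser.ConfigParser()
cfg.read_string("""
[OPTIMIZER]
Algorithm = AdamW
lr = 1e-4
eps = 1e-8
""")
algorithm = cfg["OPTIMIZER"]["Algorithm"]
lr = cfg["OPTIMIZER"].getfloat("lr")
eps = cfg["OPTIMIZER"].getfloat("eps")
```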
To perform colour normalisation as part of the initial image transforms before learning, it is recommended to fit a normalizer using QA/Normalization/train_Macenko.py. This will export a .pt file, whose path can be used as the Colour_Norm_File parameter defined below.
| NORMALIZATION parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Colour_Norm_File | Path to a .pt file containing calibration parameters for a Macenko normalizer. Remove the field to use no normalization. | string | |
Macenko normalization is achieved with the use of the ColourNorm class in QA/Normalization/ColourNorm.py.
| REGULARIZATION parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Label_Smoothing | Label smoothing value for torch.nn.CrossEntropyLoss. | | |
| Stoch_Depth | Stochastic depth, as proposed by [2]. | | ConvNeXt |
| Weight_Decay | Weight decay, as a direct input to torch.optim optimizers. | | |
| SCHEDULER parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Lin_Gamma | gamma input of torch.optim.lr_scheduler.StepLR. | Used if Type == 'stepLR'. | |
| Lin_Step_Size | step_size input of torch.optim.lr_scheduler.StepLR. | Used if Type == 'stepLR'. | |
| Type | Type of scheduler. | | |
| Cos_Warmup_Epochs | Number of warmup epochs for the cosine scheduler (transformers.optimization.get_cosine_schedule_with_warmup). | Used if Type == 'cosine_warmup'. | |
| VERBOSE parameters | Description | Options/restrictions | Valid |
|---|---|---|---|
| Data_Info | Boolean to provide additional information on the dataset after data splitting. See utils.GetInfo. | | |
The OMERO parameters below are only used if the Type parameter of CONTOURS is set to 'omero'. Otherwise, leave all fields empty or false.
| Parameter | Description | Options/restrictions | Example |
|---|---|---|---|
| Host | Address of the server. | string | '128.0.0.0' |
| User | Username to log into the server. | string | 'user' |
| Pw | Password of the above user. | string | 'password' |
| Target_Member | Name of the member owning the OMERO group from which you will download contours. | string | 'user' |
| Target_Group | Name of the OMERO group from which you will download contours. | string | 'sarcoma study' |
| Parameter | Description | Options/restrictions | Example |
|---|---|---|---|
| Type | String defining the type of contour to be used. | 'local' for local contours, 'omero' to download contours from an OMERO server. If not using contours, leave empty ('') or false. | 'local' |
| Specific_Contours | A list of strings providing contours that should be processed; unspecified contours will not be processed. | List of contours, or false to process all available contours. | ['Tumour', 'Fat'] |
| Remove_BG | Float used to remove background pixels using hard thresholding. For each tile belonging to a contour listed in Remove_BG_Contours, the tile is first colour-normalised using Macenko's approach, and the number of pixels whose grey-scale value is > Remove_BG × 255 is counted. If more than 50% of the pixels in the tile meet the condition, the tile is classified as background and is not processed. | Float between 0 and 1, or false to disable. | 0 |
| Remove_BG_Contours | A list of strings specifying the contours to which the above procedure (Remove_BG) is applied. | List of contours, or false to process all available contours. Contour names should be the original ones from OMERO, not the mapped names from Contour_Mapping below. | ['Tumour'] |
| Contour_Mapping | A list of strings specifying how to map each existing contour to a new name. | List of mappings (see below), or false to keep the contours as they are. | See below. |
For the Contour_Mapping parameter, we use the following nomenclature:
['key1:value1', 'key2:value2', 'key3:value3']
The keys should be contour names found in the local contour files or on OMERO. The corresponding value is the name of the label that you want to assign to that specific contour. Use the remaining key to assign all unmapped contours to a single value, if you do not want to create a mapping for each possible label. By default, all values are converted to lower case and spaces are removed.
For instance, consider the case where you have contours named "Tumour", "Tumor", and about 20 other contours, and you want to train an algorithm to differentiate tumours from everything else. The Contour_Mapping parameter would then be defined as:
['Tumour:tumour', 'Tumor:tumour', 'remaining:not_tumour']
Now extend the above case: you also want to differentiate fat tissue, and the annotator has created the labels 'Fat [in]' and 'Fat [out]'. You could write the Contour_Mapping parameter as:
['Tumour:tumour', 'Tumor:tumour', 'Fat [in]:fat', 'Fat [out]:fat', 'remaining:not_tumour']
Note that you could also write the above using the Fat* key, which assigns all contours beginning with 'Fat' to the value 'fat':
['Tumour:tumour', 'Tumor:tumour', 'Fat*:fat', 'remaining:not_tumour']
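The mapping rules above (exact names, a trailing * wildcard, a remaining fallback, lower-casing and space removal) can be sketched as follows. This is an illustration of the described behaviour, not the pipeline's actual implementation.

```python
# Sketch of Contour_Mapping resolution as described above: exact keys
# first, then trailing-'*' prefix keys, then the 'remaining' fallback.
# Values are lower-cased and stripped of spaces.
def map_contour(name, mapping_list):
    mapping = dict(item.split(":", 1) for item in mapping_list)
    remaining = mapping.pop("remaining", None)
    wildcards = {k[:-1]: v for k, v in mapping.items() if k.endswith("*")}
    exact = {k: v for k, v in mapping.items() if not k.endswith("*")}

    if name in exact:
        value = exact[name]
    else:
        value = next((v for prefix, v in wildcards.items()
                      if name.startswith(prefix)), remaining)
    return value.lower().replace(" ", "") if value is not None else None

mapping = ['Tumour:tumour', 'Tumor:tumour', 'Fat*:fat', 'remaining:not_tumour']
```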