Breast Cancer Molecular Subtype Classification

Overview of the project

This project aims to train a deep learning model to classify breast cancer molecular subtypes using The Chinese Mammography Database (CMMD).

REF: CMMD

The approach is to try several backbones from the Vision Transformers family, including base Vision Transformer, Swin Transformer, and Multi-Axis Vision Transformer and also ResNet and ResNet101 as baseline.

Project structure

├── 📁 data
│   ├── raw - Raw data, unprocessed images
│   ├── processed - Processed data, ready for training or splitting.
│   └── splits - Data splits for training, testing or validation.
├── 📁 notebooks - Jupyter notebooks for data exploration and visualization.
├── 📁 models 
│   ├── checkpoints - Checkpoints for the trained models.
│   ├── artifacts - Compiled artifacts for the trained models.
├── 📁 src
│   ├── modeling -  Datasets, transforms, and model backbones.
│   ├── training - Utility functions for training.
│   ├── testing - Utility functions for testing.
│   ├── utils - Utility functions for data processing and visualization.
│   └── config.py - Configuration file for the project.
├── 🐍 train.py -  Training entry point.
├── 🐍 test.py -  Testing entry point.
├── 🐍 predict.py -  Prediction entry point.
├── ⚙️ .env - Environment variables for the project.

Requirements

Poetry is used for dependency management.

To install Poetry, visit Poetry installation and follow the instructions for your operating system.

Main dependencies

To install the dependencies, run the following command in the root directory of the project:

poetry install

After this, poetry will create a virtual environment for the project and install all the dependencies in it.

Training

Poetry allows you to run scripts with the virtual environment activated. To run the training script, use the following command:

poetry run train

or

python -m train

in case you want to run the script without poetry.

Train script arguments

ALLOWED_BACKBONES = ["resnet101", "maxvit", "vit", "swin"]

Argument	Type	Default Value	Description
`--backbone`	string	"resnet101"	Model architecture to use (options available in ALLOWED_BACKBONES)
`--train_dir`	string	SPLIT_TRAIN_DIR	Path to training data directory
`--epochs`	int	10	Number of epochs to train
`--batch_size`	int	64	Batch size for training
`--augment`	flag	False	Enable to use data augmentation
`--oversample`	flag	False	Enable to use oversampling for class imbalance
`--training-mode`	string	'k-folds'	Training mode ('k-folds' or 'full')
`--k_folds`	int	3	Number of folds for K-Fold cross-validation
`--extension`	string	'png'	Image file extension
`--seed`	int	42	Seed for random number generation
`--num_classes`	int	4	Number of classes for classification
`--enable_logging`	flag	False	Enable to activate logging with WandB
`--lr`	float	0.001	Learning rate

Training can be done using K-Fold cross-validation or full training. The --training-mode argument can be set to either k-folds or full. If k-folds is selected, the --k_folds argument can be used to specify the number of folds. The training script will create a directory for each fold in the models/checkpoints directory.

Examples

# Train a model with default parameters
poetry run train
# Train a model with a different backbone
poetry run train --backbone maxvit --epochs 40 --batch_size 32 --augment --enable_logging

Testing

The testing script is used to evaluate the performance of the trained model on a test dataset. It can be run with the following command, passing as argument the path to the model checkpoint:

poetry run test --checkpoint_dir models/checkpoints/maxvit-codenam-53/fold_0.ckpt --backbone maxvit

Test script arguments

Argument	Type	Default Value	Description
`--test_dir`	string	SPLIT_TEST_DIR	Path to test data directory
`--checkpoint_dir`	string	None (required)	Path to the model checkpoint directory
`--batch_size`	int	32	Batch size for evaluation
`--backbone`	string	"resnet101"	Model architecture to use (options available in ALLOWED_BACKBONES)
`--extension`	string	'png'	Image file extension
`--report`	flag	False	Enable to generate a classification report

Examples

# Test a model with default parameters
poetry run test --checkpoint_dir models/checkpoints/maxvit-codenam-53/fold_0.ckpt
# Test a model with a different backbone and classification report
poetry run test --checkpoint_dir models/checkpoints/maxvit-codenam-53/fold_0.ckpt --backbone maxvit --batch_size 16 --report

Prediction

Right now, prediction will load an onnx model and run inference on a single image. The image should be passed as an argument to the script.

The model path can be also passed as an argument, but it will default to the best.onnx file in the models/artifacts directory.

This script can be run with the following command, passing as argument the path to the image:

poetry run predict --test_image data/raw/benign/test_image.png

Predict script arguments

Argument	Type	Default Value	Description
`--test_image`	string	None	Path to the test image
`--model_path`	string	None	Path to the ONNX model

Other notes

The project is using WandB for logging and tracking experiments. You can enable it by passing the --enable_logging flag to the training script. The WANDB_API_KEY should be set in the .env file.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
data		data
models		models
notebooks		notebooks
reports		reports
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
config.py		config.py
env.example		env.example
poetry.lock		poetry.lock
poetry.toml		poetry.toml
predict.py		predict.py
pyproject.toml		pyproject.toml
test.py		test.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Breast Cancer Molecular Subtype Classification

Table of contents

Overview of the project

Project structure

Requirements

Main dependencies

Training

Train script arguments

Examples

Testing

Test script arguments

Examples

Prediction

Predict script arguments

Other notes

About

Uh oh!

Releases

Packages

Languages

dacxjo/TFG

Folders and files

Latest commit

History

Repository files navigation

Breast Cancer Molecular Subtype Classification

Table of contents

Overview of the project

Project structure

Requirements

Main dependencies

Training

Train script arguments

Examples

Testing

Test script arguments

Examples

Prediction

Predict script arguments

Other notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages