This repository was created as part of our research on batch normalization layers and their effect on mammography classification under domain shift.
Contents of the Repository:
- Data Module (`src/data`): Dataset class for handling mammography images, mammography-specific transforms, and data samplers for various training strategies, including domain adversarial training.
- Models Module (`src/models`): Deep learning models used in this research, along with trainer and evaluator modules for model training and evaluation in various settings, including domain adversarial training. Also includes scheduler and loss-function modules.
- Utils Module (`src/utils`): Utility methods, such as freezing layers and plotting mammography images.
- Visualization Module (`src/visualization`): TensorBoard modules for monitoring training runs.
- Notebooks (`notebooks`): Model training notebooks for various strategies, including domain adversarial training and training only the BN and FC layers.
This repository serves as a valuable resource for breast cancer recognition using mammography images. Contributions, questions, and feedback are welcome.
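One of the strategies above fine-tunes only the BN and FC layers while the rest of the backbone stays frozen. Below is a minimal sketch of such a freeze in PyTorch; the torchvision ResNet-18 and the helper name are illustrative assumptions, and the project's own freezing utilities live in the utils module.

```python
import torch.nn as nn
from torchvision.models import resnet18

def freeze_all_but_bn_and_fc(model: nn.Module) -> nn.Module:
    """Keep only BatchNorm and final FC parameters trainable (illustrative helper)."""
    for param in model.parameters():           # freeze the whole backbone first
        param.requires_grad = False
    for module in model.modules():             # re-enable every BN layer's affine parameters
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            for param in module.parameters():
                param.requires_grad = True
    if hasattr(model, "fc"):                   # re-enable the classification head (named `fc` in torchvision ResNets)
        for param in model.fc.parameters():
            param.requires_grad = True
    return model

model = freeze_all_but_bn_and_fc(resnet18(weights=None))
print(sum(p.requires_grad for p in model.parameters()), "trainable parameter tensors (BN + FC only)")
```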
├── LICENSE
├── README.md          <- The top-level README for developers using this project.
├── .env               <- Environment variables.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   ├── raw            <- The original, immutable data dump.
│   └── HospitalX      <- Dataset folder.
│       ├── training.xlsx   <- Metadata files contain at least the following columns: BreastID,
│       │                      FilePath, OneHotLabel, and ImageLaterality. For domain-adversarial
│       │                      training (DAT), add a DomainLabel column whose values are either
│       │                      Source or Target (see the metadata sketch after the project tree).
│       ├── validation.xlsx
│       └── test.xlsx
│
├── models             <- Trained and serialized models, model predictions, or model summaries.
│
├── notebooks          <- Jupyter notebooks.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting.
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`.
│
└── src                <- Source code for use in this project.
    ├── __init__.py    <- Makes src a Python module.
    │
    ├── data           <- Scripts to access mammography datasets, sample data, and transform images.
    │   ├── DataSamplers.py
    │   ├── Dataset.py
    │   └── Transforms.py
    │
    ├── features       <- Scripts to turn raw data into features for modeling.
    │
    ├── models         <- Scripts to implement, train, and evaluate models.
    │   ├── models     <- Scripts to implement models.
    │   ├── EvaluationTools.py
    │   ├── Evaluator.py
    │   ├── LossFunctions.py
    │   ├── Schedulers.py
    │   └── Trainer.py
    │
    ├── preprocess     <- Scripts to preprocess raw data.
    │
    ├── utils          <- Scripts to load configuration parameters from YAML files and common
    │   │                 routines of the project.
    │   ├── ActivationGradientHooks.py <- Hooks for capturing activations and gradients.
    │   ├── Config.py  <- Configuration class to access parameters in `config.yaml` with dot notation.
    │   └── Utils.py
    │
    └── visualization  <- Scripts for logging training results using TensorBoard.
        ├── Tools.py       <- Visualization methods.
        └── Tensorboard.py <- Logging training events for TensorBoard.
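As noted in the project tree, each dataset folder ships with metadata spreadsheets. The snippet below is a minimal sketch of creating one with pandas; the example rows, paths, and one-hot encoding are assumptions for illustration, and the exact format expected is defined by the Dataset class in `src/data/Dataset.py`.

```python
import pandas as pd

# Illustrative rows only; real entries come from your own dataset, and the exact
# OneHotLabel encoding is dictated by the Dataset class.
metadata = pd.DataFrame(
    {
        "BreastID": ["P0001_L", "P0001_R"],
        "FilePath": ["images/P0001_L_CC.png", "images/P0001_R_CC.png"],
        "OneHotLabel": ["[1, 0]", "[0, 1]"],
        "ImageLaterality": ["L", "R"],
        "DomainLabel": ["Source", "Source"],   # only required for domain-adversarial training (DAT)
    }
)

# Writing .xlsx requires an Excel engine such as openpyxl (`pip install openpyxl`).
metadata.to_excel("data/HospitalX/training.xlsx", index=False)
```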
For an efficient and organized development process, it is recommended to use a virtual environment. To run the code seamlessly, add the `src` folder to your interpreter's path. For `virtualenvwrapper` users, run the following command in the project directory while the virtual environment is active: `add2virtualenv src`.
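If `virtualenvwrapper` is not available, one alternative (not part of the project tooling, just a common workaround) is to append `src` to `sys.path` at the top of a notebook or script:

```python
# Minimal alternative to `add2virtualenv src`: put the src folder on the Python path.
# Adjust PROJECT_ROOT if the notebook is not started from the repository root.
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd()
sys.path.append(str(PROJECT_ROOT / "src"))
```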
An in-house FFDM dataset (HCTP) was used together with the VinDr-Mammo and CSAW-CC (mammography) datasets. The clinical data used in this study are not publicly available due to institutional data ownership and confidentiality policies. Access may be considered on reasonable request and with permission from the corresponding institutional authorities.
Table 3: Models follow the notation 𝓜_source^statistics, where the subscript source indicates the training dataset and the superscript statistics specifies the BN statistics used: tr for training-time moving averages, tt for test-time recomputed statistics. Columns correspond to the evaluation sets. The model marked with an apostrophe (') was evaluated on input data normalized to the [0, 1] range.
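The tt setting replaces the stored moving averages with statistics computed from the current test batch. A minimal sketch of toggling this behaviour in PyTorch is given below; the helper name is an assumption, and the repository's own evaluation code (`src/models/Evaluator.py`) may implement it differently.

```python
import torch.nn as nn

def use_test_time_bn_stats(model: nn.Module) -> nn.Module:
    """Make BN layers normalize with the current batch statistics (the 'tt' setting)."""
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            # With the running buffers cleared and tracking disabled, BN falls back
            # to per-batch statistics even when the model is in eval() mode.
            module.track_running_stats = False
            module.running_mean = None
            module.running_var = None
    return model
```

After this call, `model.eval()` behaves as usual except that each batch is normalized with its own mean and variance; the learned affine parameters stay as trained.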
Figure A.13: KDEs of per-channel activations for BN layers in the second block of ResNet layers 2, 3, and 4. All KDEs in this section are computed using a mini-batch of 16 images sampled from the HCTP dataset.
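Per-channel BN activations such as those summarized in the KDE plots can be collected with forward hooks. The project provides its own hooks in `src/utils/ActivationGradientHooks.py`; the sketch below is a simplified, assumed version of the same idea.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def attach_bn_activation_hooks(model: nn.Module) -> dict:
    """Record the output of every BN layer during a forward pass (illustrative)."""
    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            activations[name] = output.detach().cpu()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.register_forward_hook(make_hook(name))
    return activations

model = resnet18(weights=None).eval()
activations = attach_bn_activation_hooks(model)
with torch.no_grad():
    model(torch.randn(16, 3, 224, 224))        # random mini-batch of 16, purely for illustration
# e.g. activations["layer2.1.bn1"] holds the BN output of the second block of layer 2
```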
If you find this work useful, please cite our paper:
Akyüz, U., Katircioglu-Öztürk, D., Süslü, E.K., Keleş, B., Kaya, M.C., Durhan, G., Akpınar, M.G., Demirkazık, F.B. and Akar, G.B., 2025. DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation. arXiv preprint arXiv:2508.15452.
@article{akyuz2025dosremc,
title={DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation},
author={Aky{\"u}z, U{\u{g}}urcan and Katircioglu-{\"O}zt{\"u}rk, Deniz and S{\"u}sl{\"u}, Emre K and Kele{\c{s}}, Burhan and Kaya, Mete C and Durhan, Gamze and Akp{\i}nar, Meltem G and Demirkaz{\i}k, Figen B and Akar, G{\"o}zde B},
journal={arXiv preprint arXiv:2508.15452},
year={2025},
doi={10.48550/arXiv.2508.15452}}
Project based on the cookiecutter data science project template. #cookiecutterdatascience

