Laurenz Adrian Heidrich, Aditya Rastogi, Priyank Upadhya, Gianluca Brugnara, Martha Foltyn-Dumitru, Benedikt Wiestler, Philipp Vollmuth
This repository contains the official implementation of the methods described in the paper. For the most seamless experience, it bundles a copy of the entire MMDetection framework and extends it with curriculum learning functionality. If you already use MMDetection, you can instead copy just the config files and the code extensions (exact filenames are listed below) into your own MMDetection installation.
- Extensions to MMDetection:
  - Curriculum learning integration (see mmdetection/mmdet/datasets/dataset_wrappers.py and mmdetection/mmdet/engine/runner/loops.py)
  - Support for multiple datasets in curriculum learning
- MMDetection config files used for training
- Minimal example scripts for:
  - Training
  - Testing
- Jupyter notebook to generate difficulty scores based on minimal object size
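The idea of scoring difficulty by minimal object size can be illustrated with a short sketch. It is not the notebook's actual code: it assumes COCO-style annotations (`bbox` given as `[x, y, width, height]`), and both the function name and the choice of the negated smallest-box area as the score are hypothetical.

```python
def difficulty_by_min_object_size(annotations):
    """Map image_id -> difficulty score, where a smaller smallest box means harder.

    Assumption: COCO-style dicts with "image_id" and "bbox" = [x, y, w, h].
    The score is the negated area of the smallest box in the image, so
    sorting by score in ascending order yields the easiest images first.
    """
    min_area = {}
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        area = w * h
        img = ann["image_id"]
        if img not in min_area or area < min_area[img]:
            min_area[img] = area
    return {img: -area for img, area in min_area.items()}


# Toy annotations: image 1 contains a tiny object, image 2 only a large one.
annotations = [
    {"image_id": 1, "bbox": [10, 10, 50, 40]},
    {"image_id": 1, "bbox": [5, 5, 8, 6]},
    {"image_id": 2, "bbox": [0, 0, 100, 80]},
]
scores = difficulty_by_min_object_size(annotations)
easy_to_hard = sorted(scores, key=scores.get)
print(easy_to_hard)
```

Using the smallest object per image, rather than the mean size, reflects the assumption that a single tiny lesion is enough to make an image hard to detect on.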
Keywords: Medical Image Analysis, Deep Learning, Tumor Detection, Curriculum Learning
Pathology detection in medical imaging is crucial for radiologists, but current approaches that train specialized models for each region of interest often lack efficiency and robustness. This repository provides a language-guided object detection pipeline for medical imaging that uses curriculum learning to train models progressively on increasingly complex samples. Our unified pipeline converts segmentation datasets into bounding box annotations and applies two curriculum learning approaches (teacher curriculum and bounding box size curriculum) to train a Grounding DINO model. Evaluations on diverse tumor types in MRI and CT scans demonstrate significant improvements in detection accuracy, with curriculum learning yielding an AP increase of up to 5.2% over baseline models.
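Training on increasingly complex samples can be pictured with a simple pacing function. The sketch below is only an illustration under stated assumptions: it uses a linear schedule, and the names `curriculum_subset` and `start_fraction` are hypothetical. The schedule actually used in this repository is defined in the code extensions and config files and may differ.

```python
def curriculum_subset(sorted_indices, epoch, total_epochs, start_fraction=0.25):
    """Return the sample indices available at a given 0-based epoch.

    Assumptions: `sorted_indices` is ordered from easiest to hardest, and
    the visible fraction grows linearly from `start_fraction` to 1.0.
    """
    if total_epochs <= 1:
        return list(sorted_indices)
    progress = min(epoch / (total_epochs - 1), 1.0)
    fraction = start_fraction + (1.0 - start_fraction) * progress
    count = max(1, round(fraction * len(sorted_indices)))
    return list(sorted_indices[:count])


indices = list(range(8))  # already sorted easy -> hard
for epoch in range(4):
    print(epoch, curriculum_subset(indices, epoch, total_epochs=4))
```

With eight samples and four epochs, the visible subset grows from the two easiest samples to the full dataset.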
- Unified pipeline for converting segmentation datasets to bounding box annotations
- Language-guided detection using Grounding DINO
- Curriculum learning strategies:
  - Teacher curriculum
  - Bounding box size curriculum
- Supports multi-modal medical imaging (MRI, CT)
- Significant improvements in generalization and detection accuracy across pathologies
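The conversion from segmentation masks to bounding boxes listed above can be sketched as follows. This is a minimal, dependency-free illustration for a single 2D binary mask, not the repository's pipeline: the function name `mask_to_bbox` is hypothetical, the output uses the COCO `[x, y, width, height]` convention as an assumption, and all foreground pixels are merged into one box (separating multiple lesions would need connected-component labeling).

```python
def mask_to_bbox(mask):
    """Return [x, y, width, height] of the foreground in a 2D binary mask.

    `mask` is a list of rows with truthy foreground values. Returns None
    when the mask contains no foreground pixels.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    x_min, x_max = cols[0], cols[-1]
    y_min, y_max = rows[0], rows[-1]
    return [x_min, y_min, x_max - x_min + 1, y_max - y_min + 1]


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
bbox = mask_to_bbox(mask)
print(bbox)
```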
Installation follows the official MMDetection instructions, with the specific package versions pinned below.
conda create --name openmmlab python=3.8 -y
conda activate openmmlab
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -U openmim
pip install fsspec
mim install mmengine
mim install "mmcv==2.1.0"
git clone https://github.com/laurenzheidrich/LangBoxMed.git
cd LangBoxMed/mmdetection/
pip install -v -e .
mim download mmdet --config rtmdet_tiny_8xb32-300e_coco --dest .
python demo/image_demo.py demo/demo.jpg rtmdet_tiny_8xb32-300e_coco.py \
    --weights rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth \
    --device cpu
pip install fairscale
pip install transformers
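After the steps above, a quick check of which packages ended up in the environment can save debugging time. The snippet below is a small sketch that uses only the Python standard library. The package list is an assumption derived from the commands above, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the installation commands above (assumption).
REQUIRED = ["torch", "torchvision", "mmengine", "mmcv", "mmdet",
            "fairscale", "transformers"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions()
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Comparing the printed versions with the pinned ones (for example mmcv 2.1.0 and torch 2.1.0) shows quickly whether a later `pip install` has silently changed a dependency.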
First, download the weights from Google Drive.
Then run inference with:
!python mmdetection/demo/image_demo.py demo_image.png \
    mmdetection/configs/mm_grounding_dino/grounding_dino_swin-t_finetune_ts_multiple_curriculum.py \
    --weights model_teacher.pth \
    --texts 'glioma . brain metastasis . kidney tumor . liver tumor' -c
! ./mmdetection/tools/dist_train.sh mmdetection/configs/mm_grounding_dino/sample_config_file.py 2
See the config files grounding_dino_swin-t_finetune_ts_multiple.py and grounding_dino_swin-t_finetune_ts_multiple_curriculum.py for examples of how to combine multiple datasets, undersample them (including during concatenation), and enable curriculum learning.
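To give an intuition for undersampling during concatenation, here is a plain-Python sketch. It does not use the MMDetection API or reproduce the config files. The function names, the per-dataset cap, and the toy dataset names are all hypothetical.

```python
import random


def undersample(samples, max_size, seed=0):
    """Randomly keep at most `max_size` samples, reproducibly via `seed`."""
    samples = list(samples)
    if len(samples) <= max_size:
        return samples
    return random.Random(seed).sample(samples, max_size)


def concat_with_undersampling(datasets, max_per_dataset, seed=0):
    """Concatenate datasets after capping each one at `max_per_dataset`.

    `datasets` maps a dataset name to its list of samples. The result is a
    flat list of (dataset_name, sample) pairs.
    """
    combined = []
    for name, samples in datasets.items():
        kept = undersample(samples, max_per_dataset, seed)
        combined.extend((name, sample) for sample in kept)
    return combined


# Toy example: one large and one small dataset.
datasets = {"glioma": list(range(100)), "kidney_tumor": list(range(10))}
combined = concat_with_undersampling(datasets, max_per_dataset=20)
print(len(combined))
```

Capping each dataset before concatenation keeps a large dataset from dominating training batches, which is the usual motivation for undersampling in a multi-dataset setup.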
