Binary classification of lung nodules (Benign / Malignant) from CT scans. Built on the LUNA25 challenge framework — Group 12.
```
lung_nodule_pipeline/
├── lung_nodule/                 # Main Python package
│   ├── config.py                # Hyperparameters & settings
│   ├── classification/          # 2D / 3D malignancy classifier
│   ├── data/                    # Dataset, patch extraction, augmentation
│   ├── detection/               # MONAI RetinaNet nodule detector
│   ├── models/                  # Model architectures (ResNet152, UNet3D, ViT, ...)
│   ├── pipeline/                # End-to-end orchestration + DICOM→NIfTI
│   ├── reporting/               # Batch report generation
│   └── training/                # Trainer, loss functions, k-fold splits
│
├── docs/
│   ├── TRAINING.md              # Data format, training guide, parameters
│   └── INFERENCE.md             # Inference modes, checkpoint setup, MTN guide
│
├── data/                        # Training data (gitignored — download separately)
│   ├── image/                   # Nodule patches: <AnnotationID>.npy
│   ├── metadata/                # Spatial metadata: <AnnotationID>.npy
│   └── csv/                     # 5-fold split CSVs
│
├── weights/                     # Pre-trained checkpoints (gitignored — download separately)
│   ├── dt_model.ts              # RetinaNet detection model (TorchScript)
│   ├── ResNet152-confirmed/     # 2D classification ensemble (fold_1..5)
│   └── unet3D_encoder_scse/     # 3D classification ensemble (fold0..4)
│
├── train.py                     # Train 5-fold cross-validation
├── infer.py                     # Classify known nodule coordinates (single / batch CSV)
├── predict.py                   # End-to-end DICOM → detect → classify
├── run_report.py                # Batch report across a dataset directory
├── infer_mtn.sh                 # One-shot MTN dataset inference → CSV
│
├── setup.py
├── requirements.txt
└── README.md
```
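The `data/image/` and `data/metadata/` folders pair each nodule patch with a same-named metadata file keyed by `<AnnotationID>`. A minimal sketch of that convention (the array shape and metadata keys below are illustrative assumptions, not the repo's actual format — see docs/TRAINING.md):

```python
import tempfile
from pathlib import Path

import numpy as np

# Mimic the data/ layout in a temp dir: image/ and metadata/ share
# the same <AnnotationID>.npy stem (shape and keys are assumptions).
root = Path(tempfile.mkdtemp())
(root / "image").mkdir()
(root / "metadata").mkdir()

patch = np.random.rand(64, 64, 64).astype(np.float32)    # 3D CT patch
meta = {"origin": np.zeros(3), "spacing": np.ones(3)}    # spatial metadata
np.save(root / "image" / "A0001.npy", patch)
np.save(root / "metadata" / "A0001.npy", meta)           # dict -> object array

# Loading a pair: look up both folders by the same annotation ID.
patch2 = np.load(root / "image" / "A0001.npy")
meta2 = np.load(root / "metadata" / "A0001.npy", allow_pickle=True).item()
print(patch2.shape, sorted(meta2))  # -> (64, 64, 64) ['origin', 'spacing']
```

Saving a Python dict through `np.save` wraps it in a 0-d object array, which is why the metadata load needs `allow_pickle=True` and `.item()`.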
Pre-trained weights are hosted on Google Drive (~1.4 GB). Run the download script:

```bash
bash download_weights.sh
```

This installs gdown and downloads the weights folder directly from:
https://drive.google.com/drive/folders/1LyVA8gn6EF71iCeVbYkefPp5J1MYxpIR
```
weights/
├── dt_model.ts                  # RetinaNet detection model (80 MB)
├── ResNet152-confirmed/         # 2D classification ensemble (5 × 233 MB)
│   └── fold_1..5/best_metric_model.pth
└── unet3D_encoder_scse/         # 3D classification ensemble (5 × 55 MB)
    └── best_metric_model_fold0..4.pth
```
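Both classifiers ship as 5-fold ensembles. A common way to combine such checkpoints — and, as an assumption, roughly what ensemble inference does here — is to average the per-fold malignancy probabilities before thresholding:

```python
import numpy as np

# Hypothetical per-fold malignancy probabilities for one nodule,
# e.g. one value per checkpoint in ResNet152-confirmed/fold_1..5/.
fold_probs = np.array([0.81, 0.74, 0.88, 0.79, 0.83])

# Simple mean ensemble: average the five folds, then threshold at 0.5.
p_malignant = fold_probs.mean()
label = "Malignant" if p_malignant >= 0.5 else "Benign"
print(f"{p_malignant:.3f} -> {label}")  # -> 0.810 -> Malignant
```

Averaging probabilities (rather than hard votes) keeps a calibrated score that can be reported alongside the Benign/Malignant label.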
```bash
# Create and activate environment
conda create -n lung_nodule python=3.11 -y
conda activate lung_nodule

# Install PyTorch (adjust cu121 to match your CUDA version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# Install all remaining dependencies
pip install -r requirements.txt
```

Verify:

```bash
python -c "import torch, monai, SimpleITK, timm; print('OK')"
```

See docs/TRAINING.md for the full guide, including:
- Data directory layout and CSV format
- Generating 5-fold splits
- All `train.py` arguments and available model architectures
- Expected output structure and metrics
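Generating the 5-fold splits typically means a stratified split, so every fold keeps the same benign/malignant ratio. A minimal sketch with scikit-learn (the `AnnotationID`/`label` column names are assumptions — docs/TRAINING.md has the real CSV schema):

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Toy annotation table (column names are illustrative assumptions).
df = pd.DataFrame({
    "AnnotationID": [f"A{i:04d}" for i in range(20)],
    "label": [0, 1] * 10,   # 0 = benign, 1 = malignant
})

# Stratified 5-fold: each validation fold preserves the class ratio.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(df, df["label"])):
    fold_df = df.iloc[val_idx]
    # The real pipeline would write these under data/csv/, e.g.:
    # fold_df.to_csv(f"data/csv/fold{fold}.csv", index=False)
    print(fold, len(fold_df), fold_df["label"].mean())
```

Fixing `random_state` makes the folds reproducible across runs, which matters when the five trained checkpoints are later ensembled.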
Quick start:

```bash
python train.py \
    --image_dir ./data \
    --csv_dir ./data/csv \
    --output_dir ./checkpoints \
    --model ResNet152 \
    --epochs 200
```

See docs/INFERENCE.md for the full guide, including:
- Checkpoint setup (pre-trained vs. custom)
- Mode A — single nodule from known coordinates
- Mode B — batch CSV inference
- Mode C — MTN dataset: extract ZIPs → detect → classify → CSV (`infer_mtn.sh`)
- Model types: 2D ResNet152 / 3D UNet3D+scSE / both
- Coordinate system reference and troubleshooting
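The `--coord_*` values are world coordinates in millimetres, which inference must map to voxel indices inside the CT volume. A minimal sketch of that mapping for an axis-aligned image (SimpleITK's `TransformPhysicalPointToIndex` handles the general case, including direction matrices; the origin and spacing below are illustrative):

```python
import numpy as np

def world_to_voxel(world, origin, spacing):
    """Map a physical (x, y, z) point in mm to voxel indices,
    assuming an identity direction matrix (axis-aligned image)."""
    return np.round((np.asarray(world) - origin) / spacing).astype(int)

origin = np.array([-200.0, -180.0, -320.0])   # mm, image origin (illustrative)
spacing = np.array([0.7, 0.7, 1.25])          # mm per voxel (illustrative)

idx = world_to_voxel([-34.3, 44.2, -49.3], origin, spacing)
print(idx)  # -> [237 320 217]
```

A quick sanity check: a point exactly at the origin must map to voxel `[0, 0, 0]`; getting this wrong (e.g. by mixing up world-mm and voxel inputs) is a classic source of patches extracted from the wrong location.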
Quick start (MTN dataset):

```bash
bash infer_mtn.sh /path/to/MTN/ --output_dir ./output/mtn
```

Single nodule:

```bash
python infer.py \
    --ct patient.nii.gz \
    --coord_x -34.3 \
    --coord_y 44.2 \
    --coord_z -49.3
```
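For Mode B, `infer.py` consumes a CSV listing one nodule per row. A minimal sketch of building such a file with the standard library (the column names are assumptions — check docs/INFERENCE.md for the exact schema and flag names):

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical batch file: one row per nodule to classify
# (column names mirror the single-nodule CLI flags above).
rows = [
    {"ct": "patient01.nii.gz", "coord_x": -34.3, "coord_y": 44.2, "coord_z": -49.3},
    {"ct": "patient02.nii.gz", "coord_x": 12.1, "coord_y": -8.7, "coord_z": 105.0},
]

csv_path = Path(tempfile.mkdtemp()) / "nodules.csv"
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["ct", "coord_x", "coord_y", "coord_z"])
    writer.writeheader()
    writer.writerows(rows)

# infer.py's batch mode would then be pointed at this file
# (see docs/INFERENCE.md for the exact invocation).
print(csv_path.read_text().splitlines()[0])  # -> ct,coord_x,coord_y,coord_z
```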