MuCTaL

Brian Isett, Rebekah Dadey, Aofei Li, Ryan C. Augustin, Kate Smith, Aatur D. Singhi, Qiangqiang Gu, Riyue Bao. A lightweight multi-cancer tumor localization framework for whole-slide histopathology images (H&E). DOI:10.48550/arXiv.2603.08844.

MuCTaL provides scripts and notebooks for end-to-end pathology AI workflows:

- whole-slide preprocessing and tiling
- tile-level model training and inference (FastAI/PyTorch)
- post-processing into clinically useful outputs (heatmaps and GeoJSON regions)
- cross-validation utilities and exploratory notebooks

A trained model is available on Hugging Face: https://huggingface.co/hillmancancercenterds/MuCTaL

Repository Summary

This repository is organized around a practical whole-slide image (WSI) pipeline:

1. Preprocess slides into tissue-rich tiles using pathml-based pipelines.
2. Train CNN models on labeled tiles (FastAI).
3. Run inference on unseen tiles/slides.
4. Convert predictions to visual artifacts (heatmaps) and annotation formats (GeoJSON).
5. Evaluate folds/repeats with cross-validation helper scripts.
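
The four command-line steps in Quick Start can be chained from Python. The sketch below is a hypothetical driver; build_steps and run_steps are illustrative names, not part of the repository. It builds each call as an argument list and, by default, only prints it. The argument order is copied from the Quick Start commands. This README does not say which intermediate TSV each step writes, so those paths are passed in explicitly.

```python
import shlex
import subprocess


def build_steps(repo, out, slide, tiles_tsv, model_pkl, tile_root,
                infer_tsv, geojson_out, heatmap_out, tile_px=224):
    """Return the four Quick Start CLI calls, in order, as argv lists.

    Hypothetical helper: argument order mirrors the README commands only.
    """
    return [
        ["python", f"{repo}/pipeline/pathml_preproc_v10.py", out, slide, str(tile_px), repo],
        ["python", f"{repo}/pipeline/fastai_inference_v10.py", tiles_tsv, model_pkl, tile_root, out],
        ["python", f"{repo}/pipeline/tile_infer_to_geojson.py", infer_tsv, geojson_out],
        ["python", f"{repo}/pipeline/tile_infer_to_heatmap.py", infer_tsv, slide, heatmap_out],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the chain at the first step that exits non-zero
            subprocess.run(argv, check=True)
```

Keeping the default as a dry run lets you check the assembled paths before submitting anything to a scheduler.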

The codebase is research- and HPC-oriented. It includes path placeholders and batch-style scripts, so expect to adapt data paths and execution wrappers to your environment.

Folder Structure

MuCTaL/
├── LICENSE
├── README.md
├── helpers/
│   ├── __init__.py
│   ├── anno.py
│   ├── preproc.py
│   └── tile.py
├── notebooks/
│   ├── 01_generate_wsi_samplesheet_run_preprocessing.ipynb
│   ├── 02_annotated_tile_file_org_for_training.ipynb
│   ├── 03_train_model_fastai2.7.ipynb
│   ├── 04_model_eval_fastai2.7.ipynb
│   ├── 05_example_inference_to_geojson.ipynb
│   ├── 06_acral_tile_heatmap_class_viz.ipynb
│   └── 07_percent_predicted_tumor_each_slide.ipynb
├── pipeline/
│   ├── fastai_inference_v10.py
│   ├── pathml_preproc_v10.py
│   ├── tile_infer_to_geojson.py
│   └── tile_infer_to_heatmap.py
└── train/
    └── train_full.py

What Each Main Module Does

- pipeline/: main runnable scripts for preprocessing, inference, and output generation.
- train/: end-to-end model training script for full-dataset training.
- helpers/: reusable utility code (annotation geometry checks, preprocessing helpers, tile parsing).
- notebooks/: interactive analysis/tutorial notebooks for data prep, training, evaluation, and visualization.
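
helpers/tile.py handles tile parsing, but this README does not describe its naming scheme. The following is a minimal sketch that assumes a hypothetical tile file name of the form <slide_id>_x<col>_y<row>.png. The names parse_tile_name and TileRef are illustrative.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Hypothetical naming scheme: <slide_id>_x<col>_y<row>.<ext>
# The greedy slide group lets slide IDs contain underscores.
TILE_RE = re.compile(r"^(?P<slide>.+)_x(?P<x>\d+)_y(?P<y>\d+)\.(?:png|jpg|tif)$")


class TileRef(NamedTuple):
    slide_id: str
    x: int
    y: int


def parse_tile_name(path):
    """Parse a tile path into (slide_id, x, y); raise ValueError if it does not match."""
    name = Path(path).name
    m = TILE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized tile name: {name}")
    return TileRef(m["slide"], int(m["x"]), int(m["y"]))
```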

Typical Workflow

1. Prepare sample metadata (notebooks and TSV inputs).
2. Preprocess WSIs into tiles with pipeline/pathml_preproc_v10.py.
3. Build balanced tile CSVs.
4. Train model(s) with train/train_full.py.
5. Infer tile probabilities with pipeline/fastai_inference_v10.py.
6. Generate outputs:
   - GeoJSON tumor regions: pipeline/tile_infer_to_geojson.py
   - Slide heatmaps/overlays: pipeline/tile_infer_to_heatmap.py
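
This README does not describe how the repository balances tile CSVs. The sketch below shows one common approach: downsample every class to the size of the smallest class, using a fixed seed. The column names tile_path and label, and the helper names, are hypothetical.

```python
import csv
import random
from collections import defaultdict


def balance_rows(rows, label_key="label", seed=0):
    """Downsample every class to the size of the smallest class (hypothetical scheme)."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)
    if not by_label:
        return []
    n_min = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so each repeat is reproducible
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n_min))
    rng.shuffle(balanced)
    return balanced


def write_balanced_csv(in_path, out_path, label_key="label", seed=0):
    """Read a tile CSV, balance it, write the result, and return the row count."""
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)
    balanced = balance_rows(rows, label_key, seed)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(balanced)
    return len(balanced)
```

Changing the seed gives a different balanced subset, which is one way to produce repeated training sets.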

Requirements

Core dependencies (inferred from the scripts):

- Python 3.9+
- fastai, torch, torchvision
- pathml
- opencv-python
- numpy, pandas, scipy, matplotlib, tqdm
- dask, distributed (optional, for distributed preprocessing)
- Pillow
- cv2geojson
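
The sketch below checks which of the dependencies above are importable in the current environment, using importlib.util.find_spec. The pip-to-module mapping for opencv-python (cv2) and Pillow (PIL) is standard. The cv2geojson module name is an assumption.

```python
import importlib.util
import sys

# pip package name -> top-level module name it installs
REQUIRED = {
    "fastai": "fastai", "torch": "torch", "torchvision": "torchvision",
    "pathml": "pathml", "opencv-python": "cv2", "numpy": "numpy",
    "pandas": "pandas", "scipy": "scipy", "matplotlib": "matplotlib",
    "tqdm": "tqdm", "Pillow": "PIL",
    "cv2geojson": "cv2geojson",  # assumed module name
}
OPTIONAL = {"dask": "dask", "distributed": "distributed"}


def missing_packages(packages):
    """Return the pip names whose modules cannot be found by this interpreter."""
    return [pip for pip, mod in packages.items() if importlib.util.find_spec(mod) is None]


def python_ok(minimum=(3, 9)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum
```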

Many scripts are designed for HPC/SLURM environments and may rely on environment variables (e.g., SLURM_SCRATCH) and local data layouts.
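
The sketch below resolves a working directory that prefers SLURM_SCRATCH when it is set and otherwise falls back to the system temporary directory. It is a minimal example; work_dir and the muctal subdirectory name are hypothetical and are not taken from the scripts.

```python
import os
import tempfile
from pathlib import Path


def work_dir(subdir="muctal"):
    """Prefer SLURM node-local scratch; fall back to the system temp dir (hypothetical helper)."""
    base = os.environ.get("SLURM_SCRATCH") or tempfile.gettempdir()
    path = Path(base) / subdir
    path.mkdir(parents=True, exist_ok=True)
    return path
```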

Quick Start

Update all paths below for your environment.

1) Preprocess a slide to tiles

python pipeline/pathml_preproc_v10.py \
  /path/to/output \
  /path/to/slide.svs \
  224 \
  /path/to/MuCTaL
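
pathml performs the actual tiling. The sketch below illustrates only the grid arithmetic. It assumes the third argument (224) is the tile edge length in pixels and that tiles do not overlap by default; tile_origins is a hypothetical helper. For a slide length L, tile size t, and stride s, each axis holds floor((L - t) / s) + 1 full tiles when L >= t.

```python
def tile_origins(width, height, tile_px=224, stride=None):
    """Return (x, y) top-left corners of tiles that fit fully inside the slide.

    Partial tiles at the right and bottom edges are dropped.
    """
    stride = stride or tile_px  # default: non-overlapping tiles
    return [
        (x, y)
        for y in range(0, height - tile_px + 1, stride)
        for x in range(0, width - tile_px + 1, stride)
    ]
```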

2) Run tile inference

python pipeline/fastai_inference_v10.py \
  /path/to/tiles_df.tsv \
  /path/to/model.pkl \
  /path/to/tile_root/ \
  /path/to/output
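
notebooks/07_percent_predicted_tumor_each_slide.ipynb reports the percentage of tiles predicted as tumor for each slide. The sketch below computes that kind of summary from an inference TSV. It assumes hypothetical columns slide_id and tumor_prob, and counts a tile as tumor when its probability is at least 0.5.

```python
import csv
from collections import defaultdict


def percent_tumor(lines, slide_col="slide_id", prob_col="tumor_prob", threshold=0.5):
    """Return {slide_id: % of tiles with tumor probability >= threshold}.

    `lines` is any iterable of TSV text lines, e.g. an open file.
    Column names are hypothetical.
    """
    totals = defaultdict(int)
    tumor = defaultdict(int)
    for row in csv.DictReader(lines, delimiter="\t"):
        slide = row[slide_col]
        totals[slide] += 1
        if float(row[prob_col]) >= threshold:
            tumor[slide] += 1
    return {s: 100.0 * tumor[s] / totals[s] for s in totals}
```

Usage: `with open("/path/to/infer_tiles.tsv", newline="") as f: print(percent_tumor(f))`.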

3) Convert predictions to GeoJSON

python pipeline/tile_infer_to_geojson.py \
  /path/to/infer_tiles.tsv \
  /path/to/output_geojson
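
The repository lists cv2geojson as a dependency, and its converter may merge adjacent tiles into contours. The sketch below does not do that. Instead, each tumor tile becomes one square Polygon in a GeoJSON FeatureCollection. It assumes hypothetical columns x and y (the tile's top-left corner in slide pixels) and tumor_prob. The property names are also illustrative.

```python
import json


def tiles_to_geojson(tiles, tile_px=224, threshold=0.5):
    """Build a FeatureCollection with one square Polygon per tile above threshold."""
    features = []
    for t in tiles:
        p = float(t["tumor_prob"])
        if p < threshold:
            continue
        x, y = int(t["x"]), int(t["y"])
        # A GeoJSON linear ring is closed: the first vertex is repeated at the end.
        ring = [[x, y], [x + tile_px, y], [x + tile_px, y + tile_px], [x, y + tile_px], [x, y]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"label": "tumor", "probability": p},  # hypothetical keys
        })
    return {"type": "FeatureCollection", "features": features}
```

The result can be written with `json.dump` and opened in a viewer that accepts pixel-space GeoJSON.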

4) Generate heatmap overlay

python pipeline/tile_infer_to_heatmap.py \
  /path/to/infer_tiles.tsv \
  /path/to/original_slide.svs \
  /path/to/output_heatmaps
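
A tile heatmap starts as a 2-D grid of tile probabilities. The following is a minimal sketch that uses the same hypothetical x, y, and tumor_prob columns and a 224-pixel tile size; heatmap_grid is an illustrative name.

```python
def heatmap_grid(tiles, tile_px=224):
    """Place tile probabilities on a grid indexed by (y // tile_px, x // tile_px).

    Cells with no tile (e.g. background) stay None.
    """
    cells = [(int(t["y"]) // tile_px, int(t["x"]) // tile_px, float(t["tumor_prob"]))
             for t in tiles]
    if not cells:
        return []
    n_rows = max(r for r, _, _ in cells) + 1
    n_cols = max(c for _, c, _ in cells) + 1
    grid = [[None] * n_cols for _ in range(n_rows)]
    for r, c, p in cells:
        grid[r][c] = p
    return grid
```

For display, map the None cells to NaN and draw the grid with matplotlib's imshow. Upscaling by tile_px aligns it with a slide thumbnail.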

Notes and Caveats

- Several scripts contain hard-coded placeholders (e.g., /path/to/...) and versioned naming conventions.
- Some utilities expect specific input TSV schemas (tile paths, slide/case IDs, class labels).
- Cross-validation scripts assume specific model naming patterns, for example arch_kfold_nrep_nbal_px_v.
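
This README does not document what each field in arch_kfold_nrep_nbal_px_v means. The sketch below splits a name of that shape into its six fields and leaves them as strings. The example name and parse_model_name are hypothetical. Splitting from the right keeps any underscores inside the architecture name.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    arch: str
    kfold: str
    nrep: str
    nbal: str
    px: str
    v: str


def parse_model_name(name):
    """Split 'arch_kfold_nrep_nbal_px_v' (optionally ending in .pkl) into fields."""
    stem = name[:-4] if name.endswith(".pkl") else name
    parts = stem.rsplit("_", 5)  # split from the right: arch may contain '_'
    if len(parts) != 6:
        raise ValueError(f"expected 6 underscore-separated fields, got {len(parts)}: {name}")
    return ModelName(*parts)
```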

License

This project includes a LICENSE file in the repository root. See that file for usage terms.
