Workflow order:
- Preprocessing → Prepare and process mIF tiles
- AdvDINO → Train domain-adversarial self-supervised embeddings
- Analysis
  - ABMIL_OS → Perform survival prediction with attention-based MIL
  - Clustering → Perform unsupervised clustering and analyze cluster phenotypes
This folder contains scripts for preparing images for downstream analysis.
The preprocessing workflow includes three main steps:
- Downsampling WSIs
  - Script: downsample_images.py
- Generating Tissue Masks
  - Script: save_masks.py
- Processing Tiles
  - Script: process_tiles.py
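The downsampling and tile-processing commands for this folder take `--start` and `--stop` arguments. A plausible reading, and only an assumption here, is that they select a half-open slice of the sorted input files so that a large cohort can be split across several jobs. The sketch below illustrates that reading; `select_shard` and the file names are hypothetical and are not taken from the scripts.

```python
from pathlib import Path


def select_shard(paths, start, stop):
    """Return the files at sorted positions [start, stop), like a Python slice.

    Assumption: the scripts order inputs by file name before slicing.
    """
    ordered = sorted(paths, key=lambda p: Path(p).name)
    return ordered[start:stop]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    slides = [f"slide_{i:04d}.tif" for i in range(1200)]
    # Walk the cohort in chunks of 500, one chunk per job.
    for start in range(0, len(slides), 500):
        shard = select_shard(slides, start, start + 500)
        print(start, start + 500, len(shard))
```

Under this reading, a `--stop` value past the end of the list simply yields a shorter final shard, so the last job does not need special handling.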
python downsample_images.py --ipt_dir /path/to/images --opt_dir /path/to/downsampled --start 0 --stop 500
python save_masks.py --ipt_dir /path/to/downsampled --opt_dir /path/to/masks
python process_tiles.py --ipt_dir /path/to/images --mask_dir /path/to/masks --opt_dir /path/to/tiles --start 0 --stop 500

This folder contains scripts for domain-adversarial self-supervised training and tile-level embedding extraction for downstream analysis. The AdvDINO workflow includes two main steps:
- Training
  - Script: train.py
- Inference (Generate Embeddings)
  - Script: inference.py
python -m torch.distributed.launch --nproc_per_node=num_gpus dinov2/train/train.py --config-file=/path/to/configs --output-dir=/path/to/outputs train.dataset_path=mIFDataset:root=/path/to/tiles
python dinov2/eval/tile_inference.py --config-file=/path/to/configs --pretrained-weights=/path/to/weights --dataset_str=mIFCoordDataset --input_dir=/path/to/images --output_dir=/path/to/embeddings --start=0 --stop=500 --batch_size=128 --num-workers 8 --device_num=1

This folder contains scripts for attention-based multiple instance learning (ABMIL) to perform survival prediction. The workflow includes three main steps:
- Generate Split Folds
  - Script: final_gen_splits_primary.py - Generates train/val/test splits and folds for cross-validation for survival analysis.
- Compute Quartiles
  - Script: get_quartiles.py - Computes survival time quartiles necessary for updating the configuration file for ABMIL.
- Run ABMIL-OS Inference
  - Script: main.py - Performs survival prediction using tile embeddings (.h5 files) from AdvDINO inference.
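To make the quartile step concrete, here is a minimal sketch of how survival-time cut points could be computed and used to discretize follow-up time into four bins. It is not the code in get_quartiles.py: the function names are hypothetical, and restricting the quantiles to uncensored patients is an assumption borrowed from common practice in discrete-time survival models.

```python
import statistics


def survival_quartiles(times, events):
    """Return the three cut points (Q1, median, Q3) of observed event times.

    times  -- survival or follow-up times, in any consistent unit
    events -- 1 if the event was observed, 0 if the patient was censored
    Assumption: only uncensored patients contribute to the cut points.
    """
    observed = [t for t, e in zip(times, events) if e == 1]
    if len(observed) < 2:
        raise ValueError("need at least two uncensored patients")
    return statistics.quantiles(observed, n=4, method="inclusive")


def to_bin(time, cuts):
    """Map a survival time to a bin index 0..3 by counting the cuts it exceeds."""
    return sum(time > c for c in cuts)


if __name__ == "__main__":
    # Toy data, for illustration only; the last patient is censored.
    times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
    cuts = survival_quartiles(times, events)
    print(cuts, [to_bin(t, cuts) for t in times])
```

The three values returned are the kind of numbers one would copy into the ABMIL configuration file; check the script's own output format before doing so.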
python final_gen_splits_primary.py --labels_path /path/to/labels --embeddings_dir /path/to/embeddings --num_folds 5 --output_dir /path/to/folds --seed 42
python get_quartiles.py --labels_path /path/to/labels --ipt_dir /path/to/embeddings
python main.py --config configs/default_config.yaml --embed_dim 1024 --embeddings_dir /path/to/embeddings --learning_rate 1e-4 --epochs 40 --data_source mif --cross_validation

This folder contains scripts for unsupervised clustering of tile embeddings to identify proteomic phenotypes. The clustering workflow includes three main steps:
- Combine Tile Embeddings
  - Script: combine_tile_embeddings.py - Combines all tile-level .h5 embeddings into a single matrix for clustering.
  - subsample_combine_tile_embeddings.py can be used to combine a subset of embeddings for faster exploratory clustering.
- Perform Subsampled Clustering
  - Script: mIF_clustering.py - Performs Leiden clustering on subsampled tile embeddings to identify phenotypic clusters.
- Cluster Attribution (Optional)
  - Script: clustering_attribution.py - Assigns cluster labels to the remaining tiles based on the nearest cluster from the subsampled clustering.
  - Produces full-slide cluster assignments and can generate summary outputs for downstream analysis.
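The attribution step can be pictured as a nearest-centroid assignment. The sketch below is one simple interpretation of "nearest cluster" and is not the logic of clustering_attribution.py, which may measure proximity differently (for example with nearest neighbours in PCA space). The function names and toy vectors are hypothetical.

```python
import math
from collections import defaultdict


def cluster_centroids(embeddings, labels):
    """Average the embeddings of each cluster in the clustered subsample."""
    groups = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        groups[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }


def attribute(embeddings, centroids):
    """Give each remaining tile the label of its closest centroid (Euclidean)."""
    return [
        min(centroids, key=lambda label: math.dist(vector, centroids[label]))
        for vector in embeddings
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for the clustered subsample.
    reference = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
    leiden = ["0", "0", "1", "1"]
    centroids = cluster_centroids(reference, leiden)
    # Tiles that were left out of the subsample.
    remainder = [[1.0, 1.0], [9.0, 9.0]]
    print(attribute(remainder, centroids))
```

A centroid rule is cheap because each remaining tile is compared against one vector per cluster instead of against every clustered tile, but it assumes clusters are roughly compact in embedding space.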
python combine_tile_embeddings.py --labels_path /path/to/labels --embeddings_path /path/to/embeddings --output_path /path/to/output
(Optional: Subsample for faster clustering)
python subsample_combine_tile_embeddings.py --labels_path /path/to/labels --embeddings_path /path/to/embeddings --output_path /path/to/output --multiplier 0.2
python mIF_clustering.py --input_path /path/to/combined/embeddings --sample_tile_count 2000000 --n_pcs 128 --resolution 2.0 --n_neighbors 250
python clustering_attribution.py --main_cluster_path /path/to/combined/embeddings --target_name remainder_sub0.2_embeddings --ref_name subsampled_clustering --combined_name combined_name --cluster_column leiden_2.0