lotterlab/advdino

AdvDINO Pipeline

Workflow order:

  1. Preprocessing → Prepare and process mIF tiles
  2. AdvDINO → Train domain-adversarial self-supervised embeddings
  3. Analysis
    • ABMIL_OS → Perform survival prediction with attention-based MIL
    • Clustering → Perform unsupervised clustering and analyze cluster phenotypes

Preprocessing

This folder contains scripts for preparing images for downstream analysis.
The preprocessing workflow includes three main steps:

  1. Downsampling WSIs

    • Script: downsample_images.py
  2. Generating Tissue Masks

    • Script: save_masks.py
  3. Processing Tiles

    • Script: process_tiles.py

Usage

1. Downsample WSIs

python downsample_images.py --ipt_dir /path/to/images --opt_dir /path/to/downsampled --start 0 --stop 500
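
The `--start` and `--stop` flags suggest that a run can be limited to a slice of the input files, which makes it straightforward to split a large cohort across several jobs. The sketch below illustrates that idea only. It assumes the indices refer to positions in the file listing with `start` inclusive and `stop` exclusive; the script's actual convention is not documented here, and `shard_ranges` is a hypothetical helper, not part of the repository.

```python
def shard_ranges(n_items, shard_size):
    """Return (start, stop) pairs that cover range(n_items) in chunks.

    Assumption: start is inclusive and stop is exclusive, like a Python slice.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [(start, min(start + shard_size, n_items))
            for start in range(0, n_items, shard_size)]


# Hypothetical cohort of 1,200 images processed 500 at a time.
for start, stop in shard_ranges(1200, 500):
    print(f"python downsample_images.py --ipt_dir /path/to/images "
          f"--opt_dir /path/to/downsampled --start {start} --stop {stop}")
```

Each printed command could then be submitted as a separate job.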

2. Generate Masks

python save_masks.py --ipt_dir /path/to/downsampled --opt_dir /path/to/masks

3. Process Tiles

python process_tiles.py --ipt_dir /path/to/images --mask_dir /path/to/masks --opt_dir /path/to/tiles --start 0 --stop 500
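
Tile processing takes both the images and the tissue masks, presumably so that background tiles can be skipped. A minimal sketch of mask-based tile selection follows. It assumes a binary mask at the same resolution as the image being tiled; `tissue_tile_coords`, `tile_size`, and `min_tissue_fraction` are hypothetical names, and the criteria actually used by `process_tiles.py` may differ.

```python
import numpy as np


def tissue_tile_coords(mask, tile_size, min_tissue_fraction=0.5):
    """Return (row, col) top-left corners of tiles with enough tissue.

    mask: 2D boolean array, True where tissue is present.
    Assumption: the mask has the same resolution as the image being tiled.
    """
    coords = []
    height, width = mask.shape
    for row in range(0, height - tile_size + 1, tile_size):
        for col in range(0, width - tile_size + 1, tile_size):
            tile = mask[row:row + tile_size, col:col + tile_size]
            # Keep the tile if the tissue fraction reaches the threshold.
            if tile.mean() >= min_tissue_fraction:
                coords.append((row, col))
    return coords


# Toy 4x4 mask: tissue only in the left half.
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True
print(tissue_tile_coords(mask, tile_size=2))
```

If the masks are stored at the downsampled resolution, the coordinates would need to be scaled by the downsampling factor before cropping the full-resolution image.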

AdvDINO

This folder contains scripts for domain-adversarial self-supervised training and tile-level embedding extraction for downstream analysis. The AdvDINO workflow includes two main steps:

  1. Training

    • Script: train.py
  2. Inference (Generate Embeddings)

    • Script: tile_inference.py

Usage

1. Training

python -m torch.distributed.launch --nproc_per_node=num_gpus dinov2/train/train.py --config-file=/path/to/configs --output-dir=/path/to/outputs train.dataset_path=mIFDataset:root=/path/to/tiles
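
Domain-adversarial training is commonly implemented with a gradient reversal layer: the layer is the identity in the forward pass and multiplies the gradient by a negative factor in the backward pass. Whether AdvDINO uses exactly this mechanism is an assumption. The NumPy sketch below shows only the gradient bookkeeping, without an autograd framework; `lam` and the function names are hypothetical.

```python
import numpy as np


def grl_forward(features):
    """Forward pass of a gradient reversal layer: the identity."""
    return features


def grl_backward(grad_from_domain_head, lam):
    """Backward pass: flip the sign of the gradient and scale it by lam."""
    return -lam * grad_from_domain_head


def feature_gradient(grad_ssl, grad_domain, lam):
    """Gradient reaching the feature extractor.

    The self-supervised loss gradient passes through unchanged, while the
    domain-classifier gradient is reversed, so the encoder is pushed to
    make the domain harder to predict.
    """
    return grad_ssl + grl_backward(grad_domain, lam)


# Illustrative gradients with respect to a 2-dimensional feature vector.
grad_ssl = np.array([0.2, -0.1])
grad_domain = np.array([0.5, 0.5])
print(feature_gradient(grad_ssl, grad_domain, lam=0.1))
```

In an autograd framework the same effect is usually obtained with a custom backward function placed between the encoder and the domain classifier.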

2. Inference (Generate Embeddings)

python dinov2/eval/tile_inference.py --config-file=/path/to/configs --pretrained-weights=/path/to/weights --dataset_str=mIFCoordDataset --input_dir=/path/to/images --output_dir=/path/to/embeddings --start=0 --stop=500 --batch_size=128 --num-workers 8 --device_num=1

ABMIL_OS

This folder contains scripts for attention-based multiple instance learning (ABMIL) to perform survival prediction. The workflow includes three main steps:

  1. Generate Split Folds

    • Script: final_gen_splits_primary.py
    • Generates train/val/test splits and folds for cross-validation for survival analysis.
  2. Compute Quartiles

    • Script: get_quartiles.py
    • Computes survival time quartiles necessary for updating the configuration file for ABMIL.
  3. Run ABMIL-OS for Survival Prediction

    • Script: main.py
    • Performs survival prediction using tile embeddings (.h5 files) from AdvDINO inference.

Usage

1. Generate Split Folds

python final_gen_splits_primary.py --labels_path /path/to/labels --embeddings_dir /path/to/embeddings --num_folds 5 --output_dir /path/to/folds --seed 42
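
As a rough illustration of seeded fold generation, the sketch below assigns patients to folds while keeping the event rate similar across folds. It assumes the labels provide a patient ID and a binary event indicator; `assign_folds` is a hypothetical helper. The actual script also produces train/val/test splits, which this sketch does not attempt.

```python
import random
from collections import defaultdict


def assign_folds(events, num_folds=5, seed=42):
    """Map each patient ID to a fold index, stratified by event status.

    events: dict of patient ID -> 1 (event observed) or 0 (censored).
    """
    rng = random.Random(seed)
    by_status = defaultdict(list)
    for patient_id, event in sorted(events.items()):
        by_status[event].append(patient_id)

    folds = {}
    for status in sorted(by_status):
        ids = by_status[status]
        rng.shuffle(ids)
        # Deal patients round-robin so each fold gets a similar event rate.
        for position, patient_id in enumerate(ids):
            folds[patient_id] = position % num_folds
    return folds


# Hypothetical cohort: 20 patients, every other one with an observed event.
events = {f"P{i:02d}": i % 2 for i in range(20)}
folds = assign_folds(events, num_folds=5, seed=42)
held_out = sorted(p for p, f in folds.items() if f == 0)
print(len(held_out), "patients held out in fold 0")
```

Fixing the seed makes the assignment reproducible, which matters when results from different runs are compared fold by fold.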

2. Compute Quartiles for Config

python get_quartiles.py --labels_path /path/to/labels --ipt_dir /path/to/embeddings 
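
Quartile cut points of the survival times can serve as bin edges for a discrete-time survival model. The sketch below uses only the standard library and computes the cut points over all patients. Both choices are assumptions: some pipelines use only uncensored patients, and the values the ABMIL configuration file expects are not specified here.

```python
from statistics import quantiles


def survival_quartiles(times):
    """Return the three cut points that split survival times into quartiles."""
    return quantiles(times, n=4, method="inclusive")


def time_bin(time, cuts):
    """Index (0-3) of the quartile bin that a survival time falls into."""
    return sum(time > cut for cut in cuts)


# Hypothetical survival times in months.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = survival_quartiles(times)
print(cuts)
print([time_bin(t, cuts) for t in times])
```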

3. Run ABMIL-OS for Survival Prediction

python main.py --config configs/default_config.yaml --embed_dim 1024 --embeddings_dir /path/to/embeddings --learning_rate 1e-4 --epochs 40 --data_source mif --cross_validation
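
Attention-based MIL pools a variable number of tile embeddings into one slide-level vector, using attention weights of the form softmax(w · tanh(V h_i)). The NumPy sketch below shows only this pooling step with random, untrained weights. `hidden_dim` and the parameter names are hypothetical, and the model in `main.py` may differ, for example by using gated attention.

```python
import numpy as np


def attention_pool(tiles, V, w):
    """Pool tile embeddings (n_tiles, embed_dim) into one bag embedding.

    scores_i = w . tanh(V h_i); attention = softmax(scores);
    bag = sum_i attention_i * h_i
    """
    scores = np.tanh(tiles @ V.T) @ w          # shape: (n_tiles,)
    scores = scores - scores.max()             # stabilise the softmax
    attention = np.exp(scores) / np.exp(scores).sum()
    return attention @ tiles, attention


rng = np.random.default_rng(0)
embed_dim, hidden_dim, n_tiles = 1024, 128, 50
tiles = rng.normal(size=(n_tiles, embed_dim))
V = rng.normal(size=(hidden_dim, embed_dim)) * 0.01
w = rng.normal(size=hidden_dim)

bag, attention = attention_pool(tiles, V, w)
print(bag.shape, attention.sum())
```

The bag embedding has the same dimension as a single tile embedding regardless of how many tiles a slide contains, and the attention weights indicate which tiles contributed most.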

Clustering

This folder contains scripts for unsupervised clustering of tile embeddings to identify proteomic phenotypes. The clustering workflow includes three main steps:

  1. Combine Tile Embeddings

    • Script: combine_tile_embeddings.py
    • Combines all tile-level .h5 embeddings into a single matrix for clustering.
    • subsample_combine_tile_embeddings.py can be used to combine a subset of embeddings for faster exploratory clustering.
  2. Perform Subsampled Clustering

    • Script: mIF_clustering.py
    • Performs Leiden clustering on subsampled tile embeddings to identify phenotypic clusters.
  3. Cluster Attribution (Optional)

    • Script: clustering_attribution.py
    • Assigns cluster labels to the remaining tiles based on the nearest cluster from the subsampled clustering.
    • Produces full-slide cluster assignments and can generate summary outputs for downstream analysis.

Usage

1. Combine Tile Embeddings

python combine_tile_embeddings.py --labels_path /path/to/labels --embeddings_path /path/to/embeddings --output_path /path/to/output

(Optional: Subsample for faster clustering)

python subsample_combine_tile_embeddings.py --labels_path /path/to/labels --embeddings_path /path/to/embeddings --output_path /path/to/output --multiplier 0.2
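
The `--multiplier 0.2` flag suggests that a fraction of the tiles is kept for clustering while the rest is set aside for later attribution. The sketch below assumes the multiplier is the fraction of tiles sampled uniformly at random; `split_subsample` is a hypothetical helper, and the script's real sampling scheme (for example, per slide or per patient) is not documented here.

```python
import random


def split_subsample(n_tiles, multiplier, seed=0):
    """Split tile indices into a random subsample and the remainder.

    Assumption: multiplier is the fraction of tiles kept for clustering.
    """
    rng = random.Random(seed)
    n_keep = round(n_tiles * multiplier)
    kept = set(rng.sample(range(n_tiles), n_keep))
    subsample = sorted(kept)
    remainder = [i for i in range(n_tiles) if i not in kept]
    return subsample, remainder


subsample, remainder = split_subsample(n_tiles=1000, multiplier=0.2)
print(len(subsample), len(remainder))
```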

2. Perform Clustering

python mIF_clustering.py --input_path /path/to/combined/embeddings --sample_tile_count 2000000 --n_pcs 128 --resolution 2.0 --n_neighbors 250

3. Attribute Remaining Tiles to Clusters (Only for Subsampled)

python clustering_attribution.py --main_cluster_path /path/to/combined/embeddings --target_name remainder_sub0.2_embeddings --ref_name subsampled_clustering --combined_name combined_name --cluster_column leiden_2.0
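
One simple way to attribute the remaining tiles is a nearest-centroid rule, sketched below in NumPy. This assumes "nearest cluster" means the cluster whose mean embedding is closest in Euclidean distance; the script may instead use a nearest-neighbour lookup against the clustered tiles. `assign_to_nearest_cluster` is a hypothetical helper.

```python
import numpy as np


def assign_to_nearest_cluster(ref_embeddings, ref_labels, target_embeddings):
    """Label each target tile with the cluster whose centroid is closest."""
    cluster_ids = np.unique(ref_labels)
    centroids = np.stack([ref_embeddings[ref_labels == c].mean(axis=0)
                          for c in cluster_ids])
    # Pairwise Euclidean distances, shape (n_targets, n_clusters).
    diffs = target_embeddings[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return cluster_ids[distances.argmin(axis=1)]


# Toy 2D embeddings: cluster 0 near the origin, cluster 1 near (10, 10).
ref = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [9.0, 10.0]])
labels = np.array([0, 0, 1, 1])
targets = np.array([[0.5, 0.5], [8.0, 9.0]])
print(assign_to_nearest_cluster(ref, labels, targets))
```

For millions of tiles the pairwise difference array would not fit in memory, so the targets would need to be processed in chunks.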
