Official PyTorch implementation of "DGSurv: Dynamic Graph-Based Multimodal Learning for Interpretable Cancer Survival Prediction" by Sajjad Shahabi, Zijun Cui, Ruishan Liu, Joseph Carlson, and Yan Liu.
The current version was tested on the following Python and CUDA versions:
- Python 3.10.11
- CUDA 11.3
We recommend using Conda to set up the Python environment from the environment.yml file.
Use the following command to create the environment with Conda:
conda env create -f environment.yml -n int_env
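After activating the environment (conda activate int_env), a quick sanity check can confirm that the interpreter and PyTorch build match the tested versions listed above. The following is a minimal sketch; the function name check_environment is hypothetical and not part of the repository, and it only reports versions rather than enforcing them.

```python
import sys


def check_environment(expected_python=(3, 10)):
    """Report the Python version and, if PyTorch is importable, its CUDA build.

    Hypothetical helper for illustration; not part of the repository.
    """
    report = {
        "python": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] == expected_python,
    }
    try:
        import torch  # only importable once the Conda environment is active
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    # CUDA version PyTorch was compiled against (None for CPU-only builds).
    report["torch_cuda_build"] = torch.version.cuda
    # Whether a CUDA device is actually usable at runtime.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If torch_cuda_build does not report 11.3, the installed PyTorch wheel may differ from the tested setup.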
To replicate the results of the experiments, please download the following datasets:
- Clinical: TCGA - PanCanAtlas
- mRNA: TCGA - PanCanAtlas
- miRNA: TCGA - PanCanAtlas
- WSI: UNI2-h Pretrained Features
The Clinical, mRNA, and miRNA data are available from the Genomic Data Commons (GDC) website, and the WSI data from the UNI model GitHub repository.
After downloading the datasets, move them to /preprocess/data.
Note that the repository currently supports only kidney cancer (KICH, KIRP, and KIRC). However, the same code can be applied to other cancer types with minimal modification. Details on how to adapt the code to other datasets are provided in /preprocess/README.md.
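Conceptually, the kidney cohort is the union of three TCGA projects, so targeting another cancer type starts with changing that set of project codes. The sketch below illustrates only that idea on toy records; the column name cancer_type, the helper select_cohort, and the patient IDs are hypothetical, and the actual changes required are the ones described in /preprocess/README.md.

```python
# Hypothetical sketch: a cohort defined as a set of TCGA project codes.
KIDNEY_PROJECTS = {"KICH", "KIRP", "KIRC"}


def select_cohort(records, projects, key="cancer_type"):
    """Keep only records whose cancer-type field is one of `projects`.

    `key` is a hypothetical column name; check the real clinical table.
    """
    return [r for r in records if r.get(key) in projects]


# Toy records with placeholder IDs, not real TCGA patients.
patients = [
    {"patient_id": "A", "cancer_type": "KIRC"},
    {"patient_id": "B", "cancer_type": "BRCA"},
    {"patient_id": "C", "cancer_type": "KICH"},
]
print([r["patient_id"] for r in select_cohort(patients, KIDNEY_PROJECTS)])  # ['A', 'C']
```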
After moving the datasets to /preprocess/data, you can preprocess the data by executing clinical.ipynb, mirna.ipynb, mrna.ipynb, and wsi.ipynb in the /preprocess folder. Note that clinical.ipynb must be executed before mirna.ipynb and mrna.ipynb.
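The ordering constraint on the preprocessing notebooks can be encoded as a small dependency graph and resolved with the standard library's graphlib. The sketch below only builds jupyter nbconvert commands (a dry run) intended to be launched from the repository root; the helper names are hypothetical, and the dependencies beyond "clinical before mirna and mrna" are assumptions (wsi.ipynb is treated as independent).

```python
import shlex
from graphlib import TopologicalSorter

# Each notebook maps to the notebooks that must run before it.
# Only clinical -> mirna/mrna comes from the README; wsi.ipynb is assumed independent.
DEPENDENCIES = {
    "clinical.ipynb": set(),
    "mirna.ipynb": {"clinical.ipynb"},
    "mrna.ipynb": {"clinical.ipynb"},
    "wsi.ipynb": set(),
}


def execution_order(deps):
    """Return one run order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def nbconvert_commands(order, folder="preprocess"):
    """Build commands that execute each notebook and save outputs in place."""
    return [
        f"jupyter nbconvert --to notebook --execute --inplace {shlex.quote(folder + '/' + nb)}"
        for nb in order
    ]


if __name__ == "__main__":
    for cmd in nbconvert_commands(execution_order(DEPENDENCIES)):
        print(cmd)
```

Running the notebooks interactively in the same order is equivalent; the graph just makes the constraint explicit.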
To generate the splits used for 5-fold cross-validation, run the /splits/create_splits_kidney.ipynb notebook. This must be done after executing /preprocess/clinical.ipynb.
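For readers unfamiliar with patient-level k-fold cross-validation, the sketch below shows the basic idea: every patient lands in exactly one test fold, and the remaining folds form the training set. This is a plain seeded shuffle, not necessarily what create_splits_kidney.ipynb does (which may, for example, stratify by event status or cancer subtype); the function name and patient IDs are hypothetical.

```python
import random


def make_kfold_splits(patient_ids, k=5, seed=0):
    """Assign each patient to exactly one of k test folds.

    Returns a list of (train_ids, test_ids) pairs, one per fold.
    Plain shuffled split for illustration; no stratification.
    """
    ids = sorted(patient_ids)  # sort first so the shuffle is reproducible for a given seed
    random.Random(seed).shuffle(ids)
    folds = [ids[i::k] for i in range(k)]  # round-robin keeps fold sizes within 1 of each other
    splits = []
    for i, test in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        splits.append((train, test))
    return splits


splits = make_kfold_splits([f"patient_{n:03d}" for n in range(23)], k=5)
for train, test in splits:
    print(len(train), len(test))
```

Splitting by patient rather than by sample keeps all modalities of one patient on the same side of the split.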
After preprocessing the data and generating the splits, you can train and evaluate the models by running their respective IPython notebooks.
- Proposed method (DGSurv): dgsurv.ipynb
- Baselines: max.ipynb (Maximization), attention.ipynb (Attention), and graph_att.ipynb (Graph Attention)
Note that the execution settings of each notebook can be modified by updating the Args class at the beginning of the file.
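When comparing several settings, it can help to derive variants from one set of defaults instead of editing them repeatedly. The sketch below uses a frozen dataclass and dataclasses.replace; the field names (seed, lr, epochs, device) are hypothetical placeholders, since each notebook defines its own Args attributes and may use a plain class rather than a dataclass.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Args:
    # Hypothetical fields for illustration only; each notebook defines its own.
    seed: int = 0
    lr: float = 1e-4
    epochs: int = 50
    device: str = "cuda"


base = Args()
# Derive variants without mutating the defaults, e.g. for a small learning-rate sweep.
sweep = [replace(base, lr=lr) for lr in (1e-3, 1e-4, 1e-5)]
for args in sweep:
    print(asdict(args))
```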
The results are saved in the /logs folder, and you can use /logs/plot_results.ipynb to summarize them.
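The result tables report each score with a second value in parentheses, which the README does not define. Assuming it is the standard deviation of the per-fold scores across the five folds (the sample standard deviation is used here, also an assumption), a "mean (std)" entry could be produced as sketched below. The function name summarize is hypothetical and is not taken from /logs/plot_results.ipynb, and the input scores are placeholders, not results from this repository.

```python
import statistics


def summarize(scores, digits=4):
    """Format per-fold scores as 'mean (std)' using the sample standard deviation."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample (n - 1) standard deviation; needs >= 2 scores
    return f"{mean:.{digits}f} ({std:.{digits}f})"


# Placeholder per-fold scores for illustration only.
print(summarize([0.70, 0.72, 0.74, 0.76, 0.78]))  # 0.7400 (0.0316)
```

Switching to statistics.pstdev would give the population standard deviation instead.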
To generate single-modality results, execute the single_modality.ipynb notebook and display the results using /logs/plot_results_single.ipynb.
The table below shows the performance of our proposed method and the baselines on the kidney cancer dataset:
| Dataset | Maximization | Attention | Graph Attention | DGSurv |
|---|---|---|---|---|
| Kidney | 0.7511 (0.0244) | 0.7711 (0.0208) | 0.7489 (0.0169) | 0.7725 (0.0177) |
The single-modality performance is as follows:
| Dataset | Clinical | mRNA | miRNA | WSI |
|---|---|---|---|---|
| Kidney | 0.5670 (0.0750) | 0.7502 (0.0313) | 0.7052 (0.0270) | 0.7304 (0.0141) |
We use a modified version of the SHAP package to extract feature attributions.
You can execute /interpretability/extract_shap_values.ipynb to extract Shapley values. Note that the model must first be trained and saved in the /logs folder.
To display the extracted Shapley values, use the /interpretability/plot_shap_values.ipynb notebook.
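For intuition about what a Shapley value measures, the sketch below computes exact Shapley values by enumerating every coalition, here with the four modalities as "players". It is not the modified SHAP package used by the repository (SHAP typically approximates these values for many input features), and the value function and its numbers are made up for illustration, not taken from any trained model.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions (feasible for a few players).

    `value` maps a frozenset of players to a real number.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution of p
        phi[p] = total
    return phi


# Toy value function (made-up numbers): each modality adds a fixed gain,
# and mRNA + WSI together add a small interaction bonus.
GAIN = {"Clinical": 0.02, "mRNA": 0.10, "miRNA": 0.05, "WSI": 0.08}


def toy_value(coalition):
    v = sum(GAIN[m] for m in coalition)
    if {"mRNA", "WSI"} <= coalition:
        v += 0.04
    return v


phi = shapley_values(list(GAIN), toy_value)
print({m: round(x, 4) for m, x in phi.items()})
```

The interaction bonus is split evenly between mRNA and WSI, and the values sum to the full-coalition value (the efficiency property).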
The citation will be provided after publication.

