Python code to reproduce all figures and analyses used for the evaluation of POME.
conda env create -f environment.yml
conda activate hancock_survivalpython -m venv venv
source venv/bin/activate
pip install -e .Simply run
python scripts/generate_imputation_figure.pyThe figure can be reproduced with the notebook scripts/plot_fig4_unsupervised.ipynb. The scripts to produce the required input for this notebook under src/pome_evaluation are analyze_unsupervised_clustering.py, analyze_cluster_preservation_2D.ipynb, and analyze_distance_preservation_2D.ipynb. Necessary UMAP embeddings were computed using the script embed_UMAP_several_runs.py. POME's embeddings 2D visualizations were computed with the help of the notebook project_embeddings_to_2D.ipynb.
The results figure showing POME's embeddings' supervised learning capability can be reproduced with the notebook scripts/plot_fig3_linear_probing.ipynb. As input, it takes the output files of the analysis scripts analyze_HANCOCK_embedding_separability.ipynb, analyze_LUAD_embedding_separability.ipynb, and analyze_MIMIC_embedding_separability.ipynb.
Simply run
python scripts/generate_survival_figure.pyThe results figure showing POME's variable embedding results can be reproduced with the notebook scripts/plot_fig5_variable_embeddings.ipynb. As input, it takes output files of the analysis script analyze_variable_embeddings.ipynb. The required files storing feature importances for Aplasia and Neutropenic Fever are located under data/feature_ranks_NF.csv and data/feature_ranks_aplasia.csv.
Simply run
python scripts/generate_imputation_binning_figure.py
python scripts/generate_imputation_dim_figure.pySimply use the notebook located at scripts/plot_supplement_unsupervised_per_dimension.ipynb.
Simply use the notebook located at scripts/plot_supplement_resource_benchmark.ipynb.
Simply use the notebook located at scripts/plot_supplement_imputation_epochs.ipynb.
All of the following scripts for re-running the performed analyses on POME are located in the directory src/pome_evaluation.
For computing required low-dimensional UMAP embeddings on input datasets, run the script embed_UMAP_several_runs.py with the desired dataset to embed specified on the top part of the file.
For computing 2D visualizations (using PCA, t-SNE, UMAP) of POME's 16-, 32-, and 64-dimensional embeddings, you can run the notebook project_embeddings_to_2D.ipynb.
For computing clusterability metrics on POME and UMAP embeddings, run the script analyze_unsupervised_clustering.py with the desired dataset specified in the beginning of the file.
For analyzing which visualization technique best preserves high-dimension cluster, run the notebook analyze_cluster_preservation_2D.ipynb. For analyzing which visualization technique best preservese local neighborhood structures, simply run the notebook analyze_distance_preservation_2D.ipynb.
For comparing how well POME's and UMAP's embeddings are suitable for predicting held-out target variables by using a simple logistic regression model, we provide one notebook for each dataset separately: analyze_HANCOCK_embedding_separability.ipynb, analyze_LUAD_embedding_separability.ipynb, and analyze_MIMIC_embedding_separability.ipynb.
In order to simulate certain amounts of missingness into the given datasets, simply run the notebook generate_simulated_missingness.ipynb with updated paths pointing to the files of the desired dataset. Scripts and notebooks to impute simulated datasets with the different imputation methods can be found in the respective subdirectories under data/imputation_data/.
In order to compute mean absolute errors and multiclass accuracies of imputed values against ground truth values, you can make use of the Python notebook compute_imputation_results.ipynb.