Skip to content

bionetslab/POME_Evaluation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

POME Evaluation

Python code to reproduce all figures and analyses used for the evaluation of POME.

Installation

Using conda

conda env create -f environment.yml
conda activate hancock_survival

Using pip

python -m venv venv
source venv/bin/activate
pip install -e .

Generating manuscript figures

Imputation analysis

Simply run

python scripts/generate_imputation_figure.py

Unsupervised stratification results

The figure can be reproduced with the notebook scripts/plot_fig4_unsupervised.ipynb. The scripts to produce the required input for this notebook under src/pome_evaluation are analyze_unsupervised_clustering.py, analyze_cluster_preservation_2D.ipynb, and analyze_distance_preservation_2D.ipynb. Necessary UMAP embeddings were computed using the script embed_UMAP_several_runs.py. POME's embeddings 2D visualizations were computed with the help of the notebook project_embeddings_to_2D.ipynb.

Linear probing analysis

The results figure showing POME's embeddings' supervised learning capability can be reproduced with the notebook scripts/plot_fig3_linear_probing.ipynb. As input, it takes the output files of the analysis scripts analyze_HANCOCK_embedding_separability.ipynb, analyze_LUAD_embedding_separability.ipynb, and analyze_MIMIC_embedding_separability.ipynb.

Survival analysis on HANCOCK

Simply run

python scripts/generate_survival_figure.py

Exploratory analysis of variable embeddings

The results figure showing POME's variable embedding results can be reproduced with the notebook scripts/plot_fig5_variable_embeddings.ipynb. As input, it takes output files of the analysis script analyze_variable_embeddings.ipynb. The required files storing feature importances for Aplasia and Neutropenic Fever are located under data/feature_ranks_NF.csv and data/feature_ranks_aplasia.csv.

Supplement imputation plots

Simply run

python scripts/generate_imputation_binning_figure.py
python scripts/generate_imputation_dim_figure.py

Supplement unsuperivsed results per embedding sizes

Simply use the notebook located at scripts/plot_supplement_unsupervised_per_dimension.ipynb.

Supplement resource benchmark

Simply use the notebook located at scripts/plot_supplement_resource_benchmark.ipynb.

Supplement imputation across epochs

Simply use the notebook located at scripts/plot_supplement_imputation_epochs.ipynb.

Analysis scripts for POME Evaluation

All of the following scripts for re-running the performed analyses on POME are located in the directory src/pome_evaluation.

Compute low-dimensional UMAP embeddings

For computing required low-dimensional UMAP embeddings on input datasets, run the script embed_UMAP_several_runs.py with the desired dataset to embed specified on the top part of the file.

Project POME embeddings to 2D

For computing 2D visualizations (using PCA, t-SNE, UMAP) of POME's 16-, 32-, and 64-dimensional embeddings, you can run the notebook project_embeddings_to_2D.ipynb.

Analyze unsupervised clusterability

For computing clusterability metrics on POME and UMAP embeddings, run the script analyze_unsupervised_clustering.py with the desired dataset specified in the beginning of the file.

Analyze visualization techniques

For analyzing which visualization technique best preserves high-dimension cluster, run the notebook analyze_cluster_preservation_2D.ipynb. For analyzing which visualization technique best preservese local neighborhood structures, simply run the notebook analyze_distance_preservation_2D.ipynb.

Analyze supervised learning results

For comparing how well POME's and UMAP's embeddings are suitable for predicting held-out target variables by using a simple logistic regression model, we provide one notebook for each dataset separately: analyze_HANCOCK_embedding_separability.ipynb, analyze_LUAD_embedding_separability.ipynb, and analyze_MIMIC_embedding_separability.ipynb.

Generate simulated missingness datasets

In order to simulate certain amounts of missingness into the given datasets, simply run the notebook generate_simulated_missingness.ipynb with updated paths pointing to the files of the desired dataset. Scripts and notebooks to impute simulated datasets with the different imputation methods can be found in the respective subdirectories under data/imputation_data/.

Analyze imputation results

In order to compute mean absolute errors and multiclass accuracies of imputed values against ground truth values, you can make use of the Python notebook compute_imputation_results.ipynb.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages