POME Evaluation

Python code to reproduce all figures and analyses used for the evaluation of POME.

Installation

Using conda

conda env create -f environment.yml
conda activate hancock_survival

Using pip

python -m venv venv
source venv/bin/activate
pip install -e .

Generating manuscript figures

Imputation analysis

Simply run

python scripts/generate_imputation_figure.py

Unsupervised stratification results

The figure can be reproduced with the notebook scripts/plot_fig4_unsupervised.ipynb. Its inputs are produced by analyze_unsupervised_clustering.py, analyze_cluster_preservation_2D.ipynb, and analyze_distance_preservation_2D.ipynb, all located under src/pome_evaluation. The required UMAP embeddings were computed with the script embed_UMAP_several_runs.py, and the 2D visualizations of POME's embeddings with the notebook project_embeddings_to_2D.ipynb.

Linear probing analysis

The figure showing how well POME's embeddings support supervised learning can be reproduced with the notebook scripts/plot_fig3_linear_probing.ipynb. As input, it takes the output files of the analysis notebooks analyze_HANCOCK_embedding_separability.ipynb, analyze_LUAD_embedding_separability.ipynb, and analyze_MIMIC_embedding_separability.ipynb.

Survival analysis on HANCOCK

Simply run

python scripts/generate_survival_figure.py

Exploratory analysis of variable embeddings

The figure showing POME's variable embedding results can be reproduced with the notebook scripts/plot_fig5_variable_embeddings.ipynb. As input, it takes the output files of the analysis notebook analyze_variable_embeddings.ipynb. The required feature importance files for Aplasia and Neutropenic Fever are data/feature_ranks_aplasia.csv and data/feature_ranks_NF.csv, respectively.

Supplement imputation plots

Simply run

python scripts/generate_imputation_binning_figure.py
python scripts/generate_imputation_dim_figure.py

Supplement unsupervised results per embedding size

Simply use the notebook located at scripts/plot_supplement_unsupervised_per_dimension.ipynb.

Supplement resource benchmark

Simply use the notebook located at scripts/plot_supplement_resource_benchmark.ipynb.

Supplement imputation across epochs

Simply use the notebook located at scripts/plot_supplement_imputation_epochs.ipynb.

Analysis scripts for POME Evaluation

All of the following scripts for re-running the performed analyses on POME are located in the directory src/pome_evaluation.

Compute low-dimensional UMAP embeddings

To compute the required low-dimensional UMAP embeddings of an input dataset, run the script embed_UMAP_several_runs.py with the desired dataset specified at the top of the file.

Project POME embeddings to 2D

To compute 2D visualizations (using PCA, t-SNE, and UMAP) of POME's 16-, 32-, and 64-dimensional embeddings, run the notebook project_embeddings_to_2D.ipynb.

Analyze unsupervised clusterability

To compute clusterability metrics on POME and UMAP embeddings, run the script analyze_unsupervised_clustering.py with the desired dataset specified at the beginning of the file.
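
As a rough illustration of what a clusterability metric looks like, the sketch below scores a k-means clustering of a synthetic embedding matrix with the silhouette coefficient from scikit-learn. The choice of k-means and of the silhouette coefficient is an assumption made for this example, as is the synthetic data; analyze_unsupervised_clustering.py may compute different metrics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Hypothetical stand-in for an embedding matrix: two well-separated
# Gaussian blobs in 16 dimensions (not real POME or UMAP output).
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 16))
blob_b = rng.normal(loc=8.0, scale=1.0, size=(100, 16))
embeddings = np.vstack([blob_a, blob_b])

# Cluster the embeddings, then measure how compact and well separated
# the resulting clusters are (silhouette lies in [-1, 1]).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
score = silhouette_score(embeddings, labels)

print(f"silhouette score: {score:.3f}")
```

Higher values indicate tighter, better separated clusters, so the same call can be repeated on POME and UMAP embeddings of the same dataset to compare them.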

Analyze visualization techniques

To analyze which visualization technique best preserves high-dimensional clusters, run the notebook analyze_cluster_preservation_2D.ipynb. To analyze which visualization technique best preserves local neighborhood structures, run the notebook analyze_distance_preservation_2D.ipynb.
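
One common way to quantify local neighborhood preservation is scikit-learn's trustworthiness score. The sketch below applies it to a PCA projection of random data; the metric, the neighborhood size, and the synthetic input are assumptions for illustration and are not taken from analyze_distance_preservation_2D.ipynb.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

# Hypothetical stand-in for a (n_samples, 32) embedding matrix.
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(150, 32))

# Project to 2D; t-SNE or UMAP coordinates could be scored the same way.
low_dim = PCA(n_components=2, random_state=0).fit_transform(high_dim)

# Trustworthiness lies in [0, 1]; values near 1 mean that points which are
# neighbors in 2D were also neighbors in the original space.
score = trustworthiness(high_dim, low_dim, n_neighbors=10)

print(f"trustworthiness: {score:.3f}")
```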

Analyze supervised learning results

To compare how well POME's and UMAP's embeddings predict held-out target variables with a simple logistic regression model, we provide one notebook per dataset: analyze_HANCOCK_embedding_separability.ipynb, analyze_LUAD_embedding_separability.ipynb, and analyze_MIMIC_embedding_separability.ipynb.
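
The general idea of linear probing can be sketched as follows: a logistic regression model is fitted on fixed embeddings and evaluated with cross-validation. The synthetic embeddings, the binary target, and the 5-fold setup are assumptions for this example; the notebooks may use different splits and scoring.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in for precomputed embeddings (not real POME output).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(300, 16))

# Hypothetical binary target that depends linearly on two embedding dimensions.
target = (embeddings[:, 0] + embeddings[:, 1] > 0).astype(int)

# The embeddings stay fixed; only the linear classifier on top is trained.
probe = LogisticRegression(max_iter=1000)
scores = cross_val_score(probe, embeddings, target, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()

print(f"mean cross-validated accuracy: {mean_accuracy:.3f}")
```

Because the probe is linear, a high score suggests that the target is close to linearly separable in the embedding space, which is what makes the comparison between embedding methods meaningful.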

Generate simulated missingness datasets

To simulate a given amount of missingness in a dataset, run the notebook generate_simulated_missingness.ipynb after updating the paths to point to the files of the desired dataset. Scripts and notebooks for imputing the simulated datasets with the different imputation methods can be found in the respective subdirectories under data/imputation_data/.
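
A minimal sketch of simulating missingness is shown below. It assumes values are removed completely at random (MCAR), which this README does not state; the function name `mask_completely_at_random` and the toy array are hypothetical and not part of the repository.

```python
import numpy as np


def mask_completely_at_random(values, missing_fraction, seed=0):
    """Return a float copy with roughly `missing_fraction` of entries set to NaN,
    together with the boolean mask marking the removed entries."""
    rng = np.random.default_rng(seed)
    # Each entry is dropped independently with probability `missing_fraction`.
    mask = rng.random(values.shape) < missing_fraction
    masked = values.astype(float, copy=True)
    masked[mask] = np.nan
    return masked, mask


# Toy numeric table with 100 rows and 10 columns.
data = np.arange(1000, dtype=float).reshape(100, 10)
masked, mask = mask_completely_at_random(data, missing_fraction=0.2)

print(f"fraction of removed entries: {mask.mean():.3f}")
```

Keeping the mask alongside the masked data makes it possible to score imputed values later against the ground truth at exactly the removed positions.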

Analyze imputation results

To compute mean absolute errors and multiclass accuracies of imputed values against ground-truth values, use the notebook compute_imputation_results.ipynb.
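
The two metrics can be sketched as follows, evaluated only at positions that were masked out. The helper names `masked_mae` and `masked_accuracy` and the toy arrays are hypothetical; the notebook's actual implementation may differ.

```python
import numpy as np


def masked_mae(truth, imputed, mask):
    """Mean absolute error over the masked (imputed) positions only."""
    return float(np.mean(np.abs(truth[mask] - imputed[mask])))


def masked_accuracy(truth, imputed, mask):
    """Fraction of masked categorical entries that were imputed correctly."""
    return float(np.mean(truth[mask] == imputed[mask]))


# Positions 1 and 3 were removed and later imputed.
mask = np.array([False, True, False, True])

# Numeric variable: absolute errors at the masked positions are 0.5 and 1.0.
truth_num = np.array([1.0, 2.0, 3.0, 4.0])
imputed_num = np.array([1.0, 2.5, 3.0, 3.0])
mae = masked_mae(truth_num, imputed_num, mask)

# Categorical variable: one of the two masked entries is imputed correctly.
truth_cat = np.array(["a", "b", "c", "a"])
imputed_cat = np.array(["a", "b", "c", "b"])
acc = masked_accuracy(truth_cat, imputed_cat, mask)

print(mae, acc)  # 0.75 0.5
```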
