62 commits
d3c2643
Adding Bayesian to models, predictors and train.
EmaDulj Dec 10, 2023
56798e7
Adding Bayesian to models, predictors and train.
EmaDulj Dec 10, 2023
6a15c71
Adding Bayesian to active learning
EmaDulj Dec 10, 2023
3778655
Adding Bayesian to default
EmaDulj Dec 10, 2023
ce8f692
changes done during setup
DhanushkiMapitigama Dec 11, 2023
e78226e
Merge pull request #1 from DhanushkiMapitigama/setup-changes
DhanushkiMapitigama Dec 11, 2023
e194f8d
Merge branch 'bayesian_before_and_after_merge' into bayesian_before_a…
EmaDulj Dec 11, 2023
1fa57e2
Merge pull request #2 from DhanushkiMapitigama/bayesian_before_and_af…
EmaDulj Dec 11, 2023
442f24d
Adding Bayesain to one_hidden_drug.py
EmaDulj Dec 12, 2023
d118c63
Merge branch 'bayesian_before_and_after_merge' of github.com:EmaDulj/…
EmaDulj Dec 12, 2023
4bb0e68
Adding Bayesian shuffled predictor
EmaDulj Dec 12, 2023
fb75d6c
Adding Bayesian to default shuffled
EmaDulj Dec 12, 2023
9a0df5e
Adding Bayesian to one_hidden_shuffled.py
EmaDulj Dec 12, 2023
6bad5c9
Added Bayesian no permutation invariance MLP predictor
EmaDulj Dec 12, 2023
76b898c
Added Bayesian to model_evaluation_no_permut_invariance.py
EmaDulj Dec 12, 2023
ec2115d
Added Bayesian to one_hidden_drug_split_no_permut_invariance.py
EmaDulj Dec 12, 2023
d60069c
Added Bayesian to BilinearFilmMLPPredictor
EmaDulj Dec 13, 2023
f26befa
Added Bayesian to model_evaluation_multi_cell_line.py
EmaDulj Dec 13, 2023
1a9a3c0
Added Bayesian to model_evaluation_multi_cell_line_shuffled.py
EmaDulj Dec 13, 2023
21f3696
Added Bayesian to model_evaluation_multi_cell_line_no_permut_invarian…
EmaDulj Dec 13, 2023
144dc5e
Added Bayesian to cell_line_transfer.py
EmaDulj Dec 13, 2023
f4e69ea
Added Bayesian to cell_line_transfer_shuffled.py
EmaDulj Dec 13, 2023
355a60c
Added Bayesian to cell_line_transfer_no_permut_invariance.py
EmaDulj Dec 13, 2023
93d399e
Added Bayesian to more predictors
EmaDulj Dec 13, 2023
f5373c5
Merge pull request #5 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Dec 13, 2023
3b15f12
Added optimal kl weight
EmaDulj Dec 14, 2023
d9a873d
Merge branch 'bayesian_before_and_after_merge' of github.com:EmaDulj/…
EmaDulj Dec 14, 2023
42fa05f
Merge pull request #9 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Dec 14, 2023
ddaa0dd
Added Expected Improvement
EmaDulj Dec 14, 2023
7b605bd
Updated acquisition functions
EmaDulj Dec 14, 2023
afb2105
Added Project Infographics
EmaDulj Dec 14, 2023
f9699f0
Rename Project Infographics.png to ProjectInfographics.png
EmaDulj Dec 14, 2023
24faaf5
Update README.md
EmaDulj Dec 14, 2023
13554df
Merge pull request #12 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Dec 14, 2023
71302c3
Added probability of improvement
EmaDulj Dec 15, 2023
020e1e6
Added realizations
EmaDulj Dec 15, 2023
4249a31
Added realizations to config
EmaDulj Dec 15, 2023
61e3178
Added realizations to config
EmaDulj Dec 15, 2023
b622c86
Added realizations to config
EmaDulj Dec 15, 2023
5a2c84c
Added realizations to config
EmaDulj Dec 15, 2023
3eb83b6
Added realizations to config
EmaDulj Dec 15, 2023
4322b32
Added realizations to config
EmaDulj Dec 15, 2023
a22b67f
Added realizations to config
EmaDulj Dec 15, 2023
664c054
Added realizations to config
EmaDulj Dec 15, 2023
9cefae4
Added realizations to config
EmaDulj Dec 15, 2023
d56a934
Added realizations to config
EmaDulj Dec 15, 2023
18abf16
Added realizations to config
EmaDulj Dec 15, 2023
d72149e
Added realizations to config
EmaDulj Dec 15, 2023
8b2a927
Added realizations to config
EmaDulj Dec 15, 2023
0f3c95e
Merge pull request #18 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Dec 15, 2023
c6589f5
Update README.md
EmaDulj Dec 15, 2023
a551f15
Merge pull request #3 from DhanushkiMapitigama/bayesian_before_and_af…
EmaDulj Dec 15, 2023
02dbf57
Merge pull request #20 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Dec 15, 2023
567bd1e
Fixed typos
EmaDulj Dec 27, 2023
f963fab
Merge pull request #27 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Dec 29, 2023
443cd46
Create data
EmaDulj Jan 8, 2024
9e7a666
Delete experiments/data
EmaDulj Jan 8, 2024
920bd77
File for getting stats from json files
EmaDulj Jan 8, 2024
21ea608
Create he
EmaDulj Jan 8, 2024
3fe2deb
Adding results for different configs
EmaDulj Jan 8, 2024
99b207a
Delete experiments/data/he
EmaDulj Jan 8, 2024
bbde401
Merge pull request #31 from EmaDulj/bayesian_before_and_after_merge
EmaDulj Jan 8, 2024
3 changes: 2 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
Reservoir
RayLogs
__pycache__
*pyc
*egg-info
.DS_Store
.DS_Store
31 changes: 11 additions & 20 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,20 @@
# RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds *in vitro*
# Machine Learning Driven Candidate Compound Generation for Drug Repurposing
Based on RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds *in vitro*
[![DOI](https://zenodo.org/badge/320327566.svg)](https://zenodo.org/badge/latestdoi/320327566)

RECOVER is a platform that can guide wet lab experiments to quickly discover synergistic drug combinations active
against a cancer cell line, requiring substantially less screening than an exhaustive evaluation
([preprint](https://arxiv.org/abs/2202.04202)).
This repository is an implementation of RECOVER, a platform that can guide wet lab experiments to quickly discover synergistic drug combinations
([preprint](https://arxiv.org/abs/2202.04202)). However, instead of using an ensemble model to obtain synergy predictions with uncertainty, we use multiple realizations of a Bayesian Neural Network model.
Since the weights are drawn from a distribution, they differ for every forward pass of the trained model and hence give different results. The goal was to obtain a more precise uncertainty estimate, and to obtain it more quickly, since the model does not have to be trained multiple times.


![Overview](docs/images/overview.png "Overview")
## Bayesian Before and After Merge
This branch refers to a model that uses Bayesian modeling in both the single-drug MLP and the combination MLP. The predictors with all-Bayesian layers are added in `Recover/recover/models/predictors.py`. `train.py` was updated with a `train_epoch_bayesian` function that trains the model using a KL loss, and a `test_epoch` function for testing the model. In the Bayesian Basic Trainer, `test_epoch` is used to test the trained model and easily obtain the mean and standard deviation of the synergy predictions.
In the Bayesian Active Trainer, realizations of the trained model are used instead of the Ensemble Model to compute the acquisition function scores and rank the drug combinations. Probability of Improvement and Expected Improvement acquisition functions were added to `Recover/recover/acquisition/acquisition.py`, since we are now working with Bayesian optimization.
Config files were also updated to use BNNs. In this branch there are separate Bayesian config files, while in **master** the option to use Bayesian layers was added to the existing config files.
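The core idea behind the realizations approach, sampling weights from a learned posterior so that repeated forward passes of one trained model yield different predictions, can be sketched as follows. This is an illustrative toy layer, not the code from `Recover/recover/models/predictors.py`; the name `BayesianLinear` and its parameterization are assumptions for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BayesianLinear(nn.Module):
    """Linear layer with a Gaussian posterior over weights (reparameterization trick)."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.zeros(out_features, in_features))
        # softplus(rho) gives the posterior std, kept small at initialization
        self.w_rho = nn.Parameter(torch.full((out_features, in_features), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        std = F.softplus(self.w_rho)
        # A fresh weight sample is drawn on every forward pass
        w = self.w_mu + std * torch.randn_like(std)
        return F.linear(x, w, self.bias)


# Multiple realizations of a single trained model give an uncertainty estimate
layer = BayesianLinear(4, 1)
x = torch.randn(8, 4)
preds = torch.stack([layer(x) for _ in range(10)], dim=1)  # (batch, realizations, 1)
mean, std = preds.mean(dim=1), preds.std(dim=1)
```

An ensemble would need several independently trained networks to produce the same kind of mean/std pair; here one network is trained once and simply evaluated repeatedly.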

## Environment setup

**Requirements**: Anaconda (https://www.anaconda.com/) and Git LFS (https://git-lfs.github.com/). Please make sure
both are installed on the system prior to running installation.

**Installation**: enter the command `source install.sh` and follow the instructions. This will create a conda
environment named **recover** and install all the required packages including the
[reservoir](https://github.com/RECOVERcoalition/Reservoir) package that stores the primary data acquisition scripts.

In case you have any issue with the installation procedure of the *reservoir*, you can access and download all files directly from this [google drive](https://drive.google.com/drive/folders/1MYeDoAi0-qnhSJTvs68r861iMOdoqYki?usp=share_link).
**Requirements and Installation**:
For all requirements and installation steps, check the original RECOVER repository (https://github.com/RECOVERcoalition/Recover.git).

## Running the pipeline

@@ -31,9 +28,3 @@ For example, to run the pipeline with configuration from
the file `model_evaluation.py`, run `python train.py --config model_evaluation`.

Log files will automatically be created to save the results of the experiments.

## Note

This Recover repository is based on research funded by (or in part by) the Bill & Melinda Gates Foundation. The
findings and conclusions contained within are those of the authors and do not necessarily reflect positions or policies
of the Bill & Melinda Gates Foundation.
Binary file added docs/images/ProjectInfographics.png
46 changes: 46 additions & 0 deletions experiments/data/cell_line_transfer_bayesian-result.json

Large diffs are not rendered by default.


36 changes: 36 additions & 0 deletions experiments/data/cell_line_transfer_shuffled_bayesian-result.json


49 changes: 49 additions & 0 deletions experiments/data/model_evaluation_bayesian-result.json


72 changes: 72 additions & 0 deletions experiments/data/model_evaluation_shuffled_bayesian-result.json


25 changes: 25 additions & 0 deletions experiments/data/one_hidden_drug_split_bayesian-result.json


39 changes: 39 additions & 0 deletions experiments/stats.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,39 @@
import json
import numpy as np

# Specify the path to your JSON file
json_file_path = "result.json"

# Load data from the JSON file
with open(json_file_path, "r") as json_file:
    # The file holds one JSON object per line; add commas and wrap the content
    # in square brackets to form a single JSON array
    json_data = "[" + json_file.read().replace("}\n{", "},\n{") + "]"

# Parse the JSON array
data = json.loads(json_data)

# Extract relevant values
spearman_values = [entry["eval/spearman"] for entry in data]
rsquared_values = [entry["eval/comb_r_squared"] for entry in data]

# Calculate mean and standard deviation
mean_spearman = np.mean(spearman_values)
std_dev_spearman = np.std(spearman_values)

mean_rsquared = np.mean(rsquared_values)
std_dev_rsquared = np.std(rsquared_values)

# Print the results
print(f"Mean eval/rsquared: {round(mean_rsquared, 3)}, Standard Deviation: {round(std_dev_rsquared, 3)}")
print(f"Mean eval/spearman: {round(mean_spearman, 3)}, Standard Deviation: {round(std_dev_spearman, 3)}")

last_entry = data[-1]
mean_r_squared_last = last_entry["mean_r_squared"]
std_r_squared_last = last_entry["std_r_squared"]

mean_spearman_last = last_entry["mean_spearman"]
std_spearman_last = last_entry["std_spearman"]

# Print values for the last entry
print(f"Mean test/rsquared: {round(mean_r_squared_last, 3)}, Standard Deviation: {round(std_r_squared_last, 3)}")
print(f"Mean test/spearman: {round(mean_spearman_last, 3)}, Standard Deviation: {round(std_spearman_last, 3)}")
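The script above assumes a results file where each line is a standalone JSON object (as Ray-style loggers typically write). A minimal illustration of the same wrapping trick on hypothetical values:

```python
import json
import numpy as np

# Two result lines, one JSON object per line (hypothetical values, not from the repository)
raw = '{"eval/spearman": 0.61, "eval/comb_r_squared": 0.42}\n' \
      '{"eval/spearman": 0.65, "eval/comb_r_squared": 0.44}'

# Same trick as stats.py: insert commas between objects and wrap in brackets
data = json.loads("[" + raw.replace("}\n{", "},\n{") + "]")
spearman = [entry["eval/spearman"] for entry in data]
print(round(float(np.mean(spearman)), 3))  # prints 0.63
```

Note that the `replace` relies on each object ending exactly at a line break; a more robust alternative would be to `json.loads` each line separately.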
48 changes: 47 additions & 1 deletion recover/acquisition/acquisition.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,7 @@
import torch
import numpy as np
from scipy.special import erf
from scipy.stats import norm

########################################################################################################################
# Abstract Acquisition
@@ -18,23 +21,66 @@ def get_scores(self, output):
raise NotImplementedError

    def get_mean_and_std(self, output):
        output = torch.tensor(output)
        mean = output.mean(dim=1)
        std = output.std(dim=1)

        return mean, std

    def get_current_best(self, output):
        """
        The max synergy is considered the current best, since we do not have
        access to the ground truth.
        """
        best, _ = output.max(dim=1)

        return best


########################################################################################################################
# Acquisition functions
########################################################################################################################

class ExpectedImprovementAcquisition(AbstractAcquisition):
    def __init__(self, config):
        super().__init__(config)

    def get_scores(self, output):
        mean, std = self.get_mean_and_std(output)
        best = self.get_current_best(output)
        epsilon = 1e-6

        # phi and Phi are the standard normal pdf and cdf; epsilon guards
        # against zero std and enforces a minimum improvement margin
        z = (mean - best - epsilon) / (std + epsilon)
        phi = np.exp(-0.5 * (z ** 2)) / np.sqrt(2 * np.pi)
        Phi = 0.5 * (1 + erf(z / np.sqrt(2)))
        scores = (mean - best) * Phi + std * phi

        return scores.to("cpu")


class RandomAcquisition(AbstractAcquisition):
    def __init__(self, config):
        super().__init__(config)

    def get_scores(self, output):
        return torch.randn(output.shape[0])


class ProbabilityOfImprovementAcquisition(AbstractAcquisition):
    """
    Probability of Improvement acquisition function.
    """
    def __init__(self, config):
        super().__init__(config)

    def get_scores(self, output):
        mean, std = self.get_mean_and_std(output)
        current_best = self.get_current_best(output)

        # Phi((mean - best) / std): probability that a candidate improves on
        # the current best
        z = (mean - current_best) / std
        prob_of_improvement_scores = norm.cdf(z)

        return torch.tensor(prob_of_improvement_scores).to("cpu")


class UCB(AbstractAcquisition):
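For intuition, the two new acquisition functions can be evaluated on toy numbers (illustrative values, not from the repository). When a candidate's mean equals the current best, its EI reduces to `std * phi(0)` and its PoI to 0.5:

```python
import torch
from scipy.stats import norm

# Toy posterior summaries for two candidate drug combinations
mean = torch.tensor([0.0, 1.0])
std = torch.tensor([1.0, 1.0])
best = 1.0  # current best observed synergy

z = (mean - best) / std

# Probability of Improvement: Phi(z)
poi = torch.tensor(norm.cdf(z))

# Expected Improvement: (mean - best) * Phi(z) + std * phi(z)
ei = (mean - best) * torch.tensor(norm.cdf(z)) + std * torch.tensor(norm.pdf(z))
```

The repository's `ExpectedImprovementAcquisition` additionally adds a small epsilon to the denominator to guard against zero standard deviation, omitted here for clarity.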
112 changes: 112 additions & 0 deletions recover/config/active_learning_UCB_bayesian.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,112 @@
from recover.datasets.drugcomb_matrix_data import DrugCombMatrix
from recover.models.models import Baseline, EnsembleModel
from recover.models.predictors import BilinearFilmMLPPredictor, \
BilinearMLPPredictor, BilinearFilmWithFeatMLPPredictor, BayesianBilinearMLPPredictor #, BilinearCellLineInputMLPPredictor
from recover.utils.utils import get_project_root
from recover.acquisition.acquisition import RandomAcquisition, GreedyAcquisition, UCB, ExpectedImprovementAcquisition
from recover.train import train_epoch_bayesian, eval_epoch, test_epoch, BayesianBasicTrainer, BayesianActiveTrainer
import os
from ray import tune

########################################################################################################################
# Configuration
########################################################################################################################


pipeline_config = {
    "use_tune": True,
    "num_epoch_without_tune": 500,  # Used only if "use_tune" == False
    "seed": tune.grid_search([1, 2, 3]),
    # Optimizer config
    "lr": 1e-4,
    "weight_decay": 1e-2,
    "batch_size": 128,
    # Train epoch and eval_epoch to use
    "train_epoch": train_epoch_bayesian,
    "eval_epoch": eval_epoch,
    "test_epoch": test_epoch,
}

predictor_config = {
    "predictor": BayesianBilinearMLPPredictor,
    "predictor_layers":
        [
            2048,
            128,
            64,
            1,
        ],
    "merge_n_layers_before_the_end": 2,  # Computation on the sum of the two drug embeddings for the last n layers
    "allow_neg_eigval": True,
    "stop": {"training_iteration": 1000, 'patience': 10}
}

model_config = {
    "model": Baseline,
    # Loading pretrained model
    "load_model_weights": False,  # tune.grid_search([True, False]),
    "model_weights_file": "",
}

"""
List of cell line names:

['786-0', 'A498', 'A549', 'ACHN', 'BT-549', 'CAKI-1', 'EKVX', 'HCT-15', 'HCT116', 'HOP-62', 'HOP-92', 'HS 578T', 'HT29',
'IGROV1', 'K-562', 'KM12', 'LOX IMVI', 'MALME-3M', 'MCF7', 'MDA-MB-231', 'MDA-MB-468', 'NCI-H226', 'NCI-H460',
'NCI-H522', 'NCIH23', 'OVCAR-4', 'OVCAR-5', 'OVCAR-8', 'OVCAR3', 'PC-3', 'RPMI-8226', 'SF-268', 'SF-295', 'SF-539',
'SK-MEL-2', 'SK-MEL-28', 'SK-MEL-5', 'SK-OV-3', 'SNB-75', 'SR', 'SW-620', 'T-47D', 'U251', 'UACC-257', 'UACC62',
'UO-31']
"""

dataset_config = {
    "dataset": DrugCombMatrix,
    "study_name": 'ALMANAC',
    "in_house_data": 'without',
    "rounds_to_include": [],
    "cell_line": 'MCF7',  # Restrict to a specific cell line
    "val_set_prop": 0.1,
    "test_set_prop": 0.,
    "test_on_unseen_cell_line": False,
    "split_valid_train": "pair_level",  # either "cell_line_level" or "pair_level"
    "cell_lines_in_test": None,  # ['MCF7', 'PC-3'],
    "target": "bliss_max",
    "fp_bits": 1024,
    "fp_radius": 2
}

active_learning_config = {
    "ensemble_size": 10,
    "acquisition": tune.grid_search([GreedyAcquisition, UCB, RandomAcquisition, ExpectedImprovementAcquisition]),
    "patience_max": 8,
    "kappa": 1,
    "kappa_decrease_factor": 1,
    "n_epoch_between_queries": 500,
    "acquire_n_at_a_time": 30,
    "n_initial": 30,
    "realizations": 10,  # number of realizations used instead of the Ensemble Model
}

########################################################################################################################
# Configuration that will be loaded
########################################################################################################################

configuration = {
    "trainer": BayesianActiveTrainer,
    "trainer_config": {
        **pipeline_config,
        **predictor_config,
        **model_config,
        **dataset_config,
        **active_learning_config
    },
    "summaries_dir": os.path.join(get_project_root(), "RayLogs"),
    "memory": 1800,
    "stop": {"training_iteration": 1000, 'all_space_explored': 1},
    "checkpoint_score_attr": 'eval/comb_r_squared',
    "keep_checkpoints_num": 1,
    "checkpoint_at_end": False,
    "checkpoint_freq": 1,
    "resources_per_trial": {"cpu": 32, "gpu": 2},
    "scheduler": None,
    "search_alg": None,
}
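The `realizations` setting above is what replaces the ensemble: one trained stochastic model is queried several times, and the spread of its outputs serves as the uncertainty. A minimal sketch of that aggregation, using a hypothetical stand-in model (`NoisyModel` is illustrative, not the repository's predictor):

```python
import torch


def predict_with_realizations(model, x, n_realizations=10):
    """Run a stochastic (Bayesian) model several times and aggregate.

    Assumes each forward pass draws fresh weights, so repeated calls on the
    same input produce different outputs from a single trained network.
    """
    with torch.no_grad():
        outs = torch.stack([model(x) for _ in range(n_realizations)], dim=1)
    return outs.mean(dim=1), outs.std(dim=1)


class NoisyModel(torch.nn.Module):
    """Toy stand-in: perturbs its output on every call, mimicking weight sampling."""

    def forward(self, x):
        return x.sum(dim=1, keepdim=True) + 0.1 * torch.randn(x.shape[0], 1)


mean, std = predict_with_realizations(NoisyModel(), torch.randn(5, 3), n_realizations=10)
```

The mean/std pair plays the same role the per-member predictions of `EnsembleModel` play in the original RECOVER active-learning loop, but requires only one training run.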
87 changes: 87 additions & 0 deletions recover/config/cell_line_transfer_bayesian.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
from recover.datasets.drugcomb_matrix_data import DrugCombMatrix
from recover.models.models import Baseline
from recover.models.predictors import BayesianBilinearFilmMLPPredictor, BayesianBilinearLinFilmWithFeatMLPPredictor  # Bayesian predictors
from recover.utils.utils import get_project_root
from recover.train import train_epoch_bayesian, eval_epoch, test_epoch, BayesianBasicTrainer  # Bayesian train function, trainer, and testing epoch
import os
from ray import tune
from importlib import import_module

########################################################################################################################
# Configuration
########################################################################################################################


pipeline_config = {
    "use_tune": True,
    "num_epoch_without_tune": 500,  # Used only if "use_tune" == False
    "seed": tune.grid_search([2, 3, 4]),
    # Optimizer config
    "lr": 1e-4,
    "weight_decay": 1e-2,
    "batch_size": 128,
    # Train epoch and eval_epoch to use
    "train_epoch": train_epoch_bayesian,  # updated train function that includes the KL divergence term
    "eval_epoch": eval_epoch,
    "test_epoch": test_epoch,  # Bayesian test epoch, used to run the different realizations
}

predictor_config = {
    "predictor": BayesianBilinearLinFilmWithFeatMLPPredictor,
    "predictor_layers":
        [
            2048,
            128,
            64,
            1,
        ],
    "merge_n_layers_before_the_end": 2,  # Computation on the sum of the two drug embeddings for the last n layers
    "allow_neg_eigval": True,
    "stop": {"training_iteration": 1000, 'patience': 10},  # passed so that we can check when training is over
    "realizations": 10  # number of realizations
}

model_config = {
    "model": Baseline,
    "load_model_weights": False,
}

dataset_config = {
    "dataset": DrugCombMatrix,
    "study_name": 'ALMANAC',
    "in_house_data": 'without',
    "rounds_to_include": [],
    "val_set_prop": 0.2,
    "test_set_prop": 0.1,
    "test_on_unseen_cell_line": True,
    "cell_lines_in_test": ['MCF7'],
    "split_valid_train": "cell_line_level",
    "cell_line": None,  # 'PC-3',
    "target": "bliss_max",  # tune.grid_search(["css", "bliss", "zip", "loewe", "hsa"]),
    "fp_bits": 1024,
    "fp_radius": 2
}

########################################################################################################################
# Configuration that will be loaded
########################################################################################################################

configuration = {
    "trainer": BayesianBasicTrainer,  # Bayesian trainer
    "trainer_config": {
        **pipeline_config,
        **predictor_config,
        **model_config,
        **dataset_config,
    },
    "summaries_dir": os.path.join(get_project_root(), "RayLogs"),
    "memory": 1800,
    "stop": {"training_iteration": 1000, 'patience': 10},
    "checkpoint_score_attr": 'eval/comb_r_squared',
    "keep_checkpoints_num": 1,
    "checkpoint_at_end": False,
    "checkpoint_freq": 1,
    "resources_per_trial": {"cpu": 8, "gpu": 0},
    "scheduler": None,
    "search_alg": None,
}
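The `train_epoch_bayesian` function referenced in these configs trains with a KL-weighted loss (data fit plus a weighted divergence between the weight posterior and prior). A minimal sketch of one such training step; the `kl_loss()` method and the toy model are hypothetical stand-ins, and the actual interface in `train.py` may differ:

```python
import torch


def train_step_bayesian(model, x, y, optimizer, kl_weight=1e-3):
    """One ELBO-style step: data loss plus a weighted KL term.

    Assumes the model exposes a kl_loss() method returning the KL divergence
    between its weight posterior and prior (hypothetical interface).
    """
    optimizer.zero_grad()
    pred = model(x)
    loss = torch.nn.functional.mse_loss(pred, y) + kl_weight * model.kl_loss()
    loss.backward()
    optimizer.step()
    return loss.item()


class ToyBayesianModel(torch.nn.Module):
    """Toy model whose 'KL' is an L2 penalty standing in for KL(posterior || prior)."""

    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(3, 1)

    def kl_loss(self):
        return (self.lin.weight ** 2).sum()

    def forward(self, x):
        return self.lin(x)


model = ToyBayesianModel()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss = train_step_bayesian(model, torch.randn(8, 3), torch.randn(8, 1), opt)
```

The "optimal kl weight" commit in this PR suggests `kl_weight` was tuned; here it is just a free parameter of the sketch.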