Changes from all commits · 51 commits
| Commit | Message | Author | Date |
| --- | --- | --- | --- |
| ce8f692 | changes done during setup | DhanushkiMapitigama | Dec 11, 2023 |
| e78226e | Merge pull request #1 from DhanushkiMapitigama/setup-changes | DhanushkiMapitigama | Dec 11, 2023 |
| 6ec9c0f | Adding project infographics | EmaDulj | Dec 11, 2023 |
| eeef07d | Merge pull request #2 from EmaDulj/master | EmaDulj | Dec 11, 2023 |
| 6eee1db | Update README.md | EmaDulj | Dec 11, 2023 |
| 6c7fe27 | Update README.md | EmaDulj | Dec 11, 2023 |
| 03351ae | Delete docs/images/Project Infographics.jpg | EmaDulj | Dec 11, 2023 |
| 80364c7 | Adding infographics | EmaDulj | Dec 11, 2023 |
| 0ed3b54 | Rename Project Infographics.png to ProjectInfographics.png | EmaDulj | Dec 11, 2023 |
| 897e12c | Update README.md | EmaDulj | Dec 11, 2023 |
| 346e4ee | Merge pull request #3 from EmaDulj/master | EmaDulj | Dec 11, 2023 |
| 050cec6 | Update README.md | EmaDulj | Dec 11, 2023 |
| 752823d | Merge pull request #4 from EmaDulj/master | EmaDulj | Dec 11, 2023 |
| 0ed04bd | Bayesian methods implemented | DhanushkiMapitigama | Dec 12, 2023 |
| 946cf64 | Bayesian config files added | DhanushkiMapitigama | Dec 12, 2023 |
| d5f6482 | Bayesian config files | DhanushkiMapitigama | Dec 12, 2023 |
| 35dc8b7 | Merge pull request #6 from DhanushkiMapitigama/new_configs | DhanushkiMapitigama | Dec 12, 2023 |
| c2f6eb5 | Added optimized kl weight | EmaDulj | Dec 14, 2023 |
| 47f857f | Added Expected Improvement | EmaDulj | Dec 14, 2023 |
| 3ec6064 | Merge pull request #13 from EmaDulj/bayesian_after_merge | EmaDulj | Dec 14, 2023 |
| 58d38be | Automated the switch between default and bayesian modelling through c… | DhanushkiMapitigama | Dec 14, 2023 |
| 234d907 | Added bayesian parameters to configs | DhanushkiMapitigama | Dec 14, 2023 |
| ebeacc8 | Merge pull request #15 from DhanushkiMapitigama/automating_bayesian_m… | DhanushkiMapitigama | Dec 14, 2023 |
| 242bddc | Probability of improvement aquisition added | DhanushkiMapitigama | Dec 14, 2023 |
| 46a65d1 | Active learning trainer and configs modified | DhanushkiMapitigama | Dec 14, 2023 |
| 8671f89 | sigmoid added | DhanushkiMapitigama | Dec 14, 2023 |
| d7a2346 | Merge pull request #16 from DhanushkiMapitigama/PI_aquisition | DhanushkiMapitigama | Dec 14, 2023 |
| 707a186 | Model Overview | EmaDulj | Dec 15, 2023 |
| 974cf3f | Update README.md | EmaDulj | Dec 15, 2023 |
| e7e8429 | Update README.md | EmaDulj | Dec 15, 2023 |
| 3d07745 | Update README.md | EmaDulj | Dec 15, 2023 |
| ea2d994 | Create b | EmaDulj | Dec 15, 2023 |
| ad96008 | Added experiments | EmaDulj | Dec 15, 2023 |
| e7e1580 | Delete experiments/b | EmaDulj | Dec 15, 2023 |
| 343cefa | Merge branch 'DhanushkiMapitigama:master' into master | EmaDulj | Dec 15, 2023 |
| ab42b7e | Merge pull request #19 from EmaDulj/master | EmaDulj | Dec 15, 2023 |
| 4569d7c | Update requirements.txt | DhanushkiMapitigama | Dec 15, 2023 |
| dddfac3 | Update train.py | DhanushkiMapitigama | Dec 15, 2023 |
| ff117de | Implementation of manual test for permuted drug combinations | DhanushkiMapitigama | Dec 17, 2023 |
| 8c28440 | Final sigmoid layer enabling added to config | DhanushkiMapitigama | Dec 17, 2023 |
| 54c90d6 | Config changes for sigmoid and other minor changes | DhanushkiMapitigama | Dec 17, 2023 |
| 6f51fdf | Merge pull request #23 from DhanushkiMapitigama/permute_invariance_ex… | DhanushkiMapitigama | Dec 17, 2023 |
| 534c8f0 | JSON result files added | DhanushkiMapitigama | Dec 17, 2023 |
| 398e6ea | Result plots and notebook added | DhanushkiMapitigama | Dec 17, 2023 |
| 0d86f80 | Merge pull request #24 from DhanushkiMapitigama/result_script | DhanushkiMapitigama | Dec 17, 2023 |
| a87eb0a | Changed the bayesian trainer to add noise as defined | DhanushkiMapitigama | Dec 18, 2023 |
| a08e910 | Config files changed | DhanushkiMapitigama | Dec 18, 2023 |
| 3bfd12a | Noise functions added | DhanushkiMapitigama | Dec 18, 2023 |
| f0d8a91 | Job script for parallel execution on cluster | DhanushkiMapitigama | Dec 18, 2023 |
| d4d4483 | Merge pull request #25 from DhanushkiMapitigama/job_script | DhanushkiMapitigama | Dec 18, 2023 |
| 68feca6 | Update README.md | DhanushkiMapitigama | Dec 18, 2023 |
3 changes: 2 additions & 1 deletion .gitignore
@@ -1,5 +1,6 @@
Reservoir
RayLogs
__pycache__
*pyc
*egg-info
.DS_Store
.DS_Store
44 changes: 27 additions & 17 deletions README.md
@@ -1,23 +1,39 @@
# RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds *in vitro*
# Machine Learning Driven Candidate Compound Generation for Drug Repurposing
Based on RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds *in vitro*
[![DOI](https://zenodo.org/badge/320327566.svg)](https://zenodo.org/badge/latestdoi/320327566)

RECOVER is a platform that can guide wet lab experiments to quickly discover synergistic drug combinations active
against a cancer cell line, requiring substantially less screening than an exhaustive evaluation
([preprint](https://arxiv.org/abs/2202.04202)).
This repository is an implementation of RECOVER, a platform that can guide wet lab experiments to quickly discover synergistic drug combinations
([preprint](https://arxiv.org/abs/2202.04202)). However, instead of using an ensemble model to obtain synergy predictions with uncertainty, we used multiple realizations of a Bayesian Neural Network model.
Since the weights are drawn from a distribution, they differ for every run of a trained model and hence give different results. The goal was to obtain a more precise uncertainty estimate, and to obtain it more quickly, since the model does not have to be trained multiple times.
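The multiple-realizations idea can be sketched as follows. This is a minimal illustration, not the repository's actual model: a toy Bayesian layer whose weights are sampled from a diagonal Gaussian posterior, with the predictive mean and uncertainty estimated from repeated forward passes of a single trained network (all names and sizes here are hypothetical).

```python
import numpy as np

rng = np.random.default_rng(0)

class BayesianLinear:
    """Toy Bayesian layer: each forward pass samples fresh weights
    from a diagonal Gaussian posterior N(mu, sigma^2)."""
    def __init__(self, n_in, n_out):
        self.w_mu = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.w_rho = np.full((n_in, n_out), -3.0)  # sigma = softplus(rho)

    def forward(self, x):
        sigma = np.log1p(np.exp(self.w_rho))       # softplus keeps sigma > 0
        w = self.w_mu + sigma * rng.standard_normal(self.w_mu.shape)
        return x @ w

def predict_with_uncertainty(layer, x, n_samples=50):
    """Run several realizations of the same trained network and
    summarize them: mean = prediction, std = model uncertainty."""
    samples = np.stack([layer.forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

layer = BayesianLinear(n_in=8, n_out=1)
x = rng.standard_normal((4, 8))           # 4 hypothetical drug-pair feature vectors
mean, std = predict_with_uncertainty(layer, x)
print(mean.shape, std.shape)              # (4, 1) (4, 1)
```

An ensemble would need several trained networks to get the same spread; here the spread comes for free from re-sampling the weights of one network.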

<p float="left">
<img src="docs/images/ProjectInfographics.png" alt="Overview" width="300"/>
<img src="docs/images/ModelOverview.png" alt="Model Overview" width="400"/>
</p>

![Overview](docs/images/overview.png "Overview")
## Repository overview
There are four branches in this repository:

- **Master**: the original RECOVER with the Bayesian setup but the original config files, so the initial pipeline can be easily recreated.
- **Bayesian_after_merge**: a Bayesian model used only in the layers after the bilinear merge.
- **Bayesian_before_and_after_merge**: a Bayesian model used in the layers before and after the bilinear merge.
- **Weight_uncertainty**: a Bayesian Neural Network library implemented from scratch, with two priors introduced in order to optimize the model (https://doi.org/10.48550/arXiv.1505.05424).

Every branch is explained further in its own README.
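The Bayes by Backprop paper cited for the **Weight_uncertainty** branch trains the posterior parameters (mu, rho) through the reparameterization trick and scores sampled weights against a scale mixture of two Gaussians as the prior. A minimal sketch of those two pieces, illustrative only and not the branch's actual library (the mixture parameters below are example values, not the ones used in this project):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_weight(mu, rho):
    """Reparameterization trick: w = mu + softplus(rho) * eps,
    so gradients can flow back to mu and rho through the sample."""
    sigma = np.log1p(np.exp(rho))
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

def log_gaussian(w, sigma):
    """Log density of a zero-mean Gaussian with scale sigma."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - w**2 / (2 * sigma**2)

def log_scale_mixture_prior(w, pi=0.5, sigma1=1.0, sigma2=0.002):
    """Scale mixture prior (Blundell et al., 2015): a wide component
    plus a narrow 'spike' component that pulls weights toward zero."""
    return np.log(pi * np.exp(log_gaussian(w, sigma1))
                  + (1 - pi) * np.exp(log_gaussian(w, sigma2)))

w = sample_weight(mu=np.zeros(3), rho=np.full(3, -2.0))
print(log_scale_mixture_prior(w))
```

During training, the loss combines this log prior, the log posterior of the sampled weights, and the data likelihood (the KL term weighted by a tunable factor, as in the `kl weight` commit).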

## Environment setup

**Requirements**: Anaconda (https://www.anaconda.com/) and Git LFS (https://git-lfs.github.com/). Please make sure
both are installed on the system prior to running installation.
**Requirements and Installation**:
For all requirements and installation steps for the dataset and the project, check the original RECOVER repository (https://github.com/RECOVERcoalition/Recover.git).

**Installation**: enter the command `source install.sh` and follow the instructions. This will create a conda
environment named **recover** and install all the required packages including the
[reservoir](https://github.com/RECOVERcoalition/Reservoir) package that stores the primary data acquisition scripts.
**Potential issues and fixes**
- Use the requirements file from this repository.
- Cloning the dataset: make sure Git LFS is installed (some files are over 100 MB, so git-lfs is needed for cloning).
- NumPy compatibility: although the original RECOVER requirements do not pin a NumPy version, NumPy 1.24 causes compatibility issues with some of the other libraries, so NumPy has to be downgraded explicitly. Try `pip install numpy==1.22.4`.
- Ray Tune issue: use the updated `train.py` from this repository to avoid Ray Tune path issues when saving checkpoints.
- Issues with the config files: check for unused modules that are imported and remove them to avoid "module not found" errors.
- The rdkit installation path in the original repository is outdated. Try `conda install -c conda-forge rdkit`.

In case you have any issue with the installation procedure of the *reservoir*, you can access and download all files directly from this [google drive](https://drive.google.com/drive/folders/1MYeDoAi0-qnhSJTvs68r861iMOdoqYki?usp=share_link).

## Running the pipeline

@@ -31,9 +47,3 @@ For example, to run the pipeline with configuration from
the file `model_evaluation.py`, run `python train.py --config model_evaluation`.

Log files will automatically be created to save the results of the experiments.

## Note

This Recover repository is based on research funded by (or in part by) the Bill & Melinda Gates Foundation. The
findings and conclusions contained within are those of the authors and do not necessarily reflect positions or policies
of the Bill & Melinda Gates Foundation.
76 changes: 76 additions & 0 deletions cluster_job.sh
@@ -0,0 +1,76 @@
#!/bin/bash -l
# shellcheck disable=SC2206

#SBATCH -A ADD_THE_PROJECT_NAME_HERE # project name
#SBATCH -M ADD_THE_SYSTEM_NAME_HERE # name of system i.e. snowy, dardel

#SBATCH -p node # request a full node
#SBATCH -N 2 # Number of nodes Change --num-gpus "2" in head command and worker loop as well
#SBATCH -t 0:15:00 # change time accordingly
#SBATCH --gpus-per-node=2 # change gpus accordingly
#SBATCH -J exp-seed-3 # name of the job
#SBATCH -D ./ # stay in current working directory


source ~/.bashrc
conda activate recover

set -x

# __doc_head_address_start__

# Getting the node names
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)

head_node=${nodes_array[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# if we detect a space character in the head node IP, we'll
# convert it to an ipv4 address. This step is optional.
if [[ "$head_node_ip" == *" "* ]]; then
IFS=' ' read -ra ADDR <<<"$head_node_ip"
if [[ ${#ADDR[0]} -gt 16 ]]; then
head_node_ip=${ADDR[1]}
else
head_node_ip=${ADDR[0]}
fi
echo "IPV6 address detected. We split the IPV4 address as $head_node_ip"
fi
# __doc_head_address_end__


# __doc_head_ray_start__
port=6379
ip_head=$head_node_ip:$port
export ip_head
echo "IP Head: $ip_head"

echo "Starting HEAD at $head_node"
srun --nodes=1 --ntasks=1 -w "$head_node" \
ray start --head --node-ip-address="$head_node_ip" --port=$port \
--num-cpus "16" --num-gpus "2" --block &
# __doc_head_ray_end__


# __doc_worker_ray_start__
# optional, though may be useful in certain versions of Ray < 1.0.
sleep 10

# number of nodes other than the head node
worker_num=$((SLURM_JOB_NUM_NODES - 1))

for ((i = 1; i <= worker_num; i++)); do
node_i=${nodes_array[$i]}
echo "Starting WORKER $i at $node_i"
srun --nodes=1 --ntasks=1 -w "$node_i" \
ray start --address "$ip_head" \
--num-cpus "16" --num-gpus "2" --block &
sleep 5
done
# __doc_worker_ray_end__


# __doc_script_start__
# Active learning with Upper Confidence Bound acquisition
python train.py --config active_learning_UCB
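The UCB acquisition the script invokes (and the Expected Improvement / Probability of Improvement variants mentioned in the commit history) ranks unscreened drug pairs by combining the Bayesian model's predictive mean and uncertainty. A hedged sketch, assuming synergy predictions have already been summarized into per-candidate mean/std arrays (function names and numbers are illustrative, not the repository's API):

```python
import numpy as np

def ucb_acquisition(mean, std, kappa=1.0):
    """Upper Confidence Bound: favor candidates that are either
    predicted to be synergistic (high mean) or poorly understood
    (high std); kappa trades off exploitation vs. exploration."""
    return mean + kappa * std

def select_batch(mean, std, batch_size, kappa=1.0):
    """Pick the top-scoring candidates for the next screening round."""
    scores = ucb_acquisition(mean, std, kappa)
    return np.argsort(scores)[::-1][:batch_size]

mean = np.array([0.2, 0.8, 0.5, 0.1])     # predicted synergy per drug pair
std = np.array([0.05, 0.05, 0.60, 0.90])  # model uncertainty per drug pair
print(select_batch(mean, std, batch_size=2))  # → [2 3]
```

Note that candidate 2 outranks candidate 1 despite a lower predicted synergy: its large uncertainty makes it worth screening, which is the point of the active-learning loop.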
Binary file added docs/images/ModelOverview.png
Binary file added docs/images/ProjectInfographics.png