Public code for the Explainable pharmacophore model developed and used in the discovery of CatSper Inhibitors

HenryTeahan/CatSperML

Cloning the Repository

Clone the repository using git:

git clone https://github.com/HenryTeahan/CatSperML

Environment Setup

Change into the CatSperML directory:

cd CatSperML

Install the required dependencies with Conda by creating and activating the environment defined in the provided environment.yml file:

conda env create -f environment.yml
conda activate catsper-freerelease

How to use the model (short version)

cd src
python align.py # aligns training molecules and screening molecules
python parameter_optimization.py # performs hyperparameter optimization
python train.py --load_opt_params # trains on aligned training molecules using optimized parameters
python screen.py # screens the library; uses the default aligned files from the previous steps. To use your own files: --sdf_t (YOUR ALIGNED TRAINING SDF FILE) --sdf_s (YOUR ALIGNED SCREENING SDF FILE)

Go and inspect your results in results/screening/processing_hits.ipynb!

How to use the model (long version)

This guide walks through a demonstration of the model on toy data, because the original dataset is currently proprietary.

The file data/toy_indoles.sdf contains randomly generated substituted indoles, produced in random_indoles.ipynb.
Ten randomly selected indoles are labeled active, and every molecule in the dataset with a Tanimoto similarity > 0.7 (using ECFP4) to any of them is also labeled active. This gives the model something to work on :-).
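The labeling rule above can be sketched in a few lines. This is only an illustration, not the code from random_indoles.ipynb: fingerprints are assumed to be plain sets of on-bit indices (in practice they would be ECFP4 bit vectors from a cheminformatics toolkit), and the names tanimoto and label_actives are hypothetical.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def label_actives(fingerprints, n_seeds=10, threshold=0.7, seed=0):
    """Label n_seeds random molecules active, plus anything similar to them.

    Returns one boolean per fingerprint (True means active).
    """
    rng = random.Random(seed)
    seed_idx = set(rng.sample(range(len(fingerprints)), n_seeds))
    seeds = [fingerprints[i] for i in seed_idx]
    labels = []
    for i, fp in enumerate(fingerprints):
        is_similar = any(tanimoto(fp, s) > threshold for s in seeds)
        labels.append(i in seed_idx or is_similar)
    return labels


# Toy fingerprints (made up): the first two share 4 of 5 bits in total.
toy_fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 4, 5}), frozenset({7, 8, 9})]
print(tanimoto(toy_fps[0], toy_fps[1]))  # 0.8
```

With a threshold of 0.7, picking either of the first two toy fingerprints as a seed marks both of them active, while the third stays inactive.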

The result is an .sdf file, toy_indoles.sdf, in which each molecule's activity is stored as a molecule property, readable with mol.GetProp("IC50").
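To make the storage format concrete, the sketch below pulls the IC50 data field out of SDF text using only the standard library. It is a simplified illustration: the helper name read_sdf_property and the example values are made up, and inside the project environment the same field is what mol.GetProp("IC50") returns.

```python
def read_sdf_property(sdf_text: str, name: str) -> list:
    """Return the value of data field `name` for each record (None if absent)."""
    values = []
    # Records in an SDF file are terminated by a line containing "$$$$".
    for record in sdf_text.split("$$$$"):
        if not record.strip():
            continue  # skip the empty remainder after the last delimiter
        lines = record.splitlines()
        value = None
        for i, line in enumerate(lines):
            # A data header looks like ">  <IC50>"; its value is on the next line.
            if line.startswith(">") and f"<{name}>" in line:
                value = lines[i + 1].strip() if i + 1 < len(lines) else None
                break
        values.append(value)
    return values


# Two minimal records with made-up activity values.
demo_sdf = """mol_a
  toy

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <IC50>
1.5

$$$$
mol_b
  toy

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <IC50>
20.0

$$$$
"""

print(read_sdf_property(demo_sdf, "IC50"))  # ['1.5', '20.0']
```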

Workflow

As mentioned in the paper, this model relies on the 2D alignment of the input molecules. For this, an alignment protocol has been developed.

  1. Align molecules
python align.py --sdf_t "name.sdf" --sdf_s "screen.sdf" # Uses toy_indoles.sdf and HIT_locator.sdf as default.

This creates a new file for each input: name_aligned.sdf and screen_aligned.sdf. In these files, the indoles are aligned using the alignment protocol described in the paper (the reference molecules are in /data/processed/references.sdf). The reference molecules were not purpose-built for this simulated data, which may explain some discrepancies in the final results.

  2. Optimize hyperparameters
python parameter_optimization.py --sdf "name_aligned.sdf" # Uses toy_indoles_aligned.sdf by default.

You can modify the hyperparameter search ranges and the scale of the optimization, which uses a TPE (Tree-structured Parzen Estimator) sampler.
For more information, run:

python parameter_optimization.py -h
  3. Train the model
python train.py --sdf "path_to_toy_indoles_file_aligned.sdf" --load_opt_params

Note: You must include --load_opt_params if you want to use the optimized hyperparameters from step 2. The default .sdf file is toy_indoles_aligned.sdf.

If you want to see the training molecules with their explainable 2D maps and decision points, as well as the decision tree and centroid map, then add --save_img to the command. This saves the images in the results folder.

  4. Screen the library
python screen.py --sdf_t "path_to_toy_indoles_file_aligned.sdf" --sdf_s "path_to_screening_library"

Note: This automatically uses the same hyperparameters as were used during training. Use the X_train CHECK printout to verify this: it acts as a checksum for the descriptor generation, which must be consistent between training and screening.

Default paths:
--sdf_t: toy_indoles_aligned.sdf
--sdf_s: HIT_locator_aligned.sdf (a subset of Enamine's HIT Locator Library)
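The idea of a descriptor checksum can be illustrated as follows. This is a hypothetical sketch, not how screen.py computes its X_train CHECK value: it assumes the descriptors are a list of rows of floats and hashes them after rounding, so that two runs using the same hyperparameters produce the same short digest.

```python
import hashlib


def descriptor_checksum(matrix, decimals=6):
    """Return a short digest of a descriptor matrix (list of rows of floats).

    Values are rounded before hashing so that tiny floating-point noise
    below the chosen precision does not change the digest.
    """
    digest = hashlib.sha256()
    for row in matrix:
        digest.update(",".join(f"{v:.{decimals}f}" for v in row).encode())
        digest.update(b"\n")  # row separator, so row boundaries affect the hash
    return digest.hexdigest()[:12]


# Made-up descriptors: identical inputs agree, a changed value does not.
x_train_at_training = [[0.1, 0.25], [1.0, 2.0]]
x_train_at_screening = [[0.1, 0.25], [1.0, 2.0]]
x_train_other_params = [[0.1, 0.30], [1.0, 2.0]]

print(descriptor_checksum(x_train_at_training) == descriptor_checksum(x_train_at_screening))
print(descriptor_checksum(x_train_at_training) == descriptor_checksum(x_train_other_params))
```

If the value printed during screening differs from the one seen during training, the descriptors were generated with different settings and the screening results should not be trusted.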

  5. Success! Once the screening completes, view and analyze your hits in results/screening/processing_hits.ipynb.
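The four command-line steps can also be chained from a small driver script. The sketch below is a convenience wrapper under stated assumptions, not part of the repository: the function names build_commands and run_pipeline are hypothetical, the scripts are assumed to live in src, and the aligned files are assumed to follow the name_aligned.sdf naming described in step 1.

```python
import subprocess
import sys
from pathlib import Path


def aligned_name(sdf_path: str) -> str:
    """Map name.sdf to name_aligned.sdf (naming convention from step 1)."""
    p = Path(sdf_path)
    return str(p.with_name(f"{p.stem}_aligned{p.suffix}"))


def build_commands(train_sdf: str, screen_sdf: str) -> list:
    """Return the align, optimize, train and screen commands in order."""
    train_aligned = aligned_name(train_sdf)
    screen_aligned = aligned_name(screen_sdf)
    py = sys.executable  # reuse the interpreter of the active Conda environment
    return [
        [py, "align.py", "--sdf_t", train_sdf, "--sdf_s", screen_sdf],
        [py, "parameter_optimization.py", "--sdf", train_aligned],
        [py, "train.py", "--sdf", train_aligned, "--load_opt_params"],
        [py, "screen.py", "--sdf_t", train_aligned, "--sdf_s", screen_aligned],
    ]


def run_pipeline(commands, cwd="src"):
    """Run each step in turn; stop at the first step that fails."""
    for cmd in commands:
        print("Running:", " ".join(cmd[1:]))
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling run_pipeline(build_commands("toy_indoles.sdf", "HIT_locator.sdf")) from the repository root would run the whole demonstration. Because check=True raises on a non-zero exit code, a failed alignment or optimization stops the run before training starts.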
