Cloning the Repository

Clone the repository using git:

git clone https://github.com/HenryTeahan/CatSperML

Environment Setup

Change into the CatSperML directory:

cd CatSperML

Install the required dependencies with Conda by creating the environment from the provided environment.yml file:

conda env create -f environment.yml
conda activate catsper-freerelease
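Before running any scripts, you may want to confirm that the right environment is active. The sketch below assumes only that conda activate sets the CONDA_DEFAULT_ENV variable, which is standard conda behaviour. The function names are hypothetical and not part of CatSperML.

```python
import os
import sys

# Environment name used in the "conda activate" command above.
EXPECTED_ENV = "catsper-freerelease"


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if none is active.

    `conda activate <name>` exports CONDA_DEFAULT_ENV=<name> in the shell.
    """
    return environ.get("CONDA_DEFAULT_ENV")


def check_env(expected=EXPECTED_ENV, environ=os.environ):
    """Return True if the expected conda environment is the active one."""
    return active_conda_env(environ) == expected


if __name__ == "__main__":
    env = active_conda_env()
    if check_env():
        print(f"OK: running in '{env}' ({sys.executable})")
    else:
        print(f"Warning: active environment is {env!r}, expected '{EXPECTED_ENV}'")
```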

How to use the model (short version)

cd src
python align.py # aligns training molecules and screening molecules
python parameter_optimization.py # performs hyperparameter optimization
python train.py --load_opt_params # trains on aligned training molecules using optimized parameters
python screen.py # by default uses the aligned files from the previous steps; to screen your own files, add --sdf_t <your aligned training SDF> --sdf_s <your aligned screening SDF>

Go and inspect your results in results/screening/processing_hits.ipynb!
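If you prefer to run the short-version steps from a single script, a minimal driver could look like the following. It is a hypothetical helper, not part of the repository. It runs each command from src/ in order and stops at the first step that exits with a non-zero status. The optimization script name is an assumption taken from the long-version instructions, so check src/ for the actual file name.

```python
import subprocess
import sys

# Assumed script name (from the long-version instructions); check src/.
OPT_SCRIPT = "parameter_optimization.py"

# The four workflow steps, in order. screen.py relies on its default
# --sdf_t/--sdf_s paths, i.e. the aligned files produced by align.py.
STEPS = [
    ["align.py"],
    [OPT_SCRIPT],
    ["train.py", "--load_opt_params"],
    ["screen.py"],
]


def build_commands(python=sys.executable):
    """Return the full command line for each step, using the current interpreter."""
    return [[python, *step] for step in STEPS]


def run_pipeline(src_dir="src", runner=subprocess.run):
    """Run each step from src_dir, stopping at the first failing step.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in build_commands():
        result = runner(cmd, cwd=src_dir)
        if result.returncode != 0:
            print(f"Step failed ({result.returncode}): {' '.join(cmd)}", file=sys.stderr)
            break
        completed.append(cmd)
    return completed


if __name__ == "__main__":
    done = run_pipeline()
    sys.exit(0 if len(done) == len(STEPS) else 1)
```

Run it from the repository root (the folder that contains src/) with the conda environment activated.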

How to use the model (long version)

This guide explains how to run a demonstration of the model on a toy dataset, since the original dataset is currently proprietary.

The file data/toy_indoles.sdf contains randomly generated substituted indoles, produced in random_indoles.ipynb.
A random selection of 10 of these indoles is labeled active, and every molecule in the dataset with a Tanimoto similarity > 0.7 (ECFP4) to any of these actives is also labeled active. This gives the model something to work on :-).
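The labeling scheme can be sketched as follows. This is an illustration only, with three assumptions. First, "similarity > 0.7" is read as similarity to at least one of the 10 randomly chosen actives. Second, ECFP4 is approximated by RDKit's Morgan fingerprint with radius 2. Third, the 2048-bit size is an assumed choice. The actual code in random_indoles.ipynb may differ, and tanimoto, label_actives and ecfp4_bits are hypothetical names.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def label_actives(fps, n_seeds=10, threshold=0.7, seed=0):
    """Label n_seeds random molecules active, plus every molecule whose
    similarity to at least one of those seeds exceeds the threshold.

    fps: list of fingerprints (sets of on-bit indices), one per molecule.
    Returns a list of booleans (True = active), in the same order as fps.
    """
    rng = random.Random(seed)
    seeds = rng.sample(range(len(fps)), n_seeds)
    return [
        any(i == s or tanimoto(fp, fps[s]) > threshold for s in seeds)
        for i, fp in enumerate(fps)
    ]


def ecfp4_bits(smiles, n_bits=2048):
    """ECFP4-like on-bits from RDKit's Morgan fingerprint, radius 2 (requires RDKit)."""
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=n_bits)
    return set(generator.GetFingerprint(mol).GetOnBits())
```

In practice, fps would be built with ecfp4_bits for each molecule before calling label_actives.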

The result is an .sdf file, toy_indoles.sdf, in which each molecule's activity is stored as the IC50 property (readable with mol.GetProp("IC50")).

Workflow

As described in the paper, the model relies on a 2D alignment of the input molecules, for which a dedicated alignment protocol was developed.

Align molecules

python align.py --sdf_t "name.sdf" --sdf_s "screen.sdf" # uses toy_indoles.sdf and HIT_locator.sdf by default

This creates a new file for each input: name_aligned.sdf and screen_aligned.sdf. In these files, the molecules are aligned using the alignment protocol described in the paper, with the reference molecules found in data/processed/references.sdf. These reference molecules were not purpose-built for this simulated data, which may explain some discrepancies in the final results.
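The README does not spell out the alignment protocol, so the following is only a generic illustration of rigid 2D alignment. The sketch assumes the 2D coordinates of atoms have already been matched to a reference molecule, for example through a substructure match. It then finds the rotation and translation that minimize the RMSD, a 2D Kabsch/Procrustes superposition. This is not the protocol from the paper, and align_2d is a hypothetical name.

```python
import math


def align_2d(moving, reference):
    """Rigidly superimpose 2D points `moving` onto matched points `reference`.

    Both are lists of (x, y) tuples in corresponding order. Returns the
    transformed `moving` points and the RMSD after alignment.
    Only rotation and translation are applied (no reflection).
    """
    n = len(moving)
    if n == 0 or n != len(reference):
        raise ValueError("need the same, non-zero number of matched points")
    # Centroids of both point sets.
    mx = sum(x for x, _ in moving) / n
    my = sum(y for _, y in moving) / n
    rx = sum(x for x, _ in reference) / n
    ry = sum(y for _, y in reference) / n
    # Accumulate the terms that determine the optimal rotation angle
    # for the centred point sets.
    cos_term = sin_term = 0.0
    for (px, py), (qx, qy) in zip(moving, reference):
        px, py, qx, qy = px - mx, py - my, qx - rx, qy - ry
        cos_term += px * qx + py * qy
        sin_term += px * qy - py * qx
    theta = math.atan2(sin_term, cos_term)
    c, s = math.cos(theta), math.sin(theta)
    # Rotate the centred moving points, then move them onto the reference centroid.
    aligned = [
        (c * (x - mx) - s * (y - my) + rx, s * (x - mx) + c * (y - my) + ry)
        for x, y in moving
    ]
    rmsd = math.sqrt(
        sum((ax - qx) ** 2 + (ay - qy) ** 2
            for (ax, ay), (qx, qy) in zip(aligned, reference)) / n
    )
    return aligned, rmsd
```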

Optimize hyperparameters

python parameter_optimization.py --sdf "name_aligned.sdf" # uses toy_indoles_aligned.sdf by default

You can modify the hyperparameter search ranges and the scale of the optimization, which uses a TPE (Tree-structured Parzen Estimator) sampler.
For more information, run:

python parameter_optimization.py -h

Train the model

python train.py --sdf "path_to_toy_indoles_file_aligned.sdf" --load_opt_params

Note: You must include --load_opt_params to use the optimized hyperparameters from the hyperparameter optimization step. The default .sdf file is toy_indoles_aligned.sdf.

To view the training molecules with their explainable 2D maps and decision points, as well as the decision tree and centroid map, add --save_img to the command. The images are saved in the results folder.

Screen the library

python screen.py --sdf_t "path_to_toy_indoles_file_aligned.sdf" --sdf_s "path_to_screening_library"

Note: This automatically uses the same hyperparameters as during training. Use the X_train CHECK printout to verify this; it acts as a checksum for descriptor generation, which must be consistent between training and screening.

Default paths:
--sdf_t: toy_indoles_aligned.sdf
--sdf_s: HIT_locator_aligned.sdf (a subset of Enamine's Hit Locator Library)
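The idea behind such a check can be illustrated with a content hash of the descriptor matrix. If training and screening generate descriptors for the training molecules in exactly the same way, the two digests match. This is a hypothetical sketch; how screen.py actually computes its X_train CHECK value is not described here, and matrix_checksum is an illustrative name.

```python
import hashlib
import struct


def matrix_checksum(rows, decimals=6):
    """Return a short SHA-256 digest of a 2D descriptor matrix (list of rows).

    Values are rounded to a fixed number of decimals so that insignificant
    floating-point noise is less likely to change the digest, while changes
    in shape, row order or descriptor values do change it.
    """
    h = hashlib.sha256()
    h.update(struct.pack("<q", len(rows)))
    for row in rows:
        h.update(struct.pack("<q", len(row)))
        for value in row:
            # Round, then add 0.0 so that -0.0 and 0.0 hash identically.
            h.update(struct.pack("<d", round(float(value), decimals) + 0.0))
    return h.hexdigest()[:16]
```

For example, printing matrix_checksum(X_train) in both the training and the screening run (using X_train.tolist() if it is a NumPy array) and comparing the two strings would flag any drift in descriptor generation.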

Success! Once the screening completes, view and analyze your hits using: processing_hits.ipynb
