A Multi-objective Framework for Universal Protein–Ligand Interaction Prediction
Article · GitHub · Zenodo
Training Datasets · Trained Models
In this study, we propose and implement DeepRLI, an interaction prediction framework that is universally applicable across various tasks. Its core innovation is a multi-objective learning strategy that treats scoring, docking, and screening as separate optimization goals: the deep learning model has three relatively independent downstream readout networks, each of which can be optimized separately to enhance the task specificity of its output. The model incorporates an improved graph transformer with a cosine envelope constraint, integrates a novel physical information module, and introduces a new contrastive learning strategy. With these designs, DeepRLI demonstrates strong comprehensive performance across applications such as binding affinity prediction, binding pose prediction, and virtual screening, showcasing its potential in practical drug development.
The architecture of DeepRLI is illustrated in Figure 1.
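To make the multi-objective idea concrete, the sketch below shows a shared encoder feeding three independent readout heads. This is a hypothetical minimal illustration, not the published DeepRLI architecture: the layer shapes, the encoder stand-in, and the head definitions are all placeholders.

```python
import torch
import torch.nn as nn

class MultiObjectiveReadout(nn.Module):
    """Illustrative sketch of a shared encoder with three relatively
    independent readout networks (scoring, docking, screening).
    Sizes and layers are placeholders, not the real DeepRLI model."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        # Stand-in for the graph-transformer encoder output projection.
        self.encoder = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        # Three downstream readout heads that can be optimized separately.
        self.scoring_head = nn.Linear(hidden_dim, 1)
        self.docking_head = nn.Linear(hidden_dim, 1)
        self.screening_head = nn.Linear(hidden_dim, 1)

    def forward(self, graph_embedding: torch.Tensor) -> dict:
        h = self.encoder(graph_embedding)
        return {
            "scoring": self.scoring_head(h),
            "docking": self.docking_head(h),
            "screening": self.screening_head(h),
        }

model = MultiObjectiveReadout()
out = model(torch.randn(4, 64))  # batch of 4 pooled graph embeddings
```

Because the heads are separate modules on top of a shared representation, each objective can receive its own loss without forcing a single output to serve all three tasks.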
A Python virtual environment needs to be created in advance.
- Import from the .yml file

conda env create -n deeprli -f environment.yml
- Create step by step
conda create -n deeprli python=3.11
conda activate deeprli
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install -c dglteam/label/cu118 dgl==1.1.2.cu118
conda install -c conda-forge rdkit==2023.09.2
- Clone the repository
git clone https://github.com/fairydance/DeepRLI.git
- Set the environment variable
export PYTHONPATH="${REPO_ROOT}/src:${PYTHONPATH}"
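If setting PYTHONPATH in the shell is inconvenient, the same effect can be achieved from inside Python. The snippet below demonstrates the mechanism against a throwaway stand-in for ${REPO_ROOT}; with a real clone, only the `sys.path.insert` line is needed.

```python
import sys
import tempfile
from pathlib import Path

# Throwaway stand-in for ${REPO_ROOT}; with a real clone, skip this setup
# and point repo_root at the cloned DeepRLI directory instead.
repo_root = Path(tempfile.mkdtemp())
(repo_root / "src" / "deeprli").mkdir(parents=True)
(repo_root / "src" / "deeprli" / "__init__.py").write_text("")

# Equivalent of: export PYTHONPATH="${REPO_ROOT}/src:${PYTHONPATH}"
sys.path.insert(0, str(repo_root / "src"))

import deeprli  # the package is now resolvable
```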
For both training and inference, structural data needs to be preprocessed into graph data. Running the preprocessing task first requires building a preset directory structure as follows:
${DATA_ROOT_DIR}
├── index
└── raw
The raw directory should contain the structure files of ligands and proteins, and the index directory provides index files for the data to be processed. The examples/data directory in this repository provides data examples for reference.
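A short helper can create this layout programmatically before you copy in structure and index files. This is only a convenience sketch; the directory names come from the tree above.

```python
import tempfile
from pathlib import Path

def make_data_layout(data_root: Path) -> None:
    """Create the preset ${DATA_ROOT_DIR} layout expected by preprocessing:
    an index/ directory for index files and a raw/ directory for
    ligand and protein structure files."""
    (data_root / "index").mkdir(parents=True, exist_ok=True)
    (data_root / "raw").mkdir(parents=True, exist_ok=True)

# Demo with a temporary location standing in for ${DATA_ROOT_DIR}.
data_root = Path(tempfile.mkdtemp()) / "my_dataset"
make_data_layout(data_root)
```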
The script for this job is located in the ${REPO_ROOT}/src/deeprli/preprocess directory. Run it as below:
python preprocess.py \
  --data-root "${DATA_ROOT_DIR}" \
  --data-index "${DATA_INDEX_FILE}" \
  --ligand-file-types "sdf,mol2" \
  --dist-cutoff 6.5

The path of ${DATA_INDEX_FILE} in the above command is relative to ${DATA_ROOT_DIR}. After execution, each processed complex is stored under ${DATA_ROOT_DIR}/processed, and finally a file containing all processed complex data is packaged and saved in the ${DATA_ROOT_DIR}/compiled directory. The "data_file" input for training and inference is this packaged file. In addition, the script outputs an index file at the end that lists the successfully processed data.
Note that the ${DATA_INDEX_FILE} should contain only the processed data needed for the subsequent training or inference task. If any items listed in the ${DATA_INDEX_FILE} cannot be found in the ${DATA_ROOT_DIR}/processed folder, the data preprocessing will be re-executed.
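The consistency check described above can be sketched as follows. The one-entry-per-line index format and the `.pt` extension for processed items are assumptions made for illustration, not details confirmed by the repository.

```python
import tempfile
from pathlib import Path

def find_unprocessed(data_root: Path, index_file: Path) -> list:
    """Return index entries with no matching file under
    ${DATA_ROOT_DIR}/processed. Any hit would trigger re-preprocessing.
    Assumes one entry per line and a .pt file per processed complex."""
    processed = data_root / "processed"
    entries = [ln.strip() for ln in index_file.read_text().splitlines() if ln.strip()]
    return [e for e in entries if not (processed / f"{e}.pt").exists()]

# Demo with a throwaway layout: one processed entry, one missing.
data_root = Path(tempfile.mkdtemp())
(data_root / "processed").mkdir()
(data_root / "processed" / "1abc.pt").touch()
index_file = data_root / "index.txt"
index_file.write_text("1abc\n2xyz\n")
missing = find_unprocessed(data_root, index_file)
```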
The script for model training is in the ${REPO_ROOT}/src/deeprli/train directory. The necessary inputs can be provided either as command-line parameters or through a JSON-formatted configuration file; command-line parameters take higher priority.
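This precedence rule (command-line values override the configuration file) can be sketched like so. The merging helper and the argument subset are illustrative, not the training script's actual implementation; the key names mirror the example configuration.

```python
import argparse
import json

def load_settings(argv: list, config_text: str) -> dict:
    """Merge a JSON config with CLI arguments; CLI values take priority."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epoch", type=int)
    parser.add_argument("--batch", type=int)
    parser.add_argument("--initial-lr", dest="initial_lr", type=float)
    # Keep only the flags actually given on the command line.
    cli = {k: v for k, v in vars(parser.parse_args(argv)).items() if v is not None}
    settings = json.loads(config_text)
    settings.update(cli)  # command-line parameters win over the file
    return settings

config = '{"epoch": 1000, "batch": 6, "initial_lr": 0.0002}'
merged = load_settings(["--batch", "32"], config)
```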
python train.py --config "${CONFIG_FILE}"

An example of the configuration file is as follows:
{
"train_data_root": "${TRAIN_DATA_ROOT}",
"train_data_index": "${TRAIN_DATA_INDEX}",
"train_data_files": "${TRAIN_DATA_FILES}",
"epoch": 1000,
"batch": 6,
"initial_lr": 0.0002,
"lr_reduction_factor": 0.5,
"lr_reduction_patience": 15,
"min_lr": 1e-6,
"weight_decay": 0,
"f_dropout_rate": 0.0,
"g_dropout_rate": 0.0,
"hidden_dim": 64,
"num_attention_heads": 8,
"use_layer_norm": false,
"use_batch_norm": true,
"use_residual": true,
"gpu_id": 0,
"enable_data_parallel": false,
"use_all_train_data": true,
"save_path": "${SAVE_PATH}"
}

The script for the inference task is in the ${REPO_ROOT}/src/deeprli/infer directory. It accepts parameters in the same way as the training script above.
python infer.heavy.py --config "${CONFIG_FILE}"

An example of the configuration file is as follows:
{
"model": "/path/to/trained_model.state_dict.pth",
"model_format": "state_dict",
"data_root": "${DATA_ROOT}",
"data_index": "${DATA_INDEX}",
"data_file": "${DATA_FILE}",
"batch": 32,
"gpu_id": 0,
"save_path": "${SAVE_PATH}"
}

Distributed under the MIT License. See LICENSE.txt for more information.
- Haoyu Lin (developer) - hylin@pku.edu.cn
- Jianfeng Pei (supervisor) - jfpei@pku.edu.cn
- Luhua Lai (supervisor) - lhlai@pku.edu.cn
We would like to express our gratitude to all members of Luhua Lai's group for their valuable suggestions and insights. We also acknowledge the support of computing resources provided by the high-performance computing platform at the Peking-Tsinghua Center for Life Sciences, Peking University.