Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search

This is the official code repository for the paper: Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search.

In our paper, we introduce:

Fragment-based Generative Pre-trained Transformer (FragGPT): A molecular language model designed for context-aware fragment assembly, enabling the construction of novel molecular structures from a learned vocabulary of chemical fragments.
Chemical Property Alignment with Direct Preference Optimization (DPO): A preference-based fine-tuning technique that aligns the generative process with desirable pharmacological properties, enforcing physicochemical and synthetic feasibility to produce more drug-like candidates.
Target-aware Molecular Generation via Monte Carlo Tree Search (MCTS): A guided search strategy that balances the exploration of novel chemotypes and the exploitation of promising intermediates, optimizing ligand generation directly within the context of a specific protein binding pocket.
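
The exploration/exploitation balance described for MCTS is commonly implemented with the UCT (UCB1) selection rule. The sketch below is a generic illustration of that rule and is not taken from mcts.py; the Node class, its fields, and the exploration constant c are assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; not the class used in mcts.py."""
    fragment: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    mean_reward = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return mean_reward + bonus


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
```

A larger c favors rarely visited fragments (novel chemotypes), while c = 0 reduces selection to picking the child with the best mean reward so far.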

Installation

The dependencies for this project are listed in environment.yml. Create and activate the environment with Conda, replacing your_env_name with the environment name defined in environment.yml:

conda env create -f environment.yml
conda activate your_env_name

Hardware Requirements

A single run of the code requires less than 2000 MB (about 2 GB) of VRAM. An NVIDIA RTX 3090 or a GPU with equivalent performance is sufficient.
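
One way to check the VRAM figure before launching a run is to query nvidia-smi. The sketch below assumes nvidia-smi is on the PATH and returns an empty list otherwise; the helper names and the REQUIRED_VRAM_MB constant are illustrative and not part of this repository.

```python
import shutil
import subprocess

REQUIRED_VRAM_MB = 2000  # figure quoted above; the constant name is hypothetical


def parse_memory_mb(smi_output: str) -> list:
    """Parse one integer (MiB) per GPU from nvidia-smi CSV output."""
    values = []
    for line in smi_output.splitlines():
        token = line.strip()
        if token.isdigit():  # skip blank lines and entries such as "[N/A]"
            values.append(int(token))
    return values


def free_vram_per_gpu() -> list:
    """Return free memory per GPU in MiB, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_memory_mb(result.stdout) if result.returncode == 0 else []


def gpus_with_enough_vram(free_mb: list, required_mb: int = REQUIRED_VRAM_MB) -> list:
    """Indices of GPUs whose free memory meets the requirement."""
    return [i for i, mb in enumerate(free_mb) if mb >= required_mb]
```

Calling gpus_with_enough_vram(free_vram_per_gpu()) gives the indices of GPUs that currently have enough free memory for a single run.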

Pre-trained Weights

The pre-trained weight files required for the project can be downloaded from the following link:

Click here to download the weight files

After downloading, place the weight files in the ./weights directory.
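
A quick check that the weights are where the scripts expect them can save a failed run. This is a minimal sketch, assuming the files sit directly inside ./weights; the function name is hypothetical, and no particular file names or extensions are assumed.

```python
from pathlib import Path


def list_weight_files(weights_dir: str = "./weights") -> list:
    """Return the regular files found directly inside the weights directory."""
    root = Path(weights_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Weights directory not found: {root}")
    files = sorted(p for p in root.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No weight files found in: {root}")
    return files
```

Running list_weight_files() from the repository root raises FileNotFoundError if the directory is missing or empty, and otherwise returns the files it found.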

Usage

1. De Novo Generation

For unconstrained de novo molecular generation, run the generate.py script:

python generate.py

2. Constrained Generation

For constrained (conditional) generation tasks, go to the constrained_generation directory, which contains the relevant Python scripts and Jupyter notebooks.

3. Target-based Generation

To generate molecules for specific protein targets, run the run_mcts.py script:

python run_mcts.py

To select a different protein target, edit the ligand name in run_mcts.py. The project currently supports the following five proteins, all of which were validated in the paper:

parp1
jak2
fa7
5ht1b
braf
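
Because the target is chosen by editing a name in the script, a typo can go unnoticed until docking fails. The sketch below shows one way to validate a name against the five targets listed above; the tuple and function names are hypothetical and do not come from run_mcts.py.

```python
# Targets listed in this README; extend the tuple if you register a custom target.
SUPPORTED_TARGETS = ("parp1", "jak2", "fa7", "5ht1b", "braf")


def normalize_target(name: str) -> str:
    """Lower-case the name and check it against the supported targets."""
    key = name.strip().lower()
    if key not in SUPPORTED_TARGETS:
        options = ", ".join(SUPPORTED_TARGETS)
        raise ValueError(f"Unsupported target {name!r}; choose one of: {options}")
    return key
```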

Important Note:

./utils/docking/qvina02 is the executable used for molecular docking. Before running, make sure it has execute permission:

chmod +x ./utils/docking/qvina02
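
If you prefer to set the permission from Python (for example inside a setup script), the standard library can do roughly the same thing as chmod +x. The function name below is hypothetical.

```python
import os
import stat


def make_executable(path: str) -> None:
    """Add execute permission for user, group, and others to an existing file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

For this repository the call would be make_executable("./utils/docking/qvina02"), run from the repository root.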

4. Custom Target Generation

If you wish to generate molecules for a custom target, please follow these steps:

Prepare your protein file in PDBQT format.
Open utils/docking/docking_utils.py.
In that file, add the name of your custom protein, the coordinates of its pocket center, and the pocket size.
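
The steps above amount to registering a receptor file plus a search box (center and size). The sketch below illustrates how such an entry could be turned into a docking command. It assumes qvina02 is a QuickVina 2 binary that accepts AutoDock Vina style options (--receptor, --ligand, --center_x, --size_x, --out). The CUSTOM_POCKETS table, the function name, and every value in the table are hypothetical placeholders; the actual layout of docking_utils.py may differ.

```python
# Hypothetical pocket table; the real layout in utils/docking/docking_utils.py may differ.
# The coordinates and sizes are placeholders, not values for a real protein.
CUSTOM_POCKETS = {
    "my_target": {
        "receptor": "my_target.pdbqt",
        "center": (10.0, -4.5, 22.3),  # pocket center (x, y, z) in angstroms
        "size": (20.0, 20.0, 20.0),    # search box edge lengths in angstroms
    },
}


def build_docking_command(target, ligand, out,
                          executable="./utils/docking/qvina02",
                          pockets=CUSTOM_POCKETS):
    """Assemble a Vina-style argument list for one ligand and one target."""
    pocket = pockets[target]
    center, size = pocket["center"], pocket["size"]
    if len(center) != 3 or len(size) != 3:
        raise ValueError("center and size must each have three components")
    if any(s <= 0 for s in size):
        raise ValueError("pocket size components must be positive")
    cmd = [executable, "--receptor", pocket["receptor"],
           "--ligand", ligand, "--out", out]
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Building the command as a list rather than a single string avoids shell quoting problems with file paths.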
