This is the official code repository for the paper: Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search.
In our paper, we introduce:
- Fragment-based Generative Pre-trained Transformer (FragGPT): A molecular language model designed for context-aware fragment assembly, enabling the construction of novel molecular structures from a learned vocabulary of chemical fragments.
- Chemical Property Alignment with Direct Preference Optimization (DPO): A reinforcement learning technique to align the generative process with desirable pharmacological properties, enforcing physicochemical and synthetic feasibility to produce more drug-like candidates.
- Target-aware Molecular Generation via Monte Carlo Tree Search (MCTS): A guided search strategy that balances the exploration of novel chemotypes and the exploitation of promising intermediates, optimizing ligand generation directly within the context of a specific protein binding pocket.
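
To illustrate the exploration/exploitation balance mentioned for MCTS, here is a minimal sketch that assumes the standard UCT (UCB1) selection rule. The selection policy, reward, and node definition used in the paper may differ; `Node`, `uct_score`, `select_child`, and the constant `c` are illustrative names, not taken from this repository.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node, e.g. a partially assembled molecule (hypothetical)."""
    label: str
    visits: int = 0
    total_reward: float = 0.0
    children: list = field(default_factory=list)


def uct_score(child, parent_visits, c=1.4):
    """UCB1 score: mean reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent, c=1.4):
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
```

A larger `c` favors rarely visited branches (novel chemotypes in this setting), while a smaller `c` favors branches whose rewards, for example docking scores, have been high so far.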
The dependencies for this project are listed in the environment.yml file. Create and activate the environment with Conda:
conda env create -f environment.yml
conda activate your_env_name
A single run of the code requires less than 2000 MB of VRAM. An NVIDIA RTX 3090 or a GPU with equivalent performance is sufficient.
The pre-trained weight files required for the project can be downloaded from the following link:
Click here to download the weight files
After downloading, place the weight files in the ./weights directory.
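
Before running the scripts, it can help to confirm that the weights are where the code expects them. The following is a small sketch for that check; `list_weight_files` is a hypothetical helper that is not part of this repository, and it makes no assumption about the weight file names or extensions.

```python
from pathlib import Path


def list_weight_files(weights_dir="./weights"):
    """Return the sorted names of the files found in the weights directory."""
    directory = Path(weights_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Weights directory not found: {directory}")
    files = sorted(p.name for p in directory.iterdir() if p.is_file())
    if not files:
        raise FileNotFoundError(f"No weight files found in: {directory}")
    return files
```

Calling `list_weight_files()` from the repository root returns the file names in ./weights, or raises `FileNotFoundError` if the directory is missing or empty.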
For unconstrained de novo molecular generation, run the generate.py script:
python generate.py
For conditional constrained generation tasks, navigate to the constrained_generation directory. This folder contains the relevant Python scripts and Jupyter Notebooks for you to run.
To generate molecules for specific protein targets, run the run_mcts.py script:
python run_mcts.py
You can specify different protein targets by modifying the ligand name in the run_mcts.py file. The project currently supports the following 5 proteins, which have been validated in the paper:
- parp1
- jak2
- fa7
- 5ht1b
- braf
Important Note:
./utils/docking/qvina02 is the executable file for molecular docking. Before running, please ensure you grant it executable permissions:
chmod +x ./utils/docking/qvina02
If you wish to generate molecules for a custom target, please follow these steps:
- Prepare your protein file in pdbqt format.
- Open the utils/docking/docking_utils.py file.
- In this file, add the name of your custom protein, its pocket's central position, and the pocket size.
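
If you do not already know the pocket center and size, one common approach is to derive them from a reference ligand bound in the pocket. The sketch below assumes you have such a ligand in pdbqt format; `read_pdbqt_coords`, `pocket_box`, and the 5 Å `padding` default are illustrative choices, not part of this repository. The exact format in which docking_utils.py expects these values is not shown here, so adapt the output to the entries already present in that file.

```python
def read_pdbqt_coords(path):
    """Read atom coordinates from the ATOM/HETATM records of a PDBQT file."""
    coords = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                # PDB/PDBQT fixed columns: x = 31-38, y = 39-46, z = 47-54.
                coords.append((
                    float(line[30:38]),
                    float(line[38:46]),
                    float(line[46:54]),
                ))
    return coords


def pocket_box(coords, padding=5.0):
    """Return (center, size) of an axis-aligned box around the coordinates.

    The box spans the coordinates plus `padding` on every side.
    """
    if not coords:
        raise ValueError("no atom coordinates given")
    mins = [min(axis) for axis in zip(*coords)]
    maxs = [max(axis) for axis in zip(*coords)]
    center = tuple(round((lo + hi) / 2, 3) for lo, hi in zip(mins, maxs))
    size = tuple(round(hi - lo + 2 * padding, 3) for lo, hi in zip(mins, maxs))
    return center, size
```

For example, `pocket_box(read_pdbqt_coords("reference_ligand.pdbqt"))` returns a center and a box size that can be copied into the entry for the custom protein, where `reference_ligand.pdbqt` is a placeholder file name.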