This repository contains code for predicting the aqueous solubility of organic molecules using machine learning models. The models and dataset are based on the research paper: Predicting Aqueous Solubility of Organic Molecules Using Deep Learning Models with Varied Molecular Representations.
- Download Data: Download the dataset from this link and save it as `data.csv` in the `./data` folder.
- Generate Features:
  - Generate Pybel coordinates and Molecular Dynamics (MDM) features by running `create_data.py` in the `./data` folder: `cd ./data`, then `python create_data.py`.
- Train Models:
  - To train the MDM model, run `train.py` in the `./mdm` folder: `cd ../mdm`, then `python train.py`.
  - To train the GNN model, run `train.py` in the `./gnn` folder: `cd ../gnn`, then `python train.py`.
  - To train the SMI model, run `train.py` in the `./smi` folder: `cd ../smi`, then `python train.py`.
- Make Predictions:
  - Use the `predict.ipynb` files in each model's folder to make predictions: `cd ../mdm`, then `jupyter notebook predict.ipynb`. Repeat the above steps for the `gnn` and `smi` folders.
- Ensemble Models:
  - To ensemble the models, run the following scripts: `cd ../ensemble`, then `python CV.py`, `python Optuna.py`, and `python KNN.py`.
- Compare Predictions:
  - To compare predictions from individual models with ensemble methods, use the `ensemble_prediction.ipynb` notebook: `jupyter notebook ensemble_prediction.ipynb`.
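
The script-based steps above can be chained from the repository root. The following is a minimal sketch, not part of the repository: the folder and script names are taken from the steps listed here, while `STEPS`, `build_commands`, `run_pipeline`, and the `dry_run` flag are hypothetical helpers introduced for illustration. The notebook steps (`predict.ipynb`, `ensemble_prediction.ipynb`) are left manual, and this README does not state whether the ensemble scripts depend on outputs from the prediction notebooks, so check that before running everything in one go.

```python
from pathlib import Path
import subprocess
import sys

# (folder, script) pairs in the order the README lists them.
# Names come from the README; the helpers below are hypothetical.
STEPS = [
    ("data", "create_data.py"),
    ("mdm", "train.py"),
    ("gnn", "train.py"),
    ("smi", "train.py"),
    ("ensemble", "CV.py"),
    ("ensemble", "Optuna.py"),
    ("ensemble", "KNN.py"),
]


def build_commands(repo_root):
    """Return (working_dir, argv) pairs, one per script step."""
    root = Path(repo_root)
    return [(root / folder, [sys.executable, script]) for folder, script in STEPS]


def run_pipeline(repo_root=".", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    root = Path(repo_root)
    data_file = root / "data" / "data.csv"
    # The README expects the dataset to be saved here before featurization.
    if not dry_run and not data_file.is_file():
        raise FileNotFoundError(f"Expected dataset at {data_file}")
    for cwd, argv in build_commands(root):
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the run at the first failing script.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

With `dry_run=True` the sketch only prints the commands it would run. Each script is run with its own folder as the working directory, mirroring the `cd` commands in the steps above.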
For details on running the models, featurizing the data, and other specifics, refer to the original research paper linked above. The methods described in the paper are essential for understanding and using this repository effectively.
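
As a rough illustration of what the comparison step measures, the sketch below scores each model's predictions against reference values and against a simple averaged ensemble. It is an assumption-laden stand-in: the repository's `CV.py`, `Optuna.py`, and `KNN.py` implement their own ensembling, which is not described in this README, so unweighted averaging is used here only as a placeholder technique. The function names and all numbers are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def average_ensemble(predictions):
    """Unweighted mean across models, computed per molecule."""
    columns = list(predictions.values())
    return [sum(vals) / len(vals) for vals in zip(*columns)]


def compare(y_true, predictions):
    """Return {name: RMSE} for each model plus the averaged ensemble."""
    scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
    scores["ensemble_mean"] = rmse(y_true, average_ensemble(predictions))
    return scores


# Placeholder values for illustration only; not results from the paper.
y_true = [-2.0, -3.5, -1.0]
predictions = {
    "mdm": [-2.5, -3.0, -1.5],
    "gnn": [-1.5, -4.0, -0.5],
    "smi": [-2.0, -3.0, -1.0],
}

for name, score in compare(y_true, predictions).items():
    print(f"{name}: RMSE = {score:.3f}")
```

In practice the per-model predictions would come from the `predict.ipynb` notebooks, and the ensemble predictions from the scripts in the `ensemble` folder.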
