This repository contains the datasets, model code, and notebooks used for all experiments in the paper "PROSTATA: Protein Stability Assessment using Transformers".
- `DATA` - the datasets used for PROSTATA training and testing, in the format provided by the datasets' authors. The dataset introduced in this article is also available here.
- `DATASETS` - the same datasets converted to the format used for model training.
- `PDB` - the PDB files downloaded during conversion.
- `ACDC_FOLDS` - converted acdc-nn training folds (from here) and PROSTATA test results on the Ssym and Ssym_r folds (from here).
- `00.generate_datasets.ipynb` - processes the `DATA` directory and generates the `DATASETS` directory.
- `01.add_megadataset_and_split_on_train_test_sets.ipynb` - expands the dataset with the megadataset data, splits it into train and test sets, and generates the `PROSTATA_EXPERIMENTS` directory.
- `02.test_models_by_folds.ipynb` - tests each individual model in the ensemble using 5-fold cross-validation.
- `03.test_models_on_other_datasets_ensemble.ipynb` - tests the PROSTATA ensemble on various combinations of train and test datasets. The `test_*_with_predictions.csv` files contain the test sets with a prediction column (`pred_ddg`).
- `04.train_final_ensemble.ipynb` - trains the ensemble on all data for the online tool.
- `PROSTATA_tool.ipynb` - Colab notebook for PROSTATA. Predicts DDG values for a single mutation in a user-supplied sequence.
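`PROSTATA_tool.ipynb` works on a user sequence plus a single mutation. Its exact input format is not described in this README, so the sketch below assumes the common `<wild-type><position><mutant>` notation with 1-based positions (e.g. `T3A`) and only shows how the mutant sequence, which a sequence-based DDG predictor typically needs alongside the wild type, could be built. `apply_point_mutation` is a hypothetical helper, not part of this repository.

```python
import re

# Assumed notation: wild-type residue, 1-based position, mutant residue (e.g. "T3A"),
# restricted to the 20 standard amino-acid letters.
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Return the mutant sequence for a single substitution such as 'T3A'.

    Hypothetical helper: validates the code, checks the wild-type residue
    and returns the sequence with that one position substituted.
    """
    sequence = sequence.strip().upper()
    match = MUTATION_RE.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised mutation code: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1  # convert 1-based position to 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"Position {position} is outside a sequence of length {len(sequence)}")
    if sequence[index] != wild_type:
        raise ValueError(f"Expected {wild_type} at position {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]
```

For example, `apply_point_mutation("MKTAYIAK", "T3A")` returns `"MKAAYIAK"`. Checking the wild-type residue catches off-by-one numbering mistakes before any prediction is made.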
- `environment.yml` - conda environment.
- `PROSTATA_experiments_pearson.log` - logs of the experiments run by the `03.test_models_on_other_datasets_ensemble.ipynb` notebook.
- `LICENSE` - Apache License 2.0.
- `Readme.MD` - this file.
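Notebook 03 writes `test_*_with_predictions.csv` files with a `pred_ddg` column, and the name of `PROSTATA_experiments_pearson.log` suggests it records Pearson correlations. A minimal standard-library sketch for recomputing a Pearson correlation from one of those CSV files follows. The name of the measured DDG column is not documented here, so `ddg` is an assumption (check the CSV header), and `pearson_from_predictions` is a hypothetical helper rather than code from the notebooks.

```python
import csv
import glob
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def pearson_from_predictions(path, true_col="ddg", pred_col="pred_ddg"):
    """Correlate measured vs predicted DDG in one predictions CSV.

    `true_col` defaults to an assumed column name; `pred_col` matches the
    `pred_ddg` column described in this README.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    measured = [float(row[true_col]) for row in rows]
    predicted = [float(row[pred_col]) for row in rows]
    return pearson_r(measured, predicted)


if __name__ == "__main__":
    # Searches recursively from the current directory; the files' location is an assumption.
    for path in sorted(glob.glob("**/test_*_with_predictions.csv", recursive=True)):
        print(path, round(pearson_from_predictions(path), 3))
```

Values computed this way may differ from the logged ones if the notebooks filter rows or aggregate over folds differently.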