Tired of your drugs only treating what they were designed for?
This repo includes a drug repurposing pipeline that teaches chemical language models like ChemBERTa and ChemBERTa-2 (hopefully MolFormer in the future) to guess how clingy a molecule will be with your protein of choice. Trained on ChEMBL's bioactivity data, the model predicts log(IC50) values (aka how potently a compound inhibits its target, a handy proxy for how tightly it binds; lower means stronger) and then screens the DrugBank universe to suggest new potential meds.
Now featuring fewer lab coats and more Python.
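The README doesn't say which log base or concentration unit the model works in. Here is a minimal sketch of the target transform, assuming IC50 values arrive in nanomolar (the unit ChEMBL typically standardizes IC50 to) and that "log(IC50)" means log10 of the molar concentration. The function names are hypothetical. The commonly used pIC50 is simply the negative of this value.

```python
import math


def ic50_nm_to_log_molar(ic50_nm: float) -> float:
    """Hypothetical helper: IC50 in nM -> log10(IC50 in M)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 M, so log10(IC50_M) = log10(IC50_nM) - 9
    return math.log10(ic50_nm) - 9.0


def log_molar_to_ic50_nm(log_ic50: float) -> float:
    """Hypothetical inverse: log10(IC50 in M) -> IC50 in nM."""
    return 10 ** (log_ic50 + 9.0)


# A 10 nM inhibitor scores lower (more negative, i.e. more potent)
# than a 10 uM one, which is why ranking sorts ascending.
potent = ic50_nm_to_log_molar(10.0)
weak = ic50_nm_to_log_molar(10_000.0)
```

Working in log space also keeps regression targets on a compact scale, since raw IC50 values can span many orders of magnitude.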
- Train ChemBERTa or ChemBERTa-2 on IC50 data for any ChEMBL target
- Predict binding affinity (as log(IC50)) for DrugBank molecules
- Rank and export top candidate drugs
- Clean command-line interface with argument parsing
- Easily extensible to other transformer-based models
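One common way to keep a pipeline like this extensible is a small registry that maps CLI model names to pretrained checkpoints. The sketch below is hypothetical: `ModelSpec`, `MODEL_REGISTRY`, `register`, and the `chemberta2` name are invented here. The Hugging Face Hub IDs are public ChemBERTa checkpoints and may not be the ones `src/models/` actually loads.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of a backbone the pipeline can train."""
    name: str
    checkpoint: str  # assumed Hugging Face Hub ID, not confirmed by the repo
    max_length: int = 128  # assumed SMILES token limit


# Hypothetical registry: supporting a new model is one register() call.
MODEL_REGISTRY: dict[str, ModelSpec] = {}


def register(spec: ModelSpec) -> None:
    """Add a model spec, refusing silent overwrites."""
    if spec.name in MODEL_REGISTRY:
        raise ValueError(f"model {spec.name!r} is already registered")
    MODEL_REGISTRY[spec.name] = spec


def get_model_spec(name: str) -> ModelSpec:
    """Look up a spec by its CLI name, listing valid names on failure."""
    try:
        return MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"unknown model {name!r}; choose one of: {known}") from None


register(ModelSpec("chemberta", "seyonec/ChemBERTa-zinc-base-v1"))
register(ModelSpec("chemberta2", "DeepChem/ChemBERTa-77M-MLM"))
```

The checkpoint string could then be handed to `transformers.AutoTokenizer.from_pretrained` and a matching `AutoModel` class. Adding MolFormer later would be one more `register(...)` entry plus any model-specific loading code.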
Run the pipeline in three steps: fetch IC50 data for a target, train a model on it, then screen DrugBank:

    python src/fetch_chembl_ic50.py --target CHEMBL1234
    python src/train_affinity_model.py --target CHEMBL1234 --model chemberta
    python src/repurpose.py --model chemberta --target CHEMBL1234 --input data/filtered_drugbank_smiles.csv
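The three commands share a small set of flags (`--target`, `--model`, `--input`). Here is a sketch of how a parser for `src/repurpose.py` could be declared with Python's standard `argparse`. The `chembl_id` validator and the `--top-k` flag are assumptions for illustration, not documented options of this repo.

```python
import argparse
import re


def chembl_id(value: str) -> str:
    """Hypothetical validator: accept IDs shaped like CHEMBL1234."""
    if not re.fullmatch(r"CHEMBL\d+", value):
        raise argparse.ArgumentTypeError(f"not a ChEMBL ID: {value!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown for src/repurpose.py."""
    parser = argparse.ArgumentParser(
        description="Score DrugBank molecules against a ChEMBL target."
    )
    parser.add_argument("--model", required=True,
                        help="backbone name, e.g. 'chemberta'")
    parser.add_argument("--target", required=True, type=chembl_id,
                        help="ChEMBL target ID, e.g. CHEMBL1234")
    parser.add_argument("--input", required=True,
                        help="CSV of SMILES to screen")
    # Assumed extra flag: how many ranked candidates to keep.
    parser.add_argument("--top-k", type=int, default=50,
                        help="number of top candidates to export")
    return parser
```

Calling `build_parser().parse_args()` with the third command's arguments gives `args.model == "chemberta"` and `args.top_k == 50`. A malformed target such as `1234` fails fast with a usage error instead of surfacing later as an empty ChEMBL query.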
Project layout:

    ├── data/                # Training and inference data
    ├── models/              # Saved trained model weights
    ├── notebooks/           # Notebooks to play around and test stuff
    ├── results/             # Saved predictions
    ├── src/
    │   ├── models/          # Model definitions and tokenizers
    │   ├── train_affinity_model.py
    │   ├── repurpose.py
    │   └── fetch_chembl_ic50.py
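Ranking is the last step before predictions land in `results/`. Here is a minimal sketch, assuming predictions arrive as a mapping from DrugBank ID to predicted log(IC50). Lower log(IC50) means a more potent compound, so candidates are sorted in ascending order. The function names and CSV columns are hypothetical.

```python
from __future__ import annotations

import csv
from pathlib import Path


def rank_candidates(predictions: dict[str, float], top_k: int) -> list[tuple[str, float]]:
    """Sort by predicted log(IC50), ascending (most potent first)."""
    if top_k < 0:
        raise ValueError("top_k must be non-negative")
    ranked = sorted(predictions.items(), key=lambda item: item[1])
    return ranked[:top_k]


def export_candidates(ranked: list[tuple[str, float]], path: Path) -> None:
    """Write ranked candidates to CSV, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["rank", "drug_id", "pred_log_ic50"])
        for rank, (drug_id, score) in enumerate(ranked, start=1):
            writer.writerow([rank, drug_id, f"{score:.3f}"])
```

For example, `export_candidates(rank_candidates(preds, top_k=50), Path("results/CHEMBL1234_top.csv"))` would write a table of at most 50 rows. The output filename here is illustrative only.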