drug-NLP

Description

An experienced medicinal chemist can often make a reasonable guess at the structure of a binder for a given disease target, and that guess can serve as the starting point of a drug discovery campaign. This project aims to train an LLM to make such guesses in a few-shot or zero-shot manner.

The input is a description of the disease and the associated target protein. The expected output is a SMILES string for the initial guess. We evaluate success by measuring the similarity between the proposed molecule and a list of known drugs that match the input description.
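The evaluation idea above can be sketched as follows. This is only an illustration: the README does not say which similarity metric or fingerprint the project uses, so the sketch assumes Tanimoto similarity and, to stay dependency-free, uses character bigrams of the SMILES string as a crude stand-in for a real molecular fingerprint. The function names `bigrams`, `tanimoto`, and `best_match_score` are hypothetical and do not come from this repository.

```python
def bigrams(smiles: str) -> set[str]:
    """Crude stand-in for a molecular fingerprint: the set of character bigrams.

    Assumption: a real evaluation would use chemistry-aware fingerprints instead.
    """
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match_score(proposed: str, known: list[str]) -> float:
    """Score a proposal by its highest similarity to any known drug."""
    target = bigrams(proposed)
    return max((tanimoto(target, bigrams(k)) for k in known), default=0.0)


# Illustrative reference list: aspirin and ibuprofen.
known_drugs = ["CC(=O)Oc1ccccc1C(=O)O", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"]
print(best_match_score("CC(=O)Oc1ccccc1C(=O)O", known_drugs))
```

Taking the maximum over the reference list rewards a proposal for being close to any one known drug; averaging over the list would be a stricter alternative.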

Setup

Initialize submodules

git submodule update --init --recursive

Python 3.11+ should work, with the standard scientific computing libraries installed.

Run

Creating the initial drug dataset

Use scrape.ipynb to pull data from the FDA and to get active ingredients and SMILES strings from DrugBank.

Then use final_scarpe.py to fetch the disease and other drug information and store it.

Creating the fine-tuning dataset

Use generate_finetune.py to generate a prompt for each drug from the stored drug info.
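As a rough sketch of what a per-drug fine-tuning example might look like: the record fields (`disease`, `target`, `smiles`), the prompt wording, and the `build_example` helper below are all assumptions made for illustration. They are not taken from generate_finetune.py, whose actual format may differ.

```python
import json


def build_example(drug: dict) -> dict:
    """Turn one drug record into a prompt/completion pair (hypothetical format)."""
    prompt = (
        f"Disease: {drug['disease']}\n"
        f"Target protein: {drug['target']}\n"
        "Propose a SMILES string for an initial binder."
    )
    return {"prompt": prompt, "completion": drug["smiles"]}


# Illustrative record; the field names are assumptions, not the repo's schema.
record = {
    "disease": "pain and inflammation",
    "target": "cyclooxygenase",
    "smiles": "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
}

# One JSON object per line is a common layout for fine-tuning data.
line = json.dumps(build_example(record))
print(line)
```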

Running Baseline

bash scripts/run_baseline.sh

Run graph GA

After fine-tuning the model and generating its outputs, we can use graph GA to evolve the generated molecules.

In /mol_opt/:

python run_drug_nlp.py ../final_finetuning/results/10_production_experiment.csv

Or, from the base directory:

./scripts/run_ga.sh

