breogann/Fighting-COVID-19-through-AI

Fighting COVID-19

Prediction of coronavirus-binding pharmaceutical molecules using machine learning 💊 | 🦠

This hackathon (https://www.vencealvirus.com) is an initiative from the Spanish government to gather proposals that help with the coronavirus crisis.

👨🏻‍💻 Participants: ireneisdoomed, mdemaic, breogann.

Introduction 📖

The main motivation behind this work is the use of AI algorithms to address the lack of effective treatments for the disease caused by SARS-CoV-2.

The whole scientific community is pursuing different strategies to stop the pandemic: vaccine development, synthesis of new molecules, and repurposing of existing ones. The last of these, the adaptation of already commercialized molecules, is the approach we worked on, since it is the most time-effective alternative.

A big problem in the pharma industry is knowing whether a specific molecule can bind to a protein. Drug molecules are designed so that binding to a specific viral protein changes that protein's structure and leaves it inactive.

[Figure: proteins]

AI has a lot to offer in this field, since the regular drug-screening process is long and costly. Using an already trained neural network, we predicted the binding affinity of more than 80 antiviral drugs to the main proteins of the virus.

Under this paradigm, drug synthesis is based on four criteria:

  • Safe in humans
  • Active against COVID-19
  • Feasible to produce
  • Easy to distribute

Data 📊

There are three main types of data we used:

  • The virus proteins: proteins are chains of amino acids. They can be represented in the FASTA format: a string in which each letter stands for one amino acid.
>pdb|6YB7|A Chain A, SARS-CoV-2 main protease
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQA
GNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSF
LNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYA
AVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRT
ILGSALLEDEFTPFDVVRQCSGVTFQ
  • The list of antiviral drugs: drugs vary in structure and composition, so they need a different notation. They are usually represented using the SMILES notation:
CC(C)CN(CC(C(CC1=CC=CC=C1)NC(=O)OC2COC3C2CCO3)O)S(=O)(=O)C4=CC=C(C=C4)N
  • The degree of binding affinity between the two previous data points: it depends on key pharmacokinetic aspects of the molecules and is given by the neural network.
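
To illustrate how a FASTA record like the one above can be read, here is a minimal Python sketch. The function name `parse_fasta` is hypothetical and is not taken from the project code; the example record is a shortened copy of the main protease entry shown above.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its full sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; drop the leading ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are wrapped, so collect them and join later.
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Shortened example record (first two sequence lines only).
example = """>pdb|6YB7|A Chain A, SARS-CoV-2 main protease
SGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQA
GNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSF
"""

sequences = parse_fasta(example)
for header, seq in sequences.items():
    print(header, len(seq))
```

The wrapped sequence lines are joined into one string per protein, which is the form a sequence-based model would typically take as input.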

Data processing 🛠

We combined all of these data into a single dataset to work with and analyze further.

  • For the FASTA sequences, we used web scraping and regex to obtain the 55 sub-molecules that form all of the virus's proteins.
  • For the list of drugs, we scraped the drugbanked.ca site by running a search for antivirals.
  • For the binding affinity of these molecules, we used Selenium to automate the submission of the more than 4,000 entries through the mt-dti.deargendev.me/dti site.
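
Because every protein is scored against every drug, the list of entries to submit is the Cartesian product of the two inputs. The sketch below shows that idea with placeholder inputs; the names `build_pairs`, `proteins`, and `drugs` are hypothetical, and the dummy values are not real data. With 55 protein sequences and 80 drugs (the lower bound given above), the grid has 55 × 80 = 4,400 pairs, which matches the "more than 4,000 entries" figure.

```python
from itertools import product


def build_pairs(proteins, drugs):
    """Return one (protein_id, sequence, drug_name, smiles) row per protein-drug pair."""
    return [
        (pid, seq, name, smiles)
        for (pid, seq), (name, smiles) in product(proteins.items(), drugs.items())
    ]


# Placeholder inputs, not real data: 55 dummy proteins and 80 dummy drugs.
proteins = {f"protein_{i}": "SGFRKMAF" for i in range(55)}
drugs = {f"drug_{j}": "CCO" for j in range(80)}

pairs = build_pairs(proteins, drugs)
print(len(pairs))  # 4400
```

Building the full list up front makes it easy to count the work and to track which pairs have already been scored.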

Used technologies 🔌

Used libraries 📚

  • Selenium
  • BeautifulSoup
  • Regex
  • Requests
  • PubChemPy

To do:

  • Continue using Selenium to complete the dataset (possibly reducing the processing time)
  • Research other potential drugs
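
One way to approach the first item is to make the run resumable, so that an interrupted session continues where it stopped instead of starting over. The sketch below is an assumption about how that could look, not the project's code: `query_affinity` is a hypothetical stand-in for the Selenium call, and the CSV layout (protein, drug, score) is chosen only for illustration.

```python
import csv
import os


def load_done(path):
    """Return the set of (protein_id, drug_name) pairs already saved in the CSV."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as f:
        return {(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2}


def run_pending(pairs, path, query_affinity):
    """Score only the pairs missing from `path`, appending each result as it arrives."""
    done = load_done(path)
    processed = 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for protein_id, drug_name in pairs:
            if (protein_id, drug_name) in done:
                continue  # already scored in a previous run
            # query_affinity is a hypothetical callable wrapping the slow lookup.
            score = query_affinity(protein_id, drug_name)
            writer.writerow([protein_id, drug_name, score])
            f.flush()  # keep finished rows on disk if the run is interrupted
            processed += 1
    return processed
```

Calling `run_pending` again with the same file skips every pair that is already recorded, so only the missing entries cost additional processing time.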
