
ermahechap/PubMed-Sum


PubMed-Sum

PubMed article summarization project for the 2020 NLP graduate course at the National University of Colombia.

By: Edwin Mahecha & Jimmy Pulido

Notebooks

The project comprises 3 notebooks:

  1. xml_data_preprocessing.ipynb: Connects to the PubMed OA/OAI API and performs basic XML preprocessing to remove the parts of each article that are not relevant for text analysis, such as diagrams and pictures.
  2. litcovid_data_preprocessing.ipynb: Similar to the notebook above, but it processes LitCovid, the text database provided by PubMed for the COVID-19 emergency.
  3. summarization.ipynb: Performs summarization using the Google T5 model available on Hugging Face.
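
The flow described by the notebooks (strip non-textual XML elements, then summarize with T5) can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the helper names (`strip_non_text`, `article_text`, `summarize`), the set of JATS-style tags treated as non-textual, the un-namespaced XML, and the `t5-small` checkpoint are all assumptions made for the example. The `summarize` helper additionally assumes that the `transformers` and `torch` packages are installed.

```python
import xml.etree.ElementTree as ET

# Assumption: JATS-style tag names for non-textual content. The notebooks may
# filter a different set, and real PMC XML may carry namespaces.
NON_TEXT_TAGS = {"fig", "graphic", "table-wrap", "media", "supplementary-material"}


def strip_non_text(element):
    """Remove non-textual descendants in place, keeping the text that follows them."""
    previous = None
    for child in list(element):
        if child.tag in NON_TEXT_TAGS:
            # ElementTree stores text after a closing tag as the element's
            # "tail"; move it to the previous sibling or parent so it survives.
            if child.tail:
                if previous is None:
                    element.text = (element.text or "") + child.tail
                else:
                    previous.tail = (previous.tail or "") + child.tail
            element.remove(child)
        else:
            strip_non_text(child)
            previous = child


def article_text(xml_string):
    """Return the whitespace-normalized text of an article without figures or tables."""
    root = ET.fromstring(xml_string)
    strip_non_text(root)
    return " ".join("".join(root.itertext()).split())


def summarize(text, model_name="t5-small", max_new_tokens=60):
    """Summarize text with a T5 checkpoint (hypothetical helper; model name is an assumption)."""
    # Imported lazily so the preprocessing helpers work without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # T5 is a text-to-text model; the "summarize: " prefix selects the task.
    inputs = tokenizer(
        "summarize: " + text, return_tensors="pt", truncation=True, max_length=512
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Made-up sample document for illustration only.
SAMPLE = (
    "<article><body><p>Masks reduce transmission."
    "<fig><caption>Figure 1</caption></fig>"
    " Results were consistent.</p></body></article>"
)

if __name__ == "__main__":
    print(article_text(SAMPLE))
```

Calling `summarize(article_text(xml_string))` would chain the two steps. Inputs longer than the 512-token limit are truncated in this sketch, so full articles would likely need to be split into sections first.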

Additional Documents

We provide a paper (more of a technical document that summarizes the project scope) and a set of slides. Both are in Spanish.
