
Unclear documentation in Parse MEDLINE XML - delete = True or False? #165

@callebalik


Version 0.5.1
The documentation for Parse MEDLINE XML in the README differs a bit from the docstring in the medline_parser script.

README: "delete : boolean if False means paper got updated so you might have two XMLs for the same paper. You can delete the record of deleted paper because it got updated."
Script: "An iterator of dictionary containing information about articles in NLM format (see parse_article_info). Articles that have been deleted will be added with no information other than the field delete being True."
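
A minimal sketch of how a consumer might act on the script's description. It assumes each record is a plain dict and that a deletion stub still carries its `pmid` alongside `delete` set to True (the docstring only mentions the `delete` field). The helper `split_deleted` and the sample records are hypothetical, not part of the library.

```python
from __future__ import annotations

from typing import Any, Iterable


def split_deleted(records: Iterable[dict[str, Any]]) -> tuple[list[dict[str, Any]], set[str]]:
    """Separate regular article records from deletion stubs.

    Assumption: a stub has delete=True and still carries its 'pmid'.
    """
    kept: list[dict[str, Any]] = []
    deleted_pmids: set[str] = set()
    for record in records:
        if record.get("delete"):
            deleted_pmids.add(record["pmid"])  # stub: only the PMID is useful
        else:
            kept.append(record)  # ordinary citation record
    return kept, deleted_pmids


# Hypothetical sample records, not real parser output.
records = [
    {"pmid": "111", "title": "A paper", "delete": False},
    {"pmid": "222", "delete": True},
]
kept, deleted = split_deleted(records)
print([r["pmid"] for r in kept], deleted)
# ['111'] {'222'}
```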

I'm somewhat confused. One seems to indicate that delete = False means the paper was updated, while the other says delete = True means the paper was deleted.
But these don't seem like natural opposites. Doesn't an update mean that the previous version of the paper was deleted?
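
One possible reading of the two descriptions, offered only as an assumption: an updated citation shows up again as a full record with `delete` False in a later file, while `delete` True marks a PMID that was withdrawn. Under that reading, processing the baseline and update files oldest first and keeping only the last record per PMID would remove the duplicates. The function `reconcile` and the sample data are hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterable


def reconcile(files: Iterable[Iterable[dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """Keep one current record per PMID, reading files oldest first.

    Assumptions: a later full record supersedes an earlier one, a
    delete=True stub withdraws the PMID, and records inside one file
    are applied in the order they are yielded.
    """
    current: dict[str, dict[str, Any]] = {}
    for records in files:  # baseline first, then update files in release order
        for record in records:
            pmid = record["pmid"]
            if record.get("delete"):
                current.pop(pmid, None)  # withdrawn: drop any earlier version
            else:
                current[pmid] = record  # newer version replaces the older one
    return current


baseline = [
    {"pmid": "111", "title": "Old title", "delete": False},
    {"pmid": "222", "title": "To be removed", "delete": False},
]
update = [
    {"pmid": "111", "title": "Corrected title", "delete": False},
    {"pmid": "222", "delete": True},
]
latest = reconcile([baseline, update])
print({pmid: rec["title"] for pmid, rec in latest.items()})
# {'111': 'Corrected title'}
```

Whether this matches the intended semantics is exactly what the issue asks, so treat it as a sketch rather than documented behavior.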

Readme for reference:
MEDLINE XML has a different XML format than PubMed Open Access. The structure of the XML files is described in the MEDLINE/PubMed DTD [here](https://www.nlm.nih.gov/databases/dtd/). You can use the function `parse_medline_xml` to parse that format. This function will return a list of dictionaries, where each element contains:

  • pmid : PubMed ID
  • pmc : PubMed Central ID
  • doi : DOI
  • other_id : Other IDs found, each separated by ;
  • title : title of the article
  • abstract : abstract of the article
  • authors : authors, each separated by ;
  • mesh_terms : list of MeSH terms with corresponding MeSH ID, each separated by ; e.g. 'D000161:Acoustic Stimulation; D000328:Adult; ...'
  • publication_types : list of publication types, each separated by ; e.g. 'D016428:Journal Article'
  • keywords : list of keywords, each separated by ;
  • chemical_list : list of chemical terms, each separated by ;
  • pubdate : Publication date. Defaults to year information only.
  • journal : journal of the given paper
  • medline_ta : abbreviation of the journal name
  • nlm_unique_id : NLM unique identification
  • issn_linking : ISSN linkage, typically used to link with the Web of Science dataset
  • country : Country extracted from journal information field
  • reference : string of PMIDs, each separated by ; or list of references made to the article
  • delete : boolean; if False, it means the paper got updated, so you might have two XMLs for the same paper. You can delete the record of the deleted paper because it got updated.
  • languages : list of languages, separated by ;
  • vernacular_title: vernacular title. Defaults to an empty string when not available.
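
Several of the fields above are plain strings joined with `;`, and `mesh_terms` and `publication_types` pair an ID with a term using `:`. A small sketch of splitting them back into Python structures, assuming the exact format shown in the examples; `split_field` and `parse_coded_terms` are hypothetical helpers, not library functions.

```python
from __future__ import annotations


def split_field(value: str) -> list[str]:
    """Split a ';'-separated field (authors, keywords, ...) into trimmed items."""
    return [item.strip() for item in value.split(";") if item.strip()]


def parse_coded_terms(value: str) -> list[tuple[str, str]]:
    """Split 'ID:Term; ID:Term' strings (mesh_terms, publication_types) into pairs."""
    pairs = []
    for item in split_field(value):
        code, _, term = item.partition(":")  # split on the first colon only
        pairs.append((code.strip(), term.strip()))
    return pairs


print(parse_coded_terms("D000161:Acoustic Stimulation; D000328:Adult"))
# [('D000161', 'Acoustic Stimulation'), ('D000328', 'Adult')]
```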

Grateful for clarification, as I've had some duplication issues.
