Standardise and/or propagate title_abbrev_j #9

@cverluise

Motivation

  1. There is a 1:n relation between title_j and title_abbrev_j. E.g.

| title_j | count_distinct | title_abbrev_j |
| --- | --- | --- |
| Inflammation Research | 3 | Inflamm. res.,Inflamm. Res.,Inflamm Res |
| Biological Cybernetics | 3 | Biological Cybernetics,Biol. Cybern.,Biol. Cybernetics |
| Journal of Materials Science | 3 | J Mater Sci,JOURNAL OF MATERIALS SCIENCE,Journal of Materials Science |

  2. About half of the npl_publn with a title_j have no title_abbrev_j.

How to reproduce the behavior

  1. Count the distinct title_abbrev_j values per title_j (Motivation 1):
SELECT
  title_j,
  COUNT(DISTINCT title_abbrev_j) AS count_distinct,
  STRING_AGG(DISTINCT title_abbrev_j) AS title_abbrev_j
FROM
  `npl-parsing.patcit.beta`
WHERE
  title_j IS NOT NULL
GROUP BY
  title_j
ORDER BY
  count_distinct DESC
  2. Count the npl_publn that have a title_j but no title_abbrev_j (Motivation 2):
SELECT
  COUNT(DISTINCT npl_publn_id) AS count_distinct
FROM
  `npl-parsing.patcit.beta`
WHERE
  title_j IS NOT NULL
  AND title_abbrev_j IS NULL  # comment out this line to get the denominator

Feature request

  1. Decide whether we should keep title_abbrev_j. Note that, at least in the beta, there is no npl_publn with a null title_j and a non-null title_abbrev_j. In that sense, title_abbrev_j does not add any new information beyond title_j.

  2. Following item 1, the priority seems to be to standardise title_j. From that, we can populate title_abbrev_j from the standardised title, disregarding its parsed value.
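
The propagation step in item 2 could look roughly like the sketch below. It is only an illustration under stated assumptions, not the project's implementation: `normalise_title`, `canonical_abbrevs`, `propagate` and `SAMPLE_ROWS` are hypothetical names, the normalisation (casefold plus whitespace collapsing) is a placeholder for whatever standardisation of title_j is eventually chosen, and picking the most frequent parsed abbreviation is just one possible rule. A lookup against an external abbreviation list would be an alternative.

```python
from collections import Counter, defaultdict


def normalise_title(title):
    """Hypothetical standardisation key: casefold and collapse whitespace."""
    return " ".join(title.casefold().split())


def canonical_abbrevs(rows):
    """Map each normalised title_j to its most frequent non-null title_abbrev_j."""
    counts = defaultdict(Counter)
    for row in rows:
        title, abbrev = row.get("title_j"), row.get("title_abbrev_j")
        if title and abbrev:
            counts[normalise_title(title)][abbrev] += 1
    # On ties, most_common keeps the abbreviation that was encountered first.
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}


def propagate(rows):
    """Return copies of rows with title_abbrev_j overwritten by the canonical value."""
    lookup = canonical_abbrevs(rows)
    result = []
    for row in rows:
        new_row = dict(row)
        title = row.get("title_j")
        if title:
            # The parsed value is disregarded whenever a canonical one is known.
            new_row["title_abbrev_j"] = lookup.get(
                normalise_title(title), row.get("title_abbrev_j")
            )
        result.append(new_row)
    return result


# Made-up rows for illustration only; the ids are not real npl_publn_id values.
SAMPLE_ROWS = [
    {"npl_publn_id": 1, "title_j": "Inflammation Research", "title_abbrev_j": "Inflamm. Res."},
    {"npl_publn_id": 2, "title_j": "Inflammation Research", "title_abbrev_j": "Inflamm Res"},
    {"npl_publn_id": 3, "title_j": "Inflammation Research", "title_abbrev_j": "Inflamm. Res."},
    {"npl_publn_id": 4, "title_j": "inflammation  research", "title_abbrev_j": None},
    {"npl_publn_id": 5, "title_j": "Biological Cybernetics", "title_abbrev_j": None},
]
```

Calling `propagate(SAMPLE_ROWS)` gives the first four rows the same abbreviation, including the one that had none. The last row stays null because no abbreviation was ever parsed for that title, which is the case where an external reference list would be needed.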
