
"Pages" in title_j #15

@cverluise

Description


Around 0.8% of the NPL publications in the beta dataset have "Pages" as title_j.
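The 0.8% figure is a simple proportion: the number of parsed rows whose title_j is "Pages", divided by the total number of parsed rows. The sketch below computes that ratio in Python. It assumes each parsed record is a dict-like object with a title_j key, mirroring the column in the beta table. The function name title_j_share is hypothetical.

```python
from collections.abc import Iterable, Mapping


def title_j_share(records: Iterable[Mapping[str, object]], value: str = "Pages") -> float:
    """Return the fraction of records whose title_j equals `value` (0.0 if there are none)."""
    total = 0
    hits = 0
    for record in records:
        total += 1
        # Records without a title_j key (or with None) count toward the total but not the hits.
        if record.get("title_j") == value:
            hits += 1
    return hits / total if total else 0.0
```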

How to reproduce the behaviour

SELECT
  *
FROM (
  SELECT
    *
  FROM
    `npl-parsing.patcit.beta`
  WHERE
    title_j = "Pages") AS parsing
JOIN (
  SELECT
    npl_publn_id AS id,
    npl_biblio
  FROM
    `usptobias.patstat.tls214`) AS tls214
ON
  tls214.id = parsing.npl_publn_id

Solution ideas

The issue seems to be closely related to the one described in #14.

These citations seem to share a common pattern: they are already well structured (e.g., ENTNEHEMEN UND PRUEFEN MIT EINEM SCHNELLEN HANDHABUNGSGERAET', KUNSTSTOFFE,DE,CARL HANSER VERLAG. MUNCHEN, vol. 80, no. 8, 1 August 1990 (1990-08-01), pages 894, XP000150775, ISSN: 0023-5563).
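Because these citations follow a regular layout (quoted title, journal, vol., no., date with an ISO form in parentheses, pages, an XP number, ISSN), a handful of regular expressions can already pull out most fields. This can help flag candidate examples for training data, or sanity-check what the parser returns for them. The following is a minimal rule-based sketch, not the patcit or Grobid method. The field patterns, the field names other than title_j, and the "XP number plus pages" heuristic are all assumptions for illustration.

```python
import re
from typing import Optional

# Hypothetical patterns for search-report style NPL citations like the example above.
FIELD_PATTERNS = {
    "volume": re.compile(r"\bvol\.\s*(\d+)"),
    "issue": re.compile(r"\bno\.\s*(\d+)"),
    "date": re.compile(r"\((\d{4}-\d{2}-\d{2})\)"),  # ISO date in parentheses
    "pages": re.compile(r"\bpages?\s+(\d+(?:\s*-\s*\d+)?)"),  # single page or range
    "xp_number": re.compile(r"\b(XP\d{9})\b"),
    "issn": re.compile(r"ISSN:\s*(\d{4}-\d{3}[\dX])"),
}
# Title ends at the first "'," and the journal is the next comma-delimited chunk.
HEAD_PATTERN = re.compile(r"^'?(?P<title>.+?)',\s*(?P<journal>[^,]+)")


def parse_structured_citation(text: str) -> dict[str, Optional[str]]:
    """Extract fields with regexes; missing fields are returned as None."""
    fields: dict[str, Optional[str]] = {}
    head = HEAD_PATTERN.match(text)
    fields["title_main"] = head.group("title").strip() if head else None
    fields["title_j"] = head.group("journal").strip() if head else None
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


def looks_like_search_report_citation(text: str) -> bool:
    """Heuristic (assumption): an XP number plus a pages field marks the structured pattern."""
    parsed = parse_structured_citation(text)
    return parsed["xp_number"] is not None and parsed["pages"] is not None
```

On the example citation quoted above, this sketch returns "KUNSTSTOFFE" as title_j rather than "Pages". A rule-based pass like this is brittle outside this one layout, which is why it is better suited to selecting and checking examples than to replacing the trained model.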

As for #14, training the Grobid model on these examples seems to be the best option. The examples affected by this issue will then be processed again.
