@lfoppiano
Member

When we search for a DOI in the page, the regex may truncate DOIs that are split by a line break. This PR proposes a simple fix: substitute the DOI only when the one found in the page is longer than the one extracted by the header parser.
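
The selection rule described above could be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: the class `DoiSelection`, the methods `findDoiInPage` and `chooseDoi`, and the simplified DOI pattern are all hypothetical names and assumptions, and the actual GROBID regex and call sites may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of the GROBID API.
final class DoiSelection {

    // Simplified DOI pattern (an assumption). \S+ stops at whitespace,
    // so a DOI split by a line break is matched only up to the break.
    private static final Pattern DOI = Pattern.compile("10\\.\\d{4,9}/\\S+");

    // Returns the first DOI-like string found in the page text, or null.
    static String findDoiInPage(String pageText) {
        if (pageText == null) {
            return null;
        }
        Matcher m = DOI.matcher(pageText);
        return m.find() ? m.group() : null;
    }

    // Keeps the header parser's DOI unless the page DOI is strictly longer.
    static String chooseDoi(String headerDoi, String pageDoi) {
        if (pageDoi == null || pageDoi.trim().isEmpty()) {
            return headerDoi;
        }
        if (headerDoi == null || headerDoi.trim().isEmpty()) {
            return pageDoi;
        }
        return pageDoi.length() > headerDoi.length() ? pageDoi : headerDoi;
    }

    public static void main(String[] args) {
        // The page text breaks the DOI across two lines, so the regex truncates it.
        String pageText = "DOI: 10.1234/abcd-\nef.2020 more text";
        String headerDoi = "10.1234/abcd-ef.2020";
        String pageDoi = findDoiInPage(pageText);
        System.out.println(chooseDoi(headerDoi, pageDoi));
    }
}
```

With this rule, a truncated match from the page never overwrites a complete DOI from the header parser, while a longer page match can still replace a shorter header value.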

@lfoppiano lfoppiano added the bug label Jun 10, 2024
@coveralls

Coverage Status

coverage: 40.786% (-0.001%) from 40.787%
when pulling 7f1d15d on bugfix/fix-doi-search
into 694f0ed on master.

@lfoppiano lfoppiano added this to the 0.8.2 milestone Jun 10, 2024
@lfoppiano lfoppiano modified the milestones: 0.8.2, 0.9.0 Feb 16, 2025
@coveralls

coveralls commented Sep 23, 2025

Coverage Status

coverage: 40.396% (+0.002%) from 40.394%
when pulling 7f5b55e on bugfix/fix-doi-search
into 01fe109 on master.

@lfoppiano lfoppiano force-pushed the bugfix/fix-doi-search branch from 7f5b55e to 2487d23 on January 27, 2026 05:58
@coveralls

coveralls commented Jan 27, 2026

Pull Request Test Coverage Report for Build 21495896701

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 1028 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.02%) to 38.23%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| org/grobid/core/document/Document.java | 100 | 72.97% |
| org/grobid/core/engines/HeaderParser.java | 112 | 56.47% |
| org/grobid/core/utilities/TextUtilities.java | 199 | 53.1% |
| org/grobid/core/data/BiblioItem.java | 617 | 56.95% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 21386310485 | 0.02% |
| Covered Lines | 17237 |
| Relevant Lines | 42511 |

💛 - Coveralls

@lfoppiano lfoppiano modified the milestones: 0.9.0, 0.10.0 Jan 30, 2026
@lfoppiano
Member Author

This fix does not solve the main issue described in #1126, where the pages are concatenated to the DOI. That issue seems to disappear naturally with #1142, after adding more training data.
