PDF extraction introducing stray double carriage returns of unknown cause

@reynoldsm88

Any double carriage return introduces a sentence break during information extraction, so one that lands in the middle of a sentence is quite destructive. Sometimes the cause is obvious, but I also see them in seemingly random places. For instance, in the PDF of document 1f5db65f2b3b158f8b3f0ae53f7c508c:

![image](https://user-images.githubusercontent.com/38891375/63977110-e352f480-ca80-11e9-98e5-81dc78a617b3.png)

The converter is introducing a double carriage return between "and" and "Nutrition Teams". Other line breaks in this bullet point and in similar bullet points do not typically cause double carriage returns, although there are other stray ones in the same document, such as after "WFP staff is working alongside NDRMC staff in".

PDF source:

https://documents.wfp.org/stellent/groups/Public/documents/ep/WFP284788.pdf?_ga=2.243684229.1030149860.1553624300-1022356052.1547047485
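
Until the root cause in the converter is found, a post-processing pass might limit the damage. The following is only a rough sketch, not the converter's actual code: it assumes the extracted text is available as a plain string, and the names `DANGLING_WORDS`, `should_merge`, and `merge_stray_breaks` are made up for illustration. It rejoins two chunks separated by a blank line when the first one has no terminal punctuation and either ends in a word that rarely ends a sentence (such as "and" or "in", the two cases reported above) or is followed by a chunk that starts with a lowercase letter.

```python
import re

# Hypothetical list of words that rarely end a sentence; extend as needed.
DANGLING_WORDS = {
    "a", "an", "and", "at", "by", "for", "in", "of", "on", "or", "the", "to", "with",
}
# Punctuation treated as a legitimate end of a chunk (an assumption).
TERMINATORS = (".", "!", "?", ":", ";")


def should_merge(prev: str, nxt: str) -> bool:
    """Guess whether the blank line between prev and nxt is a stray break."""
    prev = prev.rstrip()
    if prev.endswith(TERMINATORS):
        return False
    last_word = prev.split()[-1].lower()
    return last_word in DANGLING_WORDS or nxt[0].islower()


def merge_stray_breaks(text: str) -> str:
    """Rejoin chunks that were split by a blank line in mid-sentence."""
    # Normalize CRLF and bare CR so blank lines are easy to detect.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split on one or more blank lines and drop empty chunks.
    chunks = [c.strip() for c in re.split(r"\n\s*\n", text) if c.strip()]
    merged = []
    for chunk in chunks:
        if merged and should_merge(merged[-1], chunk):
            merged[-1] = merged[-1] + " " + chunk
        else:
            merged.append(chunk)
    return "\n\n".join(merged)
```

For a made-up input such as `"deployed and\r\n\r\nNutrition Teams.\r\n\r\nNew paragraph."`, this joins the first two chunks and keeps the break before "New paragraph.". The heuristic is deliberately conservative: a heading without punctuation followed by a capitalized sentence is left alone, but a stray break after a word that is not in the list and before a capitalized word will also be missed.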
