Hi,
At present, all my documents are DOCX (Microsoft Word) files, which I convert to PDF in order to run the GROBID XML conversion. Is it possible to use DOCX as input directly?
If PDF is the only input option, is the full-text extraction reliable in terms of 100% content integrity, even when the XML markup is incorrect? We can tolerate incorrect XML markup, but not any loss of content.
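To make "content loss" concrete, here is a minimal sketch of the kind of check that could be run on each document. It assumes the plain text of the original DOCX is already available as a string and that the GROBID output is a TEI XML string; the function names `tokens`, `tei_text`, and `missing_tokens` are hypothetical and not part of GROBID.

```python
import re
from collections import Counter
import xml.etree.ElementTree as ET


def tokens(text: str) -> Counter:
    """Count lowercased word tokens in a piece of text."""
    return Counter(re.findall(r"\w+", text.lower()))


def tei_text(tei_xml: str) -> str:
    """Join every text node of the TEI document, ignoring all markup."""
    root = ET.fromstring(tei_xml)
    return " ".join(root.itertext())


def missing_tokens(source_text: str, tei_xml: str) -> Counter:
    """Tokens that occur more often in the source than in the TEI output.

    Counter subtraction keeps only positive counts, so an empty result
    means every source token is accounted for at least as many times.
    """
    return tokens(source_text) - tokens(tei_text(tei_xml))
```

An empty result from `missing_tokens` would suggest no words were dropped, regardless of how they were tagged. This is only a rough signal: words hyphenated across line breaks in the PDF, or split by inline markup, would show up as missing even though the content is present.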