@reynoldsm88
In the PDF for document 25ac6fa139dc49c98194c0fd80dbe900 (and elsewhere), there are these end-of-line dashes:

such as the ones between "above" and "average", and "surplus" and "producing". Normally, an end-of-line dash indicates a word split across two lines, so the converter is reasonably combining the halves into the single tokens "aboveaverage" and "surplusproducing".
If possible, though, we could use a simple algorithm that checks whether the two halves combined form a real word. If they don't, as in the cases above, it would not fully combine them and would just keep the dash between them ("above-average", "surplus-producing").
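A rough sketch of that check in Python, just to make the idea concrete. Everything here is an assumption on my part: `KNOWN_WORDS` is a tiny hypothetical stand-in for whatever real dictionary we would use, and the function names are made up, not anything in the converter.

```python
import re

# Hypothetical stand-in for a real dictionary; in practice this would be a
# large word list loaded from a file or a spell-checking library.
KNOWN_WORDS = {"above", "average", "surplus", "producing", "production"}

# A word fragment, a dash at the end of a line, then the continuation
# at the start of the next line.
LINE_BREAK_DASH = re.compile(r"(\w+)-[ \t]*\n[ \t]*(\w+)")


def join_split_word(left, right, known_words=KNOWN_WORDS):
    """Merge the halves only if the result is a known word; else keep the dash."""
    combined = left + right
    if combined.lower() in known_words:
        return combined
    return left + "-" + right


def repair_line_break_dashes(text, known_words=KNOWN_WORDS):
    """Apply join_split_word to every end-of-line dash found in text."""
    return LINE_BREAK_DASH.sub(
        lambda m: join_split_word(m.group(1), m.group(2), known_words), text
    )
```

With that word list, `repair_line_break_dashes("above-\naverage prices")` gives `"above-average prices"`, while `repair_line_break_dashes("local produc-\ntion")` gives `"local production"`. The quality of the result depends entirely on the dictionary, so domain terms missing from it would keep their dash.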
pdf source:
http://fews.net/sites/default/files/documents/reports/MONTHLY%20PRICE%20WATCH_AND_ANNEX_NOVEMBER2014_0_1.pdf