There are various ways to do this:
1. Replicate Rod Page's heuristics.
2. Get BHL's OCR text for each page and use it as a probe against the CEC or Hindawi OCR to find the matching page.
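A rough sketch of what the second option (OCR as a probe) might look like, assuming the OCR text for both sides has already been fetched into memory. It compares character 5-gram sets with Jaccard similarity, which is one possible matching technique and not necessarily the one intended here. The function names, the `candidates` dict shape, and the 0.3 threshold are all hypothetical choices for illustration.

```python
import re


def shingles(text, n=5):
    """Return the set of character n-grams of the normalized text."""
    # Lowercase and collapse anything that is not a letter or digit into a
    # single space, so layout and punctuation differences between OCR runs
    # do not count as mismatches.
    norm = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    if len(norm) < n:
        return {norm} if norm else set()
    return {norm[i:i + n] for i in range(len(norm) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(probe_text, candidates, threshold=0.3):
    """Find the candidate page whose OCR is most similar to the probe.

    candidates: dict mapping a page identifier to its OCR text
    (hypothetical shape). Returns (page_id, score), or (None, score)
    when the best score is below the threshold (an assumed cut-off).
    """
    probe = shingles(probe_text)
    best_id, best_score = None, 0.0
    for page_id, text in candidates.items():
        score = jaccard(probe, shingles(text))
        if score > best_score:
            best_id, best_score = page_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    # Made-up sample text standing in for the two OCR sources.
    candidates = {
        "p1": "The genus Formica was described by Linnaeus in 1758.",
        "p2": "Plate XII. Lateral view of the holotype specimen.",
    }
    probe = "The genus Forrnica was descr1bed by Linnaeus in 1758"
    print(best_match(probe, candidates))
```

Character n-grams are used instead of whole words because OCR errors such as "rn" for "m" damage only a few n-grams, whereas they would make a whole-word comparison miss entirely. For large page sets, comparing every probe against every candidate will be slow, so some form of indexing would likely be needed.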