1.) Pytesseract
2.) Matplotlib
3.) NumPy
4.) OpenCV
- Executable Python script that is highly reusable: add a for-loop to run Tesseract on as many images as you want.
- Multi-layered filtering: run the ocr-python.py script to apply multiple filters to each image and improve the accuracy of Tesseract OCR.
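The two points above can be sketched as a small batch loop. This is a minimal illustration, not the contents of ocr-python.py: the names `run_filters`, `run_batch`, `grayscale`, and `otsu_threshold` are hypothetical, and applying the filters cumulatively (each one on the output of the previous one) is an assumption. The OpenCV and pytesseract imports are deferred so the loop logic can be exercised without them.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A filter takes an image and returns an image; names are for reporting only.
NamedFilter = Tuple[str, Callable]


def load_image(path: str):
    """Read an image with OpenCV (requires the opencv-python package)."""
    import cv2
    image = cv2.imread(path)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(path)
    return image


def tesseract_ocr(image) -> str:
    """Default OCR backend (requires pytesseract and the Tesseract binary)."""
    import pytesseract
    return pytesseract.image_to_string(image)


def grayscale(image):
    """Example filter: convert a BGR image to a single channel."""
    import cv2
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


def otsu_threshold(image):
    """Example filter: binarize a single-channel image with Otsu's method."""
    import cv2
    _, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return binary


def run_filters(image, filters: List[NamedFilter], ocr=tesseract_ocr) -> Dict[str, str]:
    """Apply the filters in order and OCR the image after each stage."""
    results = {}
    for name, apply_filter in filters:
        image = apply_filter(image)  # assumption: filters stack cumulatively
        results[name] = ocr(image)
    return results


def run_batch(paths: Iterable[str], filters: List[NamedFilter],
              load=load_image, ocr=tesseract_ocr) -> Dict[str, Dict[str, str]]:
    """The for-loop over images: one dict of per-filter text per path."""
    return {path: run_filters(load(path), filters, ocr) for path in paths}
```

With OpenCV and Tesseract installed, a call such as `run_batch(["scan1.png", "scan2.png"], [("gray", grayscale), ("otsu", otsu_threshold)])` would return the recognized text per image and per filter stage; the file names here are placeholders.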
Basic Function Filter:
Images Adjusting Filter:
- Uses matplotlib to visualize how the filtering functions manipulate each image; the display runs on a timer that you can control
- Display the hOCR data by uncommenting this command for each filter
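One way to get a timed display like the one described above is `matplotlib.pyplot.pause`, which draws the figure and keeps it on screen for a given number of seconds. The sketch below is an assumption about how such a timer could be written, not the script's actual code; `show_steps` and its `plt` parameter (which lets a stand-in module be passed for testing) are hypothetical.

```python
def show_steps(steps, seconds=1.5, plt=None):
    """Show each (title, image) pair for `seconds`, then close its window.

    `steps` is an iterable of (title, image) pairs, for example the output of
    each filter stage. Returns the list of titles that were displayed.
    """
    if plt is None:
        import matplotlib.pyplot as plt  # deferred so the module imports without it

    shown = []
    for title, image in steps:
        plt.figure()
        # cmap only affects single-channel images; note that OpenCV colour
        # images are BGR and would need converting to RGB to look right.
        plt.imshow(image, cmap="gray")
        plt.title(title)
        plt.axis("off")
        plt.pause(seconds)  # the controllable timer
        plt.close()
        shown.append(title)
    return shown
```

Raising or lowering `seconds` changes how long each intermediate image stays visible.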
The hOCR data output will look like this:
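As a rough sketch of what hOCR contains: pytesseract can return it through `image_to_pdf_or_hocr(..., extension="hocr")`, and each recognized word appears as an `ocrx_word` span whose `title` attribute carries the bounding box and a word confidence (`x_wconf`). The fragment in `SAMPLE_HOCR` is an illustrative, hand-written example in that format, not output captured from this project, and the helper names are hypothetical.

```python
import html
import re


def image_to_hocr(image) -> str:
    """Return hOCR markup for an image (requires pytesseract and Tesseract)."""
    import pytesseract
    return pytesseract.image_to_pdf_or_hocr(image, extension="hocr").decode("utf-8")


# Matches word-level spans and captures the title attribute and inner markup.
WORD_RE = re.compile(
    r"<span class=['\"]ocrx_word['\"][^>]*title=['\"]([^'\"]*)['\"][^>]*>(.*?)</span>",
    re.DOTALL,
)


def hocr_words(hocr: str):
    """Extract (word, confidence) pairs from hOCR markup."""
    words = []
    for title, inner in WORD_RE.findall(hocr):
        conf = re.search(r"x_wconf\s+(\d+(?:\.\d+)?)", title)
        # Drop nested tags such as <strong> and decode entities such as &amp;
        text = html.unescape(re.sub(r"<[^>]+>", "", inner)).strip()
        if conf and text:
            words.append((text, float(conf.group(1))))
    return words


# Illustrative fragment only; real output wraps this in a full HTML document.
SAMPLE_HOCR = """
<span class='ocr_line' id='line_1_1' title="bbox 36 92 580 122; baseline 0 -6">
  <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Hello</span>
  <span class='ocrx_word' id='word_1_2' title='bbox 109 92 190 116; x_wconf 62'>wor1d</span>
</span>
"""

print(hocr_words(SAMPLE_HOCR))  # [('Hello', 95.0), ('wor1d', 62.0)]
```

A regular expression is enough for a quick look at the data; a proper HTML parser would be the more robust choice for anything beyond inspection.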

- Enhance image quality
- Automate image rotation
- Deskewing / Border Removal
- Cancel noise
- Logical operation to output the text with the highest confidence
- Incorporate data training and testing
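The "highest confidence" step in the list above could be sketched as follows. This is one possible approach under stated assumptions, not a planned design from the project: it scores each filter's result by the mean word confidence from `pytesseract.image_to_data` and keeps the best one. The helper names are hypothetical, and using the mean (rather than, say, the minimum) as the score is an arbitrary choice.

```python
def ocr_data(image):
    """Word-level OCR results as a dict (requires pytesseract and Tesseract)."""
    import pytesseract
    return pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)


def mean_confidence(data) -> float:
    """Average confidence over real words in an image_to_data-style dict.

    Non-word rows carry a confidence of -1 and empty text, so they are skipped.
    Confidence values may arrive as strings or numbers depending on version.
    """
    confs = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)
        if conf >= 0 and str(text).strip():
            confs.append(conf)
    return sum(confs) / len(confs) if confs else 0.0


def text_of(data) -> str:
    """Join the non-empty words back into a single string."""
    return " ".join(str(t) for t in data["text"] if str(t).strip())


def best_text(candidates):
    """Pick the filter whose OCR result has the highest mean confidence.

    `candidates` maps a filter name to its image_to_data-style dict and must
    not be empty. Returns (filter_name, text, mean_confidence).
    """
    name = max(candidates, key=lambda n: mean_confidence(candidates[n]))
    data = candidates[name]
    return name, text_of(data), mean_confidence(data)
```

In use, `candidates` would be built by calling `ocr_data` on the output of each filter, for example `{"gray": ocr_data(gray_image), "binary": ocr_data(binary_image)}`, where the image variables are placeholders.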
Next Step Flow Chart:





