An automated Document Scanner that uses OpenCV and Tesseract OCR to transform photos of documents into clean, perspective-corrected scans with extracted text.
This Document Scanner combines OpenCV for perspective transformation and pytesseract (Tesseract OCR) for text extraction to produce crisp, scanner-quality images from ordinary photos. It can:
- Detect document edges and correct perspective.
- Enhance readability using thresholding and other filters.
- Extract text using OCR, making the scanned copy easily searchable and editable.
- Automatic Edge Detection – Finds the boundary of your document using Canny edge detection and contour analysis.
- Perspective Correction – Warps images to produce a flat, top-down view.
- OCR Integration – Extracts text from scanned images for quick editing.
- Modular Codebase – Separate files for scanning logic, OCR logic, and utility functions.
- Cross-Platform Setup – Works on Windows, macOS, and Linux with minimal changes.
| Tool / Library | Purpose |
|---|---|
| Python | Core language |
| OpenCV | Image processing & computer vision |
| pytesseract | Optical Character Recognition (OCR) |
| NumPy | Numerical computations & matrix ops |
| Tkinter | Simple GUI file-selection dialog |
git clone https://github.com/uzumstanley/Document-Scanner.git
cd Document-Scanner
pip install -r requirements.txt
Note: Make sure you have Python 3.7+ installed.
- Windows: Download the Tesseract installer. After installing, add the Tesseract installation folder to your system PATH.
- macOS:
brew install tesseract
- Linux:
sudo apt-get install tesseract-ocr
Then, specify the full path to tesseract.exe in ocr.py (Windows) or ensure it’s in your PATH (macOS/Linux).
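On Windows, pytesseract needs to know where tesseract.exe lives. A minimal sketch of how ocr.py might handle this, assuming pytesseract's `pytesseract.pytesseract.tesseract_cmd` attribute (which the library exposes for this purpose). The helper names and the Windows path constant are hypothetical; the path shown is a typical installer location, so adjust it to your machine.

```python
import shutil
import sys

# Assumed default location used by the Windows installer; change this if
# you installed Tesseract somewhere else (hypothetical constant).
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd():
    """Return a path to the tesseract binary, or None if it cannot be found."""
    # Prefer whatever is already on PATH (the usual case on macOS/Linux).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Fall back to the assumed Windows install location.
    if sys.platform.startswith("win"):
        return WINDOWS_TESSERACT
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved binary (hypothetical helper)."""
    import pytesseract  # imported lazily so the resolver works without it

    cmd = resolve_tesseract_cmd()
    if cmd is None:
        raise RuntimeError("Tesseract not found; install it or add it to PATH.")
    pytesseract.pytesseract.tesseract_cmd = cmd
```

Calling `configure_pytesseract()` once at startup, before any OCR call, keeps the platform-specific path logic in one place.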
python main.py
- A file dialog will appear. Select the document image you want to scan.
- Once processing finishes, the scanned image is displayed on screen and the extracted text is printed to the console.
Document-Scanner
├── main.py # Starts the GUI, displays scanned output, and prints OCR results
├── requirements.txt # Project dependencies
├── scanner.py # Logic for perspective correction & image preprocessing
├── ocr.py # Text extraction using Tesseract
├── utils.py # Utility functions (order_points, four_point_transform, etc.)
├── README.md # Project documentation
└── __pycache__/ # Compiled Python files (ignored by Git)
- Image Preprocessing: Applies Gaussian blur and Canny edge detection to isolate the document boundary.
- Contour Detection: Finds contours and selects the largest four-point polygon, which is most likely the document.
- Perspective Transformation: Uses the four detected points to warp the image, creating a flat, rectangular “scan.”
- Binarization: Thresholds the warped image to improve contrast and readability for OCR.
- Text Extraction: Converts the final image to text via Tesseract OCR (pytesseract).
- GUI Enhancements: Adding progress bars, adjustable filters, and error handling pop-ups.
- Automatic Cropping: Detecting and removing any excess borders after warping.
- Batch Processing: Scanning multiple images/folders in a single run.
- PDF Generation: Exporting scanned images + extracted text as PDF for easy sharing.
Contributions and suggestions are welcome! Check out the Issues tab or open a Pull Request to propose changes.
- Fork the repo on GitHub.
- Clone your fork locally.
- Create a new branch for your feature/bugfix.
- Commit and push your changes.
- Submit a Pull Request.
This project is licensed under the MIT License. You are free to use, share, and adapt it, provided you include the original copyright and license notice.
Thanks for checking out the Document Scanner! If you find it helpful, please star this repository and consider sharing it with others.
If you have any questions or suggestions, feel free to open an issue or contact me directly. I hope this project helps you with your document processing tasks!
