Document Scanner

An automated Document Scanner that uses OpenCV and Tesseract OCR to transform photos of documents into clean, perspective-corrected scans with extracted text.

Table of Contents

Overview
Key Features
Tech Stack
Installation & Usage
Project Structure
How It Works
Planned Improvements
Contributing
License

Overview

This Document Scanner combines OpenCV for perspective transformation and pytesseract (Tesseract OCR) for text extraction to produce crisp, scanner-quality images from ordinary photos. It can:

Detect document edges and correct perspective.
Enhance readability using thresholding and other filters.
Extract text using OCR, making the scanned copy easily searchable and editable.

Key Features

Automatic Edge Detection – Finds the boundary of your document with Canny + contours.
Perspective Correction – Warps images to produce a flat, top-down view.
OCR Integration – Extracts text from scanned images for quick editing.
Modular Codebase – Separate files for scanning logic, OCR logic, and utility functions.
Cross-Platform Setup – Works on Windows, macOS, and Linux with minimal changes.

Tech Stack

| Tool / Library | Purpose |
|---|---|
| Python | Core language |
| OpenCV | Image processing & computer vision |
| pytesseract | Optical Character Recognition (OCR) |
| NumPy | Numerical computations & matrix operations |
| Tkinter | Simple GUI for file browsing |

Installation & Usage

1. Clone the Repository

```
git clone https://github.com/uzumstanley/Document-Scanner.git
cd Document-Scanner
```

2. Install Dependencies

```
pip install -r requirements.txt
```

Note: Make sure you have Python 3.7+ installed.

3. Install and Configure Tesseract OCR

Windows: Download and run the Tesseract installer. After installing, add the Tesseract installation folder to your system PATH.
macOS:
```
brew install tesseract
```
Linux:
```
sudo apt-get install tesseract-ocr
```

Then, specify the full path to tesseract.exe in ocr.py (Windows) or ensure it’s in your PATH (macOS/Linux).
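If you want ocr.py to work on both Windows and macOS/Linux without editing the path by hand, one option is a small lookup that tries an explicit install path first and then falls back to PATH. This is a minimal sketch, not code from the project: `resolve_tesseract_cmd` is a hypothetical helper, and `C:\Program Files\Tesseract-OCR\tesseract.exe` is only an assumed default location for a typical Windows install.

```python
import os
import shutil
from typing import Iterable, Optional

# Assumption: a typical Windows install location. Adjust or extend as needed.
DEFAULT_WINDOWS_PATHS = (
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
)


def resolve_tesseract_cmd(candidates: Iterable[str] = DEFAULT_WINDOWS_PATHS) -> Optional[str]:
    """Hypothetical helper: return the first candidate file that exists,
    otherwise whatever `tesseract` resolves to on PATH (None if not found)."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # PATH lookup, which is the usual case on macOS/Linux.
    return shutil.which("tesseract")
```

In ocr.py, a non-`None` result could then be assigned to `pytesseract.pytesseract.tesseract_cmd`, the attribute pytesseract reads to locate the Tesseract binary.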

4. Run the Application

```
python main.py
```

A file dialog will appear. Select the document image you want to scan.
Once processing finishes, the scanned image is displayed in a window and the extracted text is printed to the console.

Project Structure

```
Document-Scanner
├── main.py             # Starts the GUI, displays scanned output, and prints OCR results
├── requirements.txt    # Project dependencies
├── scanner.py          # Logic for perspective correction & image preprocessing
├── ocr.py              # Text extraction using Tesseract
├── utils.py            # Utility functions (order_points, four_point_transform, etc.)
├── README.md           # Project documentation
└── __pycache__/        # Compiled Python bytecode (generated automatically)
```
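utils.py is listed as holding `order_points` and `four_point_transform`, but their code isn't shown here. The sketch below shows one common way to write an `order_points` helper (the sum/difference trick) plus a hypothetical `output_size` helper that picks the scan's width and height from the longer edge of each opposite pair. It is an illustration under those assumptions and may differ from the project's actual utils.py.

```python
from typing import Tuple

import numpy as np


def order_points(pts: np.ndarray) -> np.ndarray:
    """Order four (x, y) points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    rect = np.zeros((4, 2), dtype=np.float32)
    s = pts.sum(axis=1)                # x + y: smallest at top-left, largest at bottom-right
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    d = np.diff(pts, axis=1).ravel()   # y - x: smallest at top-right, largest at bottom-left
    rect[1] = pts[np.argmin(d)]
    rect[3] = pts[np.argmax(d)]
    return rect


def output_size(rect: np.ndarray) -> Tuple[int, int]:
    """Hypothetical helper: (width, height) of the flattened scan, using the
    longer of each pair of opposite edges so no content is squashed."""
    tl, tr, br, bl = rect
    width = max(np.linalg.norm(br - bl), np.linalg.norm(tr - tl))
    height = max(np.linalg.norm(tr - br), np.linalg.norm(tl - bl))
    return int(round(width)), int(round(height))
```

The sum/difference trick assumes the document is roughly upright in the photo; a page rotated close to 45 degrees can produce ties and a wrong ordering.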

How It Works

Image Preprocessing: Applies Gaussian blur and Canny edge detection to isolate the document boundary.
Contour Detection: Finds contours and selects the largest four-point polygon, which is most likely the document.
Perspective Transformation: Uses the four detected points to warp the image, creating a flat, rectangular “scan.”
Binarization: Thresholds the warped image to improve contrast and readability for OCR.
Text Extraction: Converts the final image to text via Tesseract OCR (pytesseract).
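The perspective and binarization steps can be illustrated without OpenCV. The following is a plain-NumPy sketch, not the project's code, and all four function names are hypothetical: `perspective_matrix` solves essentially the same eight-equation linear system (bottom-right entry fixed to 1) that `cv2.getPerspectiveTransform` solves, `warp_gray` is a nearest-neighbour stand-in for `cv2.warpPerspective` on a grayscale image, and `otsu_threshold`/`binarize` use Otsu's method, one common automatic thresholding choice. The source does not say which thresholding method scanner.py actually uses.

```python
import numpy as np


def perspective_matrix(src, dst):
    """3x3 homography mapping four src (x, y) points onto four dst points (h33 = 1)."""
    src = np.asarray(src, dtype=np.float64).reshape(4, 2)
    dst = np.asarray(dst, dtype=np.float64).reshape(4, 2)
    a = np.zeros((8, 8))
    b = np.zeros(8)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h
        a[2 * i] = [x, y, 1, 0, 0, 0, -x * u, -y * u]
        a[2 * i + 1] = [0, 0, 0, x, y, 1, -x * v, -y * v]
        b[2 * i], b[2 * i + 1] = u, v
    h = np.linalg.solve(a, b)
    return np.append(h, 1.0).reshape(3, 3)


def warp_gray(gray, matrix, size):
    """Nearest-neighbour perspective warp of a 2-D image to size = (width, height)."""
    width, height = size
    ys, xs = np.mgrid[0:height, 0:width]
    # Inverse mapping: for every output pixel, find where it came from in the input.
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(matrix) @ coords
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < gray.shape[1]) & (sy >= 0) & (sy < gray.shape[0])
    out = np.zeros(xs.size, dtype=gray.dtype)  # pixels mapping outside the input stay 0
    out[inside] = gray[sy[inside], sx[inside]]
    return out.reshape(height, width)


def otsu_threshold(gray):
    """Otsu's method on an 8-bit image: the level maximising between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256)[:256]
    levels = np.arange(256)
    w0 = np.cumsum(hist)               # pixel count at or below each candidate level
    w1 = w0[-1] - w0                   # pixel count above it
    s0 = np.cumsum(hist * levels)      # intensity sum at or below each level
    ok = (w0 > 0) & (w1 > 0)           # skip levels that leave one class empty
    between = np.zeros(256)
    m0 = s0[ok] / w0[ok]
    m1 = (s0[-1] - s0[ok]) / w1[ok]
    between[ok] = w0[ok] * w1[ok] * (m0 - m1) ** 2
    return int(np.argmax(between))


def binarize(gray):
    """White (255) above the Otsu level, black (0) at or below it."""
    return np.where(gray > otsu_threshold(gray), 255, 0).astype(np.uint8)
```

The destination points are normally the corners of the output rectangle, (0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1). OpenCV's own functions are much faster and support smoother interpolation; the NumPy version only makes the arithmetic visible.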

Planned Improvements

GUI Enhancements: Adding progress bars, adjustable filters, and error handling pop-ups.
Automatic Cropping: Detecting and removing any excess borders after warping.
Batch Processing: Scanning multiple images/folders in a single run.
PDF Generation: Exporting scanned images + extracted text as PDF for easy sharing.
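Batch Processing is not implemented yet. A possible first step is expanding a mix of files and folders into a list of image paths that can then be scanned one by one. The sketch below is hypothetical: `collect_images` and the extension set are assumptions, not part of the current codebase.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed set of extensions; adjust to whatever formats the scanner should accept.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}


def collect_images(inputs: Iterable[str]) -> List[Path]:
    """Expand files and folders (recursively) into a sorted, de-duplicated list of image paths."""
    found = set()
    for item in inputs:
        path = Path(item)
        if path.is_dir():
            candidates = [p for p in path.rglob("*") if p.is_file()]
        elif path.is_file():
            candidates = [path]
        else:
            continue  # skip missing paths; a real tool might warn here
        for p in candidates:
            if p.suffix.lower() in IMAGE_EXTENSIONS:
                found.add(p.resolve())
    return sorted(found)
```

Each returned path could then be passed through the existing scan and OCR steps in a loop.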

Contributing

Contributions and suggestions are welcome! Check out the Issues tab or open a Pull Request to propose changes.

Fork the repo on GitHub.
Clone your fork locally.
Create a new branch for your feature/bugfix.
Commit and push your changes.
Submit a Pull Request.

License

This project is licensed under the MIT License. You are free to use, modify, and distribute it, provided the original copyright and license notice are included with all copies or substantial portions of the software.

Thanks for checking out the Document Scanner! If you find it helpful, please star this repository and consider sharing it with others.

If you have any questions or suggestions, feel free to open an issue or contact me directly. I hope this project helps you with your document processing tasks!
