GitHub - coozyme/ocr-engine: OCR Engine

# OCR Engine API

An OCR (Optical Character Recognition) engine with a REST API that extracts text from images, PDFs, and Excel files with embedded images.

Features

Multi-format Support: PNG, JPG, JPEG, PDF, XLSX
Multilingual OCR: English, Indonesian, Chinese (Simplified)
Smart PDF Processing: Handles both native and scanned PDFs
Excel Image Extraction: OCR for embedded images in Excel files
Spell Correction: Automatic text correction using SymSpell
Image Preprocessing: Automatic image enhancement for better OCR results
Async Processing: Background task processing with status tracking
RESTful API: FastAPI-based REST API

Installation

Clone the repository:

git clone <repository-url>
cd ocr-engine

Install dependencies:

pip install -r requirements.txt

pip install python-docx mammoth

Install system dependencies:

# For Ubuntu/Debian
sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install libmagic1
sudo apt-get install poppler-utils

# For macOS
brew install tesseract
brew install libmagic
brew install poppler

Usage

Start the server

python main.py

The API will be available at http://localhost:8000

API Documentation

Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

1. Upload File for OCR Processing

curl -X POST "http://localhost:8000/ocr/upload" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_file.pdf"

Response:

{
  "task_id": "uuid-string",
  "message": "File uploaded successfully. Processing started.",
  "status_url": "/ocr/status/uuid-string",
  "result_url": "/ocr/result/uuid-string"
}
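
The `status_url` and `result_url` fields in this response are relative paths. As a minimal sketch, they could be resolved against the server address as follows. This assumes the default base URL from "Start the server"; the helper name `resolve_task_urls` is hypothetical and not part of this project.

```python
from urllib.parse import urljoin

# Assumption: the server runs at the default address shown in this README.
BASE_URL = "http://localhost:8000"


def resolve_task_urls(upload_response: dict, base_url: str = BASE_URL) -> dict:
    """Turn the relative URLs from the upload response into absolute ones."""
    return {
        "status": urljoin(base_url, upload_response["status_url"]),
        "result": urljoin(base_url, upload_response["result_url"]),
    }


# Example using the response shape documented above.
sample_response = {
    "task_id": "uuid-string",
    "message": "File uploaded successfully. Processing started.",
    "status_url": "/ocr/status/uuid-string",
    "result_url": "/ocr/result/uuid-string",
}
urls = resolve_task_urls(sample_response)
print(urls["status"])
print(urls["result"])
```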

2. Check Processing Status

curl -X GET "http://localhost:8000/ocr/status/{task_id}"

Response:

{
  "id": "uuid-string",
  "status": "processing",
  "progress": 65,
  "message": "Processing PDF file..."
}
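
Because processing runs in the background, a client usually polls this endpoint until the task finishes. The sketch below shows one way to do that. It takes the HTTP call as a function argument so the loop can be tried without a running server. The name `wait_for_task` is hypothetical, and the `"failed"` state is an assumption: only `"processing"` and `"completed"` appear in this README.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    terminal_states: tuple = ("completed", "failed"),  # "failed" is assumed
) -> dict:
    """Call fetch_status() repeatedly until the task reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("OCR task did not finish before the timeout")
        time.sleep(poll_interval)


# Stand-in for GET /ocr/status/{task_id}: replays two canned responses.
canned = iter([
    {"id": "uuid-string", "status": "processing", "progress": 65},
    {"id": "uuid-string", "status": "completed", "progress": 100},
])
final = wait_for_task(lambda: next(canned), poll_interval=0.0)
print(final["status"])
```

Against a real server, `fetch_status` would wrap an HTTP GET on the status URL and return the decoded JSON body.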

3. Get OCR Results

curl -X GET "http://localhost:8000/ocr/result/{task_id}"

Response:

{
  "id": "uuid-string",
  "status": "completed",
  "original_filename": "document.pdf",
  "file_type": "pdf",
  "processing_time": 12.34,
  "extracted_text": "Original extracted text...",
  "corrected_text": "Corrected text with spelling fixes...",
  "corrections_made": [
    {
      "original": "teh",
      "corrected": "the",
      "confidence": 1000
    }
  ],
  "detailed_results": {...},
  "created_at": "2024-01-01T10:00:00"
}
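
As a rough illustration of consuming this payload, the sketch below lists the spelling corrections from a result. The payload is a trimmed, illustrative copy of the response above, and `summarize_corrections` is a hypothetical helper name.

```python
import json

# Illustrative payload following the documented response shape (trimmed).
result_json = """
{
  "id": "uuid-string",
  "status": "completed",
  "original_filename": "document.pdf",
  "corrections_made": [
    {"original": "teh", "corrected": "the", "confidence": 1000}
  ]
}
"""


def summarize_corrections(result: dict) -> list:
    """Return one 'original -> corrected' line per spelling correction."""
    return [
        f'{item["original"]} -> {item["corrected"]}'
        for item in result.get("corrections_made", [])
    ]


result = json.loads(result_json)
summary = summarize_corrections(result)
for line in summary:
    print(line)
```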

4. Delete Results (cleanup)

curl -X DELETE "http://localhost:8000/ocr/result/{task_id}"

5. Health Check

curl -X GET "http://localhost:8000/ocr/health"

File Format Support

Images (PNG, JPG, JPEG)

Direct OCR processing
Automatic image preprocessing
Coordinate extraction for text regions
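
EasyOCR's `readtext` returns each detection as a bounding box (four corner points), the recognized text, and a confidence score. The sketch below flattens such detections into axis-aligned regions. How this API actually exposes coordinates is not documented here, so the output field names and the `to_regions` helper are hypothetical.

```python
def to_regions(detections):
    """Convert EasyOCR-style (bbox, text, confidence) entries to plain dicts."""
    regions = []
    for bbox, text, confidence in detections:
        # bbox holds four [x, y] corner points; reduce it to its extremes.
        xs = [float(point[0]) for point in bbox]
        ys = [float(point[1]) for point in bbox]
        regions.append({
            "text": text,
            "confidence": float(confidence),
            "left": min(xs),
            "top": min(ys),
            "right": max(xs),
            "bottom": max(ys),
        })
    return regions


# Hand-written sample in the readtext output shape (not real OCR output).
sample_detections = [
    ([[10, 20], [110, 20], [110, 50], [10, 50]], "Invoice", 0.98),
]
regions = to_regions(sample_detections)
print(regions[0])
```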

PDF Files

Native text extraction for text-based PDFs
OCR processing for scanned PDFs
Automatic detection of PDF type
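
One common way to tell native PDFs from scanned ones is to check how much text each page yields without OCR. The sketch below illustrates that heuristic. It is not necessarily the detection logic this project uses; the function name, the threshold, and the `"mixed"` category are assumptions.

```python
def classify_pdf(page_texts, min_chars_per_page=20):
    """Classify a PDF from the text extracted from each page without OCR.

    Returns "native" if every page has enough text, "scanned" if none
    does, and "mixed" otherwise.
    """
    if not page_texts:
        # No pages with extractable text at all: treat as needing OCR.
        return "scanned"
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    if pages_with_text == len(page_texts):
        return "native"
    if pages_with_text == 0:
        return "scanned"
    return "mixed"


print(classify_pdf(["A" * 200, "B" * 150]))
print(classify_pdf(["", "   "]))
print(classify_pdf(["A" * 200, ""]))
```

In the mixed case, a processor could run OCR only on the pages that yielded too little text.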

Excel Files (XLSX)

Text extraction from cells
OCR processing of embedded images
Sheet-by-sheet processing
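
The three format families above suggest a dispatch step that picks a processing path per file. The sketch below does this by file extension only. The project may inspect file contents instead (libmagic is among the system dependencies), and the names `FILE_TYPES` and `detect_file_type` and the type labels are hypothetical.

```python
from pathlib import Path

# Supported extensions from this README, mapped to an assumed type label.
FILE_TYPES = {
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".pdf": "pdf",
    ".xlsx": "excel",
}


def detect_file_type(filename: str) -> str:
    """Map a filename to a processing path based on its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in FILE_TYPES:
        raise ValueError(f"Unsupported file type: {filename!r}")
    return FILE_TYPES[suffix]


print(detect_file_type("document.pdf"))
print(detect_file_type("scan.JPG"))
```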

Configuration

OCR Languages

Modify the language list in ocr_engine/ocr_processor.py:

self.reader = easyocr.Reader(['en', 'id', 'ch_sim'], gpu=False)

Spell Checker Dictionary

Add custom words in ocr_engine/spell_checker.py:

custom_words = ["your", "custom", "words"]
for word in custom_words:
    self.sym_spell.create_dictionary_entry(word, 1000)
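
For longer word lists, the same loop can live in a small helper fed from a file or another source. The sketch below assumes only that the object passed in has SymSpell's `create_dictionary_entry(key, count)` method. The helper name `add_custom_words` and the stand-in class are hypothetical, used so the example runs without symspellpy installed.

```python
def add_custom_words(sym_spell, words, count=1000):
    """Register each non-empty word with the spell checker; return how many."""
    added = 0
    for raw in words:
        word = raw.strip()
        if word:
            sym_spell.create_dictionary_entry(word, count)
            added += 1
    return added


class RecordingSpellChecker:
    """Stand-in for a SymSpell instance that records the entries it receives."""

    def __init__(self):
        self.entries = {}

    def create_dictionary_entry(self, key, count):
        self.entries[key] = count


checker = RecordingSpellChecker()
# Lines as they might be read from a text file, one word per line.
added = add_custom_words(checker, ["tesseract\n", "  \n", "easyocr\n"])
print(added, checker.entries)
```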

Project Structure

ocr-engine/
├── main.py                 # Entry point
├── requirements.txt        # Dependencies
├── README.md              # This file
├── api/
│   ├── __init__.py
│   ├── models.py          # Pydantic models
│   └── routes.py          # FastAPI routes
├── ocr_engine/
