OpenBharatOCR

OpenBharatOCR is an open-source Python library for optical character recognition (OCR) of Indian government documents.

Key Features

Comprehensive Document Support: Extract text from major Indian documents, including Aadhaar Card, PAN Card, Driving License, Passport, Voter ID, vehicle registration certificates, water bills, birth certificates, and degree certificates
Multi-Language OCR: Support for English and Hindi text extraction
Advanced Image Processing: Built-in preprocessing techniques for enhanced accuracy
Multiple OCR Engines: Uses PaddleOCR, EasyOCR, and Tesseract
Pattern Matching: Document-specific field extraction with validation
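
To illustrate the pattern-matching idea, the sketch below pulls a PAN-shaped token out of raw OCR text with a regular expression. It is not taken from the library's source: the helper name `find_pan_number` is hypothetical, and the only assumption about the data is the standard PAN layout of five letters, four digits, and one letter.

```python
import re
from typing import Optional

# Standard PAN layout: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")


def find_pan_number(ocr_text: str) -> Optional[str]:
    """Return the first PAN-shaped token in the OCR text, or None (hypothetical helper)."""
    # OCR engines sometimes return lowercase letters, so normalise the case first.
    match = PAN_PATTERN.search(ocr_text.upper())
    return match.group(0) if match else None


print(find_pan_number("Permanent Account Number\nabcde1234f\nName: A KUMAR"))
```

A shape check like this only confirms that a token looks like a PAN; it does not confirm that the number was issued.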

Prerequisites

Python: 3.6 or later
Operating System: Linux (Ubuntu/Debian preferred), Windows (via WSL2), or macOS
System Dependencies: Tesseract OCR (for pytesseract functionality)
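
Before installing, it can help to confirm these prerequisites from Python itself. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is hypothetical, and it assumes the Tesseract binary is expected on `PATH` under the name `tesseract`.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or later is required" % min_python)
    # pytesseract invokes the `tesseract` binary, so this sketch looks for it on PATH.
    if shutil.which("tesseract") is None:
        problems.append("Tesseract OCR binary not found on PATH")
    return problems


for problem in check_prerequisites():
    print("Missing prerequisite:", problem)
```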

Installation

Install from PyPI

pip install openbharatocr

Development Setup

Clone the repository:

git clone https://github.com/essentiasoftserv/openbharatocr.git
cd openbharatocr

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Install in development mode:

pip install -e .

Supported Documents

PAN Card

Extract information from Permanent Account Number cards.

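import openbharatocr

# Path to a photo or scan of the card (placeholder; the examples below reuse this name)
image_path = "path/to/pan_card.jpg"

# Extract PAN card details
result = openbharatocr.pan(image_path)
# Returns: {'name': str, 'father_name': str, 'dob': str, 'pan_number': str}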

Aadhaar Card

Process both front and back sides of Aadhaar cards.

import openbharatocr 

# Front side
front_result = openbharatocr.front_aadhaar(image_path)
# Returns: {'name': str, 'dob': str, 'gender': str, 'aadhaar_number': str}

# Back side
back_result = openbharatocr.back_aadhaar(image_path)
# Returns: {'address': str, 'aadhaar_number': str, 'pin_code': str}
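
Aadhaar numbers are 12 digits long and are usually printed in groups of four, so an extracted value may contain spaces. The sketch below normalises such a string and masks it for display or logging. The helper names are hypothetical, and it assumes the `aadhaar_number` field is a plain string; the sample number is a dummy.

```python
import re
from typing import Optional


def normalise_aadhaar(raw: str) -> Optional[str]:
    """Strip spaces and hyphens; return the 12 digits, or None if the shape is wrong."""
    digits = re.sub(r"[\s-]", "", raw)
    return digits if re.fullmatch(r"[0-9]{12}", digits) else None


def mask_aadhaar(raw: str) -> str:
    """Hide the first eight digits so that only the last four are shown."""
    digits = normalise_aadhaar(raw)
    if digits is None:
        raise ValueError("not a 12-digit Aadhaar number")
    return "XXXX XXXX " + digits[-4:]


print(mask_aadhaar("1234 5678 9012"))  # prints "XXXX XXXX 9012"
```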

Driving License

Extract details from Indian driving licenses.

import openbharatocr 

result = openbharatocr.driving_licence(image_path)
# Returns: {'name': str, 'license_number': str, 'dob': str, 'validity': str, 'address': str}

Passport

Process Indian passport information pages.

import openbharatocr 

result = openbharatocr.passport(image_path)
# Returns: {'name': str, 'passport_number': str, 'dob': str, 'doi': str, 'doe': str}

Voter ID

Extract information from both sides of Voter ID cards.

import openbharatocr 
import os

# Note: Requires YOLO model files for enhanced accuracy
# Set environment variables for YOLO model paths:
os.environ['YOLO_CFG'] = 'path/to/yolo.cfg'
os.environ['YOLO_WEIGHT'] = 'path/to/yolo.weights'

# Front side
front_result = openbharatocr.voter_id_front(image_path)
# Returns: {'name': str, 'voter_id': str, 'father_name': str, 'dob': str}

# Back side
back_result = openbharatocr.voter_id_back(image_path)
# Returns: {'address': str, 'voter_id': str}

Vehicle Registration Card/Certificate

Extract vehicle registration details.

import openbharatocr 

result = openbharatocr.vehicle_registration(image_path)
# Returns: {'registration_number': str, 'owner_name': str, 'vehicle_model': str, 
#          'registration_date': str, 'chassis_number': str, 'engine_number': str}

Water Bill

Process water utility bills.

import openbharatocr 

result = openbharatocr.water_bill(image_path)
# Returns: {'consumer_number': str, 'name': str, 'address': str, 
#          'bill_date': str, 'amount': str}

Birth Certificate

Extract information from birth certificates.

import openbharatocr 

result = openbharatocr.birth_certificate(image_path)
# Returns: {'name': str, 'dob': str, 'father_name': str, 'mother_name': str, 
#          'registration_number': str, 'registration_date': str}

Degree Certificate

Process educational degree certificates.

import openbharatocr 

result = openbharatocr.degree(image_path)
# Returns: {'name': str, 'degree': str, 'university': str, 'year': str, 'grade': str}

Bank Passbook

Extract bank passbook details (if available).

import openbharatocr 

# Note: Check if passbook functionality is exposed in the API
result = openbharatocr.passbook(image_path)  # If available
# Returns: {'account_number': str, 'name': str, 'bank_name': str, 'branch': str, 'ifsc': str}
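
Because the passbook function may not be exposed in every release, one way to check at runtime is to look the attribute up instead of calling it directly. This is a sketch under that assumption; `get_extractor` is a hypothetical helper, not part of the library.

```python
import importlib
from typing import Callable, Optional


def get_extractor(module_name: str, function_name: str) -> Optional[Callable]:
    """Return module.function if the module imports and the attribute is callable, else None."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    candidate = getattr(module, function_name, None)
    return candidate if callable(candidate) else None


passbook = get_extractor("openbharatocr", "passbook")
print("passbook extraction available:", passbook is not None)
```

When the lookup returns a function, it can be called with an image path like the other extractors.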

Additional Resources

YOLO Models for Enhanced Voter ID Processing

For optimal Voter ID extraction, download the following YOLO v3 models:

Configuration File: Download YOLO Config
Weights File: Download YOLO Weights

After downloading, set the file paths in environment variables:

import os
os.environ['YOLO_CFG'] = '/path/to/yolov3.cfg'
os.environ['YOLO_WEIGHT'] = '/path/to/yolov3.weights'
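
A quick check that both variables point to real files can catch path mistakes before any Voter ID image is processed. The sketch below uses only the standard library; `missing_yolo_files` is a hypothetical helper, and how the library itself reacts to missing files is not covered here.

```python
import os

YOLO_ENV_VARS = ("YOLO_CFG", "YOLO_WEIGHT")


def missing_yolo_files(environ=os.environ):
    """Return the names of YOLO variables that are unset or do not point to an existing file."""
    return [
        name
        for name in YOLO_ENV_VARS
        if not os.path.isfile(environ.get(name, ""))
    ]


missing = missing_yolo_files()
if missing:
    print("Set these variables to existing files:", ", ".join(missing))
```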

Contributing

We welcome contributions to OpenBharatOCR! Whether you're fixing bugs, improving documentation, or adding new features, your help is appreciated.

Development Guidelines

Fork and Clone: Fork the repository and clone your fork locally
Create a Branch: Create a feature branch for your changes
Write Tests: Add tests for any new functionality
Follow Code Style: Use Black formatter and follow PEP 8 guidelines
Run Pre-commit Hooks: Before committing, run:
```
pre-commit run --all-files
```

Testing

Run the test suite:

pytest openbharatocr/unit_tests/

Run code quality checks:

# Format code with Black
black openbharatocr/

# Check for spelling errors
codespell

# Run all pre-commit hooks
pre-commit run --all-files

Reporting Issues

Found a bug or have a feature request? Please create an issue: https://github.com/essentiasoftserv/openbharatocr/issues

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Authors

Kunal Kumar Kushwaha - essentia.dev
Contributors - see the contributors list on the GitHub repository

Acknowledgments

PaddleOCR team for the excellent OCR engine
EasyOCR project for multilingual support
Tesseract OCR community
All contributors who have helped improve this project
