OpenBharatOCR is an open-source Python library specifically designed for optical character recognition (OCR) of Indian government documents.
- Comprehensive Document Support: Extract text from major Indian government documents including Aadhaar Card, PAN Card, Driving License, Passport, Voter ID, and more
- Multi-Language OCR: Support for English and Hindi text extraction
- Advanced Image Processing: Built-in preprocessing techniques for enhanced accuracy
- Multiple OCR Engines: Uses PaddleOCR, EasyOCR, and Tesseract
- Pattern Matching: Document-specific field extraction with validation
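To make the pattern-matching idea concrete, here is a minimal sketch of format validation for two identifiers. It is not OpenBharatOCR's internal implementation, and the helper names `is_valid_pan_format` and `is_valid_aadhaar_format` are hypothetical. It assumes the publicly documented shapes: a PAN is five letters, four digits, and one letter, and an Aadhaar number is 12 digits, often printed in groups of four. Only the shape is checked, not any checksum or whether the number was actually issued.

```python
import re

# Assumed formats (shape only, no checksum verification).
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
AADHAAR_PATTERN = re.compile(r"[0-9]{12}")


def is_valid_pan_format(text: str) -> bool:
    """Return True if text looks like a PAN, ignoring case and outer whitespace."""
    return PAN_PATTERN.fullmatch(text.strip().upper()) is not None


def is_valid_aadhaar_format(text: str) -> bool:
    """Return True if text contains exactly 12 digits once spaces and hyphens are removed."""
    digits = re.sub(r"[\s-]", "", text)
    return AADHAAR_PATTERN.fullmatch(digits) is not None
```

A check like this could be applied to the `pan_number` or `aadhaar_number` value of a result dictionary before the value is trusted downstream.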
- Python: 3.6 or later
- Operating System: Linux (Ubuntu/Debian preferred), Windows (via WSL2), or macOS
- System Dependencies: Tesseract OCR (for pytesseract functionality)
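The requirements above can be checked before installation with a short script. This is a sketch, not part of OpenBharatOCR: the function name `check_environment` is hypothetical, and it assumes the Tesseract executable is named `tesseract` and is expected on the `PATH`.

```python
import shutil
import sys

# Minimum Python version taken from the requirements list above.
MIN_PYTHON = (3, 6)


def check_environment() -> dict:
    """Report whether the interpreter is new enough and where Tesseract was found."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # shutil.which returns the executable's path, or None if it is not on PATH.
        "tesseract_path": shutil.which("tesseract"),
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python version OK:", report["python_ok"])
    print("Tesseract binary:", report["tesseract_path"] or "not found on PATH")
```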
pip install openbharatocr
- Clone the repository:
git clone https://github.com/essentiasoftserv/openbharatocr.git
cd openbharatocr
- Create a virtual environment:
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Install in development mode:
pip install -e .
Extract information from Permanent Account Number cards.
import openbharatocr
# Extract PAN card details
result = openbharatocr.pan(image_path)
# Returns: {'name': str, 'father_name': str, 'dob': str, 'pan_number': str}
Process both front and back sides of Aadhaar cards.
import openbharatocr
# Front side
front_result = openbharatocr.front_aadhaar(image_path)
# Returns: {'name': str, 'dob': str, 'gender': str, 'aadhaar_number': str}
# Back side
back_result = openbharatocr.back_aadhaar(image_path)
# Returns: {'address': str, 'aadhaar_number': str, 'pin_code': str}
Extract details from Indian driving licenses.
import openbharatocr
result = openbharatocr.driving_licence(image_path)
# Returns: {'name': str, 'license_number': str, 'dob': str, 'validity': str, 'address': str}
Process Indian passport information pages.
import openbharatocr
result = openbharatocr.passport(image_path)
# Returns: {'name': str, 'passport_number': str, 'dob': str, 'doi': str, 'doe': str}
Extract information from both sides of Voter ID cards.
import openbharatocr
import os
# Note: Requires YOLO model files for enhanced accuracy
# Set environment variables for YOLO model paths:
os.environ['YOLO_CFG'] = 'path/to/yolo.cfg'
os.environ['YOLO_WEIGHT'] = 'path/to/yolo.weights'
# Front side
front_result = openbharatocr.voter_id_front(image_path)
# Returns: {'name': str, 'voter_id': str, 'father_name': str, 'dob': str}
# Back side
back_result = openbharatocr.voter_id_back(image_path)
# Returns: {'address': str, 'voter_id': str}
Extract vehicle registration details.
import openbharatocr
result = openbharatocr.vehicle_registration(image_path)
# Returns: {'registration_number': str, 'owner_name': str, 'vehicle_model': str,
#           'registration_date': str, 'chassis_number': str, 'engine_number': str}
Process water utility bills.
import openbharatocr
result = openbharatocr.water_bill(image_path)
# Returns: {'consumer_number': str, 'name': str, 'address': str,
#           'bill_date': str, 'amount': str}
Extract information from birth certificates.
import openbharatocr
result = openbharatocr.birth_certificate(image_path)
# Returns: {'name': str, 'dob': str, 'father_name': str, 'mother_name': str,
#           'registration_number': str, 'registration_date': str}
Process educational degree certificates.
import openbharatocr
result = openbharatocr.degree(image_path)
# Returns: {'name': str, 'degree': str, 'university': str, 'year': str, 'grade': str}
Extract bank passbook details (if available).
import openbharatocr
# Note: Check if passbook functionality is exposed in the API
result = openbharatocr.passbook(image_path) # If available
# Returns: {'account_number': str, 'name': str, 'bank_name': str, 'branch': str, 'ifsc': str}
For optimal Voter ID extraction, download the following YOLO v3 models:
- Configuration File: Download YOLO Config
- Weights File: Download YOLO Weights
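A mistyped model path may only surface later, when a Voter ID image is processed, so it can help to fail early. The sketch below sets the `YOLO_CFG` and `YOLO_WEIGHT` variables only after confirming that both files exist. The helper name `configure_yolo` is hypothetical and not part of OpenBharatOCR, and storing absolute paths is a choice made here for illustration.

```python
import os
from pathlib import Path


def configure_yolo(cfg_path: str, weights_path: str) -> None:
    """Check that both YOLO files exist, then export their absolute paths."""
    for label, raw in (("config", cfg_path), ("weights", weights_path)):
        if not Path(raw).is_file():
            raise FileNotFoundError(f"YOLO {label} file not found: {raw}")
    # Variable names are the ones the Voter ID example reads from the environment.
    os.environ["YOLO_CFG"] = str(Path(cfg_path).resolve())
    os.environ["YOLO_WEIGHT"] = str(Path(weights_path).resolve())
```

Calling `configure_yolo('/path/to/yolov3.cfg', '/path/to/yolov3.weights')` before the Voter ID functions would raise `FileNotFoundError` immediately if either download is missing.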
After downloading, set the file paths in environment variables:
import os
os.environ['YOLO_CFG'] = '/path/to/yolov3.cfg'
os.environ['YOLO_WEIGHT'] = '/path/to/yolov3.weights'
We welcome contributions to OpenBharatOCR! Whether you're fixing bugs, improving documentation, or adding new features, your help is appreciated.
- Fork and Clone: Fork the repository and clone your fork locally
- Create a Branch: Create a feature branch for your changes
- Write Tests: Add tests for any new functionality
- Follow Code Style: Use Black formatter and follow PEP 8 guidelines
- Run Pre-commit Hooks: Before committing, run:
pre-commit run --all-files
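As an illustration of the "Write Tests" step, a unit test for a new field extractor might look like the sketch below. `extract_pin_code` is a hypothetical helper written for this example, not a function in the library. It assumes an Indian PIN code is six digits with a non-zero first digit. pytest collects functions whose names start with `test_`.

```python
import re
from typing import Optional

# Six digits, first digit 1-9, not embedded in a longer run of digits.
PIN_CODE_PATTERN = re.compile(r"(?<![0-9])[1-9][0-9]{5}(?![0-9])")


def extract_pin_code(text: str) -> Optional[str]:
    """Return the first standalone six-digit PIN code in text, or None."""
    match = PIN_CODE_PATTERN.search(text)
    return match.group(0) if match else None


def test_extracts_pin_code_from_address():
    assert extract_pin_code("12 MG Road, Bengaluru, Karnataka 560001") == "560001"


def test_ignores_longer_digit_runs():
    # A 12-digit number must not be mistaken for a PIN code.
    assert extract_pin_code("Aadhaar 123456789012") is None


def test_returns_none_without_pin_code():
    assert extract_pin_code("No digits here") is None
```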
Run the test suite:
pytest openbharatocr/unit_tests/
Run code quality checks:
# Format code with Black
black openbharatocr/
# Check for spelling errors
codespell
# Run all pre-commit hooks
pre-commit run --all-files
Found a bug or have a feature request? Please create an issue: https://github.com/essentiasoftserv/openbharatocr/issues
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Kunal Kumar Kushwaha - essentia.dev
- Contributors - See Contributors
- PaddleOCR team for the excellent OCR engine
- EasyOCR project for multilingual support
- Tesseract OCR community
- All contributors who have helped improve this project