🔍 DualTextOCRFusion

DualTextOCRFusion is a web-based Optical Character Recognition (OCR) application that allows users to upload images, extract text, and search for keywords within the extracted text. The app uses models like Byaldi + Qwen2-VL or General OCR Theory (GOT) for multilingual text extraction.

Features

Multilingual OCR: Extract text from images containing both Hindi and English.
Keyword Search: Search for specific keywords in the extracted text in real time.
Text Correction and Pasting: Paste images directly from the clipboard and view the cleaned-up extraction results.
User-Friendly Interface: Simple, intuitive interface for easy image uploading and searching.
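
The README does not say what "cleaned" means for the displayed results. As a minimal sketch, assuming cleanup amounts to Unicode normalization plus whitespace tidying, a helper could look like the following. The function name `clean_extracted_text` is hypothetical and is not taken from app.py.

```python
import re
import unicodedata


def clean_extracted_text(raw: str) -> str:
    """Hypothetical cleanup step; the real app.py may do something different."""
    # NFC normalization keeps composed characters in one consistent form.
    text = unicodedata.normalize("NFC", raw)
    # Collapse runs of spaces and tabs inside each line, then trim the ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Drop lines that are empty after trimming.
    return "\n".join(line for line in lines if line)
```

This only touches whitespace and normalization form, so it should be safe for both Hindi and English text; any spelling or OCR-error correction would need a separate step.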

Technologies Used

Python: Backend logic.
Streamlit: For building the web interface.
Hugging Face Transformers: For integrating OCR models (Qwen2-VL or GOT).
PyTorch: For deep learning inference.
OpenCV: For image preprocessing.
Pillow: For handling image data.

Project Structure

DualTextOCRFusion/
│
├── app.py                 # Main Streamlit application
├── requirements.txt       # Dependencies for the project
└── README.md              # This file

How to Run Locally

Prerequisites

Python 3.8 or above installed on your machine.

Steps

Clone the Repository:

```
git clone https://github.com/yourusername/dual-text-ocr-fusion.git
cd dual-text-ocr-fusion
```

Install Dependencies:

Install the required dependencies by running:
```
pip install -r requirements.txt
```
Run the Application:

Start the Streamlit app by running the following command:
```
streamlit run app.py
```
Open the App:

Once the server starts, the app will be available in your browser at:
```
http://localhost:8501
```

Usage

Upload an Image: Upload an image containing Hindi and/or English text in formats like JPG, JPEG, or PNG.
View Extracted Text: The app will extract and display the text from the image.
Search for Keywords: Type any keyword to search within the extracted text; results update in real time, without pressing Enter.
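
The search step can be sketched in plain Python. This is an illustration, not the code in app.py: the helper names `find_keyword` and `highlight` are hypothetical, and case-insensitive literal matching is an assumption. In the app, the per-keystroke updates presumably come from the streamlit-keyup package listed under Dependencies, which would call logic like this on each change.

```python
import re
from typing import List, Tuple


def find_keyword(text: str, keyword: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of case-insensitive literal matches."""
    if not keyword:
        return []
    # re.escape treats the keyword as plain text, not as a regex pattern.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [match.span() for match in pattern.finditer(text)]


def highlight(text: str, keyword: str) -> str:
    """Wrap each match in ** markers (bold when rendered as Markdown)."""
    if not keyword:
        return text
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda match: f"**{match.group(0)}**", text)
```

For example, `highlight("Invoice total", "total")` returns `Invoice **total**`, which a Markdown-rendering widget would display with the match in bold.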

Customization

Changing the OCR Model

By default, the app uses the Qwen2-VL model, but you can switch to the General OCR Theory (GOT) model by modifying the OCR extraction logic in app.py.
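
One way to make the switch a one-line change is a small backend registry. The sketch below is an assumption about how this could be organized, not a description of app.py: the function names, the `OCR_BACKENDS` mapping, and the backend keys are all hypothetical, and the two extractor bodies are placeholders for the actual model calls.

```python
from typing import Callable, Dict


def extract_with_qwen2vl(image_path: str) -> str:
    """Placeholder for the Byaldi + Qwen2-VL path (hypothetical name)."""
    raise NotImplementedError("wire up Qwen2-VL inference here")


def extract_with_got(image_path: str) -> str:
    """Placeholder for the General OCR Theory path (hypothetical name)."""
    raise NotImplementedError("wire up GOT inference here")


# Hypothetical registry mapping a backend name to its extractor function.
OCR_BACKENDS: Dict[str, Callable[[str], str]] = {
    "qwen2-vl": extract_with_qwen2vl,
    "got": extract_with_got,
}

DEFAULT_BACKEND = "qwen2-vl"  # mirrors the default described above


def get_extractor(name: str = DEFAULT_BACKEND) -> Callable[[str], str]:
    """Look up an extractor by name, case-insensitively."""
    try:
        return OCR_BACKENDS[name.lower()]
    except KeyError:
        choices = ", ".join(sorted(OCR_BACKENDS))
        raise ValueError(f"Unknown OCR backend {name!r}; choose from: {choices}") from None
```

With this layout, switching models means changing `DEFAULT_BACKEND` or passing `"got"` to `get_extractor`, and the rest of the app keeps calling a single function with the image path.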

Deployment

The app can be deployed using any platform that supports Streamlit apps, such as Streamlit Community Cloud, Heroku, or AWS.

Example Images

Here are some sample images you can use to test the OCR functionality:

Sample 1: A document with mixed Hindi and English text.
Sample 2: An image with only Hindi text, for testing Hindi-only extraction.

Dependencies

The main dependencies for the project are:

transformers==4.45.0
streamlit==1.30.0
torch (CPU build of PyTorch)
Pillow==10.3.0
opencv-python
streamlit-keyup
st-img-pastebutton

To see the full list of dependencies, refer to the requirements.txt file.
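
To confirm that the environment matches the list above, a short standard-library check can report what is installed. This is a sketch rather than part of the project: the `installed_versions` helper is hypothetical, and the distribution names are copied from the list above on the assumption that they match the names used on PyPI.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Distribution names as listed in this README (assumed to match PyPI).
PACKAGES = (
    "transformers",
    "streamlit",
    "torch",
    "Pillow",
    "opencv-python",
    "streamlit-keyup",
    "st-img-pastebutton",
)


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

`importlib.metadata` is available from Python 3.8, which matches the prerequisite stated earlier.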

Contributing

If you'd like to contribute to this project, feel free to fork the repository and submit a pull request. Follow these steps:

Fork the project.
Create a feature branch:
```
git checkout -b feature-branch
```
Commit your changes:
```
git commit -am 'Add new feature'
```
Push to the branch:
```
git push origin feature-branch
```
Open a pull request.

Screenshot

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

Streamlit: For the easy-to-use web interface.
Hugging Face Transformers: For the powerful OCR models.
PyTorch: For model inference.
Byaldi & GOT Models: For multilingual OCR support.
