
Braille OCR


A deep-learning pipeline + webapp that converts images of Braille text into readable print text. The system supports automatic language detection and transcription for English (UEB) and Cantonese Braille.

Features

  • Optical Braille Recognition: Detects Braille cells in images and converts them to Unicode Braille patterns.
  • Multi-Language Support:
    • English (Unified English Braille Grade 1 and 2)
    • Cantonese (Hong Kong)
  • Auto-Classification: Automatically determines the Braille language using a lightweight neural network.
  • Fast Inference: Optimized for consumer CPUs, achieving sub-second inference times.
  • Visual Feedback: Returns the original image with bounding boxes drawn around detected Braille cells.
  • Responsive UI: Mobile-friendly interface for capturing photos directly from devices.

Architecture

The project is containerized using Docker and consists of the following services:

  • Frontend: React application built with TypeScript and shadcn/ui. Served via Nginx.
  • Backend API: FastAPI service that handles image uploads, input validation, and job queuing.
  • Worker: Celery worker that executes the ML pipeline (OCR, classification, back-translation).
  • Redis: Message broker and result backend for the task queue.
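The four services above map onto a Compose file along these lines. This is a hypothetical sketch for orientation only; the service names, ports, build contexts, and Celery command are assumptions, not the repo's actual docker-compose.yml:

```yaml
# Hypothetical sketch of the four services (not the repo's actual file)
services:
  frontend:
    build: ./frontend
    ports: ["3000:80"]        # Nginx serves the built React app
  backend:
    build: ./backend
    ports: ["8000:8000"]      # FastAPI upload/validation API
    depends_on: [redis]
  worker:
    build: ./backend
    command: celery -A worker worker   # runs the ML pipeline tasks
    depends_on: [redis]
  redis:
    image: redis:7-alpine     # broker + result backend
```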

Tech Stack

  • ML/AI: PyTorch, Ultralytics (YOLO), OpenCV
  • Backend: Python, FastAPI, Celery, Redis
  • Frontend: React, shadcn/ui, Tailwind CSS
  • Infrastructure: Docker Compose

Machine Learning Implementation

Optical Braille Recognition (Detection)

The object detection component references the methodology described in:

Ovodov, Ilya G. "Optical Braille Recognition Using Object Detection Neural Network." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

Dataset and Training: The model was trained on the Angelina dataset (created by Ovodov) and the DSBI dataset. To address the extreme class imbalance in the Angelina dataset, where some cell classes have far fewer instances than others, the data augmentation techniques described in the paper were applied, including image flipping and Braille cell class reassignment.

Model Selection: While the reference paper utilized a RetinaNet architecture, this project implements a YOLO (You Only Look Once) model. This architectural change significantly improved performance, enabling sub-second inference times on consumer CPUs compared to the 10+ seconds often required by RetinaNet on non-CUDA hardware, while maintaining comparable accuracy.

Language Classification

Language classification is handled by a custom 1D Convolutional Neural Network (CNN).

  • Input: The first 20 characters of the transcribed Braille Unicode string.
  • Training: Trained on a synthetic dataset generated by programmatically transcribing conventional text datasets into Braille using liblouis.
  • Performance: The model is extremely lightweight and efficient for distinguishing between English and Cantonese patterns.
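A minimal sketch of what such a classifier could look like (the layer sizes and input encoding here are illustrative assumptions, not the published architecture). Each Unicode Braille cell (U+2800–U+28FF) encodes its raised dots in the low 8 bits of the code point, so the first 20 cells map naturally onto an 8-channel binary sequence:

```python
import torch
import torch.nn as nn

class BrailleLangClassifier(nn.Module):
    """Tiny 1D CNN over the first 20 Braille cells (hypothetical layout)."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        # 8 input channels: one per Braille dot position.
        self.conv = nn.Sequential(
            nn.Conv1d(8, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 8, seq_len) -> (batch, n_classes) logits
        return self.fc(self.conv(x).squeeze(-1))

def encode(cells: str, max_len: int = 20) -> torch.Tensor:
    """Turn a Braille Unicode string into an (8, max_len) dot-bit tensor."""
    x = torch.zeros(8, max_len)
    for i, ch in enumerate(cells[:max_len]):
        bits = ord(ch) - 0x2800       # dot pattern lives in the low 8 bits
        for d in range(8):
            x[d, i] = (bits >> d) & 1
    return x
```

Because the input is a fixed-length binary grid rather than raw pixels, inference is effectively instantaneous on CPU.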

Back-translation Logic

Once Braille cells are detected, they are translated into print text:

  1. English: Uses liblouis, an open-source braille translator, to handle UEB Grade 1 and 2 contractions.
  2. Cantonese: Uses a custom pipeline.
    • Cantonese Braille relies on phonetic pronunciation rather than character mapping, leading to homophone ambiguities that standard translators like liblouis cannot resolve contextually.
    • This system first converts Braille to Jyutping (Romanization).
    • The Jyutping is then processed by jyutping2characters, a library developed for this project that maps phonetic sequences to likely Chinese characters using frequency analysis and mapping tables.
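To illustrate the final step, here is a toy version of the phonetic-to-character mapping. The frequency table and greedy per-syllable strategy are illustrative stand-ins; jyutping2characters uses its own mapping tables and frequency analysis, and the homophone case (dei6 below) is exactly where context-aware selection matters:

```python
import re

# Toy frequency table (hypothetical data): each Jyutping syllable maps to
# candidate characters ordered by corpus frequency. "dei6" is a homophone
# with more than one plausible character.
FREQ_TABLE = {
    "ngo5": ["我"],
    "dei6": ["地", "哋"],
    "hou2": ["好"],
}

# A Jyutping syllable: latin letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")

def jyutping_to_text(jyutping: str) -> str:
    """Greedy pick: most frequent character per syllable (no context model)."""
    return "".join(
        FREQ_TABLE.get(s, ["?"])[0] for s in SYLLABLE.findall(jyutping)
    )
```

For example, `jyutping_to_text("ngo5dei6hou2")` picks the most frequent candidate for each syllable, which is where a purely table-driven translator can go wrong on homophones.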

Getting Started

Prerequisites

  • Docker
  • Model weights (OCR and classifier), which will be released in the near future

Installation

  1. Clone the repository:

    git clone https://github.com/endernoke/braille-ocr.git
    cd braille-ocr
  2. Place the trained model weights:

    • backend/models/ocr_model.pt
    • backend/models/classifier_model.pt

    Note: Training code and weights are not currently in the repo. Please contact the author for access.

  3. Start the application (GPU mode):

    docker compose up --build

    Or for CPU-only environments:

    docker compose -f docker-compose.yml -f docker-compose.cpu.yml up --build
  4. Access the application:

    • Frontend: http://localhost:3000
    • API Documentation: http://localhost:8000/docs

How it Works

  1. Upload: User uploads an image via the frontend.
  2. Queue: The API validates the image and pushes a task to Redis.
  3. Processing (Worker):
    • Pre-processing: Image normalization.
    • Inference: The YOLO model detects Braille cells and their coordinates.
    • Post-processing: Detected boxes are sorted into lines and converted to Braille Unicode.
    • Classification: If no language is specified, the 1D CNN predicts the language.
    • Translation: The selected back-translation logic (liblouis or the custom Cantonese pipeline) converts Braille to text.
  4. Result: The worker saves an annotated image and returns the text and confidence scores to the frontend.
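The post-processing step above can be sketched as follows. The detection tuple format and the line-grouping tolerance are assumptions, and the actual worker code may differ; the Unicode mapping itself is standard, since dot n of a Braille cell corresponds to bit n-1 of the offset from U+2800:

```python
def boxes_to_unicode(detections, line_tol: float = 0.6) -> str:
    """
    detections: list of (x, y, w, h, dots) tuples, where `dots` is the set of
    raised-dot numbers (1-8) predicted for that cell (hypothetical format).
    Returns text with cells grouped into lines top-to-bottom, left-to-right.
    """
    def cell_char(dots) -> str:
        # Dot n sets bit (n - 1) of the offset from U+2800 (blank cell).
        return chr(0x2800 + sum(1 << (d - 1) for d in dots))

    # Group boxes into lines: a box joins the current line if its vertical
    # offset from the line's first box is small relative to the cell height.
    lines = []
    for det in sorted(detections, key=lambda d: d[1]):   # top-to-bottom
        _, y, _, h, _ = det
        if lines and abs(y - lines[-1][0][1]) < line_tol * h:
            lines[-1].append(det)
        else:
            lines.append([det])

    # Within each line, read cells left-to-right.
    return "\n".join(
        "".join(cell_char(d[4]) for d in sorted(line, key=lambda d: d[0]))
        for line in lines
    )
```

For instance, a cell with dots {1, 2} becomes U+2803 (⠃), and cells whose centers share roughly the same y coordinate end up on the same output line.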

Acknowledgements

  • Ilya G. Ovodov for the research on Optical Braille Recognition using object detection.
  • Liblouis project for the open-source braille translator.
  • Ultralytics for the YOLO implementation.
  • Angelina, DSBI, MartinThoma/wili_2018, jed351/Traditional-Chinese-Common-Crawl-Filtered, and HuggingFaceFW/fineweb dataset creators for the training data.
