
Braille OCR


A deep-learning pipeline + webapp that converts images of Braille text into readable print text. The system supports automatic language detection and transcription for English (UEB) and Cantonese Braille.

Features

  • Optical Braille Recognition: Detects Braille cells in images and converts them to Unicode Braille patterns.
  • Multi-Language Support:
    • English (Unified English Braille Grade 1 and 2)
    • Cantonese (Hong Kong)
  • Auto-Classification: Automatically determines the Braille language using a lightweight neural network.
  • Fast Inference: Optimized for consumer CPUs, achieving sub-second inference times.
  • Visual Feedback: Returns the original image with bounding boxes drawn around detected Braille cells.
  • Responsive UI: Mobile-friendly interface for capturing photos directly from devices.

Architecture

The project is containerized using Docker and consists of the following services:

  • Frontend: React application built with TypeScript and shadcn/ui. Served via Nginx.
  • Backend API: FastAPI service that handles image uploads, input validation, and job queuing.
  • Worker: Celery worker that executes the ML pipeline (OCR, classification, back-translation).
  • Redis: Message broker and result backend for the task queue.
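The four services above map onto a Compose file along these lines. This is a hypothetical sketch for orientation only; the service names, ports, build contexts, and Celery command are assumptions, not the repo's actual docker-compose.yml:

```yaml
# Hypothetical sketch of the four services (not the repo's actual file)
services:
  frontend:
    build: ./frontend
    ports: ["3000:80"]        # Nginx serves the built React app
  backend:
    build: ./backend
    ports: ["8000:8000"]      # FastAPI upload/validation API
    depends_on: [redis]
  worker:
    build: ./backend
    command: celery -A worker worker   # runs the ML pipeline tasks
    depends_on: [redis]
  redis:
    image: redis:7-alpine     # broker + result backend
```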

Tech Stack

  • ML/AI: PyTorch, Ultralytics (YOLO), OpenCV
  • Backend: Python, FastAPI, Celery, Redis
  • Frontend: React, shadcn/ui, Tailwind CSS
  • Infrastructure: Docker Compose

Machine Learning Implementation

Optical Braille Recognition (Detection)

The object detection component references the methodology described in:

Ovodov, Ilya G. "Optical Braille Recognition Using Object Detection Neural Network." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.

Dataset and Training: The model was trained on the Angelina dataset (created by Ovodov) and the DSBI dataset. To address the extreme class imbalance in the Angelina dataset, where some cell classes have far fewer instances than others, the data augmentation techniques described in the paper were applied, including image flipping and Braille cell class reassignment.

Model Selection: While the reference paper utilized a RetinaNet architecture, this project implements a YOLO (You Only Look Once) model. This architectural change significantly improved performance, enabling sub-second inference times on consumer CPUs compared to the 10+ seconds often required by RetinaNet on non-CUDA hardware, while maintaining comparable accuracy.

Language Classification

Language classification is handled by a custom 1D Convolutional Neural Network (CNN).

  • Input: The first 20 characters of the transcribed Braille Unicode string.
  • Training: Trained on a synthetic dataset generated by programmatically transcribing conventional text datasets into Braille using liblouis.
  • Performance: The model is extremely lightweight and efficient for distinguishing between English and Cantonese patterns.
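A minimal sketch of what such a classifier could look like (the layer sizes and input encoding here are illustrative assumptions, not the published architecture). Each Unicode Braille cell (U+2800–U+28FF) encodes its raised dots in the low 8 bits of the code point, so the first 20 cells map naturally onto an 8-channel binary sequence:

```python
import torch
import torch.nn as nn

class BrailleLangClassifier(nn.Module):
    """Tiny 1D CNN over the first 20 Braille cells (hypothetical layout)."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        # 8 input channels: one per Braille dot position.
        self.conv = nn.Sequential(
            nn.Conv1d(8, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 8, seq_len) -> (batch, n_classes) logits
        return self.fc(self.conv(x).squeeze(-1))

def encode(cells: str, max_len: int = 20) -> torch.Tensor:
    """Turn a Braille Unicode string into an (8, max_len) dot-bit tensor."""
    x = torch.zeros(8, max_len)
    for i, ch in enumerate(cells[:max_len]):
        bits = ord(ch) - 0x2800       # dot pattern lives in the low 8 bits
        for d in range(8):
            x[d, i] = (bits >> d) & 1
    return x
```

Because the input is a fixed-length binary grid rather than raw pixels, inference is effectively instantaneous on CPU.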

Back-translation Logic

Once Braille cells are detected, they are translated into print text:

  1. English: Uses liblouis, an open-source braille translator, to handle UEB Grade 1 and 2 contractions.
  2. Cantonese: Uses a custom pipeline.
    • Cantonese Braille relies on phonetic pronunciation rather than character mapping, leading to homophone ambiguities that standard translators like liblouis cannot resolve contextually.
    • This system first converts Braille to Jyutping (Romanization).
    • The Jyutping is then processed by jyutping2characters, a library developed for this project that maps phonetic sequences to likely Chinese characters using frequency analysis and mapping tables.
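To illustrate the final step, here is a toy version of the phonetic-to-character mapping. The frequency table and greedy per-syllable strategy are illustrative stand-ins; jyutping2characters uses its own mapping tables and frequency analysis, and the homophone case (dei6 below) is exactly where context-aware selection matters:

```python
import re

# Toy frequency table (hypothetical data): each Jyutping syllable maps to
# candidate characters ordered by corpus frequency. "dei6" is a homophone
# with more than one plausible character.
FREQ_TABLE = {
    "ngo5": ["我"],
    "dei6": ["地", "哋"],
    "hou2": ["好"],
}

# A Jyutping syllable: latin letters followed by a tone digit 1-6.
SYLLABLE = re.compile(r"[a-z]+[1-6]")

def jyutping_to_text(jyutping: str) -> str:
    """Greedy pick: most frequent character per syllable (no context model)."""
    return "".join(
        FREQ_TABLE.get(s, ["?"])[0] for s in SYLLABLE.findall(jyutping)
    )
```

For example, `jyutping_to_text("ngo5dei6hou2")` picks the most frequent candidate for each syllable, which is where a purely table-driven translator can go wrong on homophones.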

Getting Started

Prerequisites

  • Docker
  • Model weights (OCR and classifier), which will be released in the near future

Installation

  1. Clone the repository:

    git clone https://github.com/endernoke/braille-ocr.git
    cd braille-ocr
  2. Place the trained model weights:

    • backend/models/ocr_model.pt
    • backend/models/classifier_model.pt

    Note: Training code and weights are not currently in the repo. Please contact the author for access.

  3. Start the application (GPU mode):

    docker compose up --build

    Or for CPU-only environments:

    docker compose -f docker-compose.yml -f docker-compose.cpu.yml up --build
  4. Access the application:

    • Frontend: http://localhost:3000
    • API Documentation: http://localhost:8000/docs

How it Works

  1. Upload: User uploads an image via the frontend.
  2. Queue: The API validates the image and pushes a task to Redis.
  3. Processing (Worker):
    • Pre-processing: Image normalization.
    • Inference: The YOLO model detects Braille cells and their coordinates.
    • Post-processing: Detected boxes are sorted into lines and converted to Braille Unicode.
    • Classification: If no language is specified, the 1D CNN predicts the language.
    • Translation: The selected back-translation logic (liblouis or the custom Cantonese pipeline) converts Braille to text.
  4. Result: The worker saves an annotated image and returns the text and confidence scores to the frontend.
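The post-processing step above can be sketched as follows. The detection tuple format and the line-grouping tolerance are assumptions, and the actual worker code may differ; the Unicode mapping itself is standard, since dot n of a Braille cell corresponds to bit n-1 of the offset from U+2800:

```python
def boxes_to_unicode(detections, line_tol: float = 0.6) -> str:
    """
    detections: list of (x, y, w, h, dots) tuples, where `dots` is the set of
    raised-dot numbers (1-8) predicted for that cell (hypothetical format).
    Returns text with cells grouped into lines top-to-bottom, left-to-right.
    """
    def cell_char(dots) -> str:
        # Dot n sets bit (n - 1) of the offset from U+2800 (blank cell).
        return chr(0x2800 + sum(1 << (d - 1) for d in dots))

    # Group boxes into lines: a box joins the current line if its vertical
    # offset from the line's first box is small relative to the cell height.
    lines = []
    for det in sorted(detections, key=lambda d: d[1]):   # top-to-bottom
        _, y, _, h, _ = det
        if lines and abs(y - lines[-1][0][1]) < line_tol * h:
            lines[-1].append(det)
        else:
            lines.append([det])

    # Within each line, read cells left-to-right.
    return "\n".join(
        "".join(cell_char(d[4]) for d in sorted(line, key=lambda d: d[0]))
        for line in lines
    )
```

For instance, a cell with dots {1, 2} becomes U+2803 (⠃), and cells whose centers share roughly the same y coordinate end up on the same output line.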

Acknowledgements

  • Ilya G. Ovodov for the research on Optical Braille Recognition using object detection.
  • Liblouis project for the open-source braille translator.
  • Ultralytics for the YOLO implementation.
  • Angelina, DSBI, MartinThoma/wili_2018, jed351/Traditional-Chinese-Common-Crawl-Filtered, and HuggingFaceFW/fineweb dataset creators for the training data.
