
SightSpeech / Sight-to-Speech

A React app that reads words aloud from a camera feed, controlled by hand gestures from legally blind users.
Based on MediaPipe, OCR, and custom gesture detection, this tool empowers visually impaired users to access visual text in their environment.


✨ Features

  • Real-time camera capture and processing
  • Hand gesture recognition (e.g. “point left”, “O”, open palm)
  • OCR (text recognition) on camera frames
  • Text-to-speech output to read recognized words aloud
  • Lightweight fallback and buffering to avoid flicker errors
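A custom gesture override like “O” can be expressed as a simple rule over MediaPipe’s 21 hand landmarks, where index 4 is the thumb tip and index 8 is the index-finger tip: if the two tips nearly touch, the frame reads as an “O”. The sketch below is illustrative — the function name and threshold are assumptions, not the repo’s actual code.

```typescript
// Rules-based "O" detection sketch. MediaPipe hand landmarks are 21
// normalized (x, y, z) points; 4 = thumb tip, 8 = index-finger tip.
type Landmark = { x: number; y: number; z: number };

const O_TOUCH_THRESHOLD = 0.05; // assumed threshold, in normalized coordinates

function distance(a: Landmark, b: Landmark): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

function isOGesture(landmarks: Landmark[]): boolean {
  if (landmarks.length < 21) return false; // no full hand detected
  return distance(landmarks[4], landmarks[8]) < O_TOUCH_THRESHOLD;
}
```

The same pattern (a small predicate over landmark geometry) extends to other overrides such as “point left”.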

🏗 Architecture & Components

| Component | Responsibility |
| --- | --- |
| Frontend (React / Next.js, “use client”) | Captures video, draws landmarks, sends gestures |
| Gesture Recognizer (MediaPipe Tasks–Vision) | Detects hand landmarks & base gesture categories |
| Custom Gesture Overrides | Rules-based detection for “O”, “point left”, etc. |
| Stable Gesture Buffering | Avoids flicker by requiring consistent predictions |
| Keypress Simulation | Emits synthetic key events mapped to gestures |
| Backend / OCR / TTS (Flask or similar) | Processes camera frames, runs OCR, reads text aloud |
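The “stable gesture buffering” row above can be sketched as a small accumulator: a per-frame prediction only becomes the active gesture after it has been reported for several consecutive frames, so a single misclassified frame cannot trigger an action. Class and constant names here are illustrative, not the repo’s identifiers.

```typescript
// Debounce flickery per-frame predictions into a stable gesture.
const REQUIRED_FRAMES = 5; // assumed: frames of agreement before a gesture "sticks"

class GestureBuffer {
  private candidate: string | null = null;
  private count = 0;
  private stable: string | null = null;

  // Feed one per-frame prediction; returns the current stable gesture.
  update(prediction: string | null): string | null {
    if (prediction === this.candidate) {
      this.count += 1;
    } else {
      this.candidate = prediction;
      this.count = 1;
    }
    if (this.count >= REQUIRED_FRAMES) {
      this.stable = this.candidate;
    }
    return this.stable;
  }
}
```

A one-frame flicker resets the candidate count but leaves the last stable gesture in place, which is what keeps downstream key events from firing spuriously.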

🛠️ Setup & Run

  1. Clone the repo
    git clone https://github.com/groffbo/sight-to-speech.git
    cd sight-to-speech
  2. Create and activate a Python virtual environment
    python3 -m venv venv
    source venv/bin/activate   # On macOS/Linux
    venv\Scripts\activate      # On Windows PowerShell
  3. Install backend dependencies
    pip install --upgrade pip
    pip install -r requirements.txt
  4. Start the backend
    python3 app.py
  5. Install and run the frontend
    npm install
    npm run dev
    (or yarn dev, depending on your setup)
  6. Open the app
    Go to http://localhost:3000 (or whichever port your frontend uses) and allow camera access.

🖐 Gesture Mapping (Default)

| Gesture | Description / Use |
| --- | --- |
| Open_Palm | Start |
| Closed_Fist | Description |
| Pointing_Up | Tab |
| Pointing_Left | Backwards Tab |
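The keypress-simulation layer can turn the mapping above into synthetic key events. The sketch below is an assumption about how that might look — the actual keys the app binds each gesture to are not documented here, so the choices in the table are placeholders.

```typescript
// Illustrative gesture-to-key mapping; the specific keys are assumptions.
type KeySpec = { key: string; shiftKey?: boolean };

const GESTURE_KEYS: Record<string, KeySpec> = {
  Open_Palm: { key: "Enter" },                    // "Start" — assumed binding
  Closed_Fist: { key: "d" },                      // "Description" — assumed binding
  Pointing_Up: { key: "Tab" },                    // forward tab
  Pointing_Left: { key: "Tab", shiftKey: true },  // backwards tab
};

function keyForGesture(gesture: string): KeySpec | null {
  return GESTURE_KEYS[gesture] ?? null;
}

// Browser-only: dispatch a synthetic keydown for the UI's key handlers.
function emitKey(spec: KeySpec): void {
  const doc = (globalThis as any).document;
  if (!doc) return; // no-op outside the browser
  doc.dispatchEvent(
    new (globalThis as any).KeyboardEvent("keydown", {
      key: spec.key,
      shiftKey: spec.shiftKey ?? false,
    })
  );
}
```

Driving the UI through key events keeps the gesture layer decoupled: the rest of the app only ever sees ordinary keyboard input.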
