A React app that reads words aloud from a camera feed, controlled entirely by hand gestures and designed for legally blind users.
Built on MediaPipe, OCR, and custom gesture detection, this tool lets visually impaired users access the text in their environment.
- Real-time camera capture and processing
- Hand gesture recognition (e.g. “point left”, “O”, open palm)
- OCR (text recognition) on camera frames
- Text-to-speech output to read recognized words aloud
- Lightweight fallback and gesture buffering to keep flickering predictions from triggering spurious actions
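To make the capture-and-process loop concrete, here is a rough TypeScript sketch of how a frontend like this can grab frames and ship them to a backend. The `/ocr` endpoint name, the form-data payload, and the one-second interval are assumptions for illustration, not the project's actual API.

```typescript
// Hedged sketch of a capture loop: stream the camera into a <video> element,
// periodically draw a frame to a canvas, and POST it as a JPEG to the backend.
async function startCapture(video: HTMLVideoElement): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = stream;
  await video.play();

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d")!;

  setInterval(() => {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    canvas.toBlob(async (blob) => {
      if (!blob) return;
      const body = new FormData();
      body.append("frame", blob, "frame.jpg");
      await fetch("/ocr", { method: "POST", body }); // hypothetical endpoint
    }, "image/jpeg");
  }, 1000); // roughly one frame per second; tune for your OCR latency
}
```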
| Component | Responsibility |
|---|---|
| Frontend (React / Next.js / “use client”) | Captures video, draws landmarks, sends gestures |
| Gesture Recognizer (MediaPipe Tasks–Vision) | Detects hand landmarks & base gesture categories |
| Custom Gesture Overrides | Rules-based detection for “O”, “point left”, etc. |
| Stable Gesture Buffering | Avoids flicker by requiring consistent predictions |
| Keypress Simulation | Emits synthetic key events mapped to gestures |
| Backend / OCR / TTS (Flask or similar) | Processes camera frames, runs OCR, reads text aloud |
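To make the "Stable Gesture Buffering" and "Custom Gesture Overrides" rows concrete, here is a minimal TypeScript sketch. The window size, landmark thresholds, and class name are illustrative assumptions; the project's actual values may differ.

```typescript
// Sketch of stable gesture buffering: a label is accepted only after it has
// been predicted on several consecutive frames, so a single-frame
// misclassification never fires an action. Window size of 5 is an assumption.
class StableGestureBuffer {
  private history: string[] = [];

  constructor(private windowSize: number = 5) {}

  // Feed the latest per-frame prediction; returns the label once the last
  // `windowSize` predictions all agree, otherwise null.
  push(label: string): string | null {
    this.history.push(label);
    if (this.history.length > this.windowSize) this.history.shift();
    const stable =
      this.history.length === this.windowSize &&
      this.history.every((l) => l === this.history[0]);
    return stable ? this.history[0] : null;
  }
}

// Hypothetical rules-based override for "point left". MediaPipe's stock
// categories include Pointing_Up but nothing left-facing, so one option is to
// compare the index fingertip (landmark 8) against its knuckle (landmark 5)
// in normalized image coordinates. The thresholds below are illustrative.
interface Landmark { x: number; y: number; }

function isPointingLeft(landmarks: Landmark[]): boolean {
  const tip = landmarks[8];
  const knuckle = landmarks[5];
  return knuckle.x - tip.x > 0.1 && Math.abs(tip.y - knuckle.y) < 0.08;
}
```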
- Clone the repo

```bash
git clone https://github.com/groffbo/sight-to-speech.git
cd sight-to-speech
```

- Create and activate a Python virtual environment

```bash
python3 -m venv venv
source venv/bin/activate    # macOS/Linux
venv\Scripts\activate       # Windows PowerShell
```

- Install the backend dependencies

```bash
pip install --upgrade pip
pip install -r requirements.txt
```

- Start the backend

```bash
python3 app.py
```

- Install and start the frontend

```bash
npm install
npm run dev
```

or

```bash
yarn dev
```

depending on your setup.

- Go to http://localhost:3000 (or whatever port your frontend uses) and allow camera access.
| Gesture | Description / Use |
|---|---|
| Open_Palm | Start |
| Closed_Fist | Description |
| Pointing_Up | Tab |
| Pointing_Left | Backwards Tab |
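As a hedged illustration of the keypress-simulation layer, the sketch below maps these stabilized gesture categories to synthetic KeyboardEvents. The specific key bindings are guesses for illustration; the real mapping lives in the project source.

```typescript
// Hypothetical mapping from stabilized MediaPipe gesture categories to the
// keys the app listens for. Bindings here are illustrative only.
const GESTURE_TO_KEY: Record<string, { key: string; shiftKey?: boolean }> = {
  Open_Palm: { key: "Enter" },                    // Start reading
  Closed_Fist: { key: "d" },                      // Read description
  Pointing_Up: { key: "Tab" },                    // Forward tab
  Pointing_Left: { key: "Tab", shiftKey: true },  // Backwards tab
};

function emitGestureKey(gesture: string): void {
  const binding = GESTURE_TO_KEY[gesture];
  if (!binding) return;
  // Dispatch a synthetic keydown so the app's existing keyboard handlers fire.
  window.dispatchEvent(
    new KeyboardEvent("keydown", {
      key: binding.key,
      shiftKey: !!binding.shiftKey,
    })
  );
}
```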