DYUTIMAN03/SignTranslator_cam

# 🤟 Webcam Sign Language Translator

## Live Deployment

👉 Try the live application here!

A real-time American Sign Language (ASL) translator that uses your webcam to detect hand gestures and convert them into text and speech.

## Demo


## ✨ Features

- Real-time hand landmark detection (MediaPipe)
- ASL A–Z letter classification (MLP)
- Letter → Word → Sentence builder
- Confidence score bar
- Text-to-Speech output
- Session history log + export
- Reverse mode (text → ASL GIFs)
- Accessibility mode (high contrast)
- Mobile responsive layout
- Screenshot / download
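The letter → word → sentence builder can be thought of as a small state machine. The sketch below is illustrative only (the real logic lives in the frontend's `app.js`); the class and method names are assumptions, not the app's actual API:

```python
class SentenceBuilder:
    """Accumulates confirmed letters into words, and words into a sentence.

    Illustrative sketch of the letter -> word -> sentence feature;
    not the project's actual implementation.
    """

    def __init__(self):
        self.words = []
        self.letters = []

    def add_letter(self, ch):
        # Called each time a signed letter is confirmed.
        self.letters.append(ch)

    def end_word(self):
        # Bound to Spacebar in the UI.
        if self.letters:
            self.words.append("".join(self.letters))
            self.letters = []

    def end_sentence(self):
        # Bound to Enter in the UI; the returned string goes to TTS.
        self.end_word()
        sentence = " ".join(self.words)
        self.words = []
        return sentence
```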

## 🗂️ Project Structure

```
SignLanguage/
├── backend/
│   ├── main.py               # FastAPI server
│   ├── mediapipe_utils.py    # Hand landmark extraction
│   ├── inference.py          # ML prediction logic
│   └── model/
│       ├── train.py          # Training script
│       ├── label_map.json    # Class → letter map
│       └── classifier.pkl    # Trained model (after training)
├── frontend/
│   ├── index.html            # UI
│   ├── style.css             # Dark-mode design
│   ├── app.js                # Webcam + prediction logic
│   ├── tts.js                # Text-to-Speech
│   └── reverse.js            # Text → ASL GIF mode
├── notebooks/
│   └── EDA_and_Training.ipynb
├── dataset/                  # Place Kaggle dataset here
├── requirements.txt
└── README.md
```
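`label_map.json` maps classifier output indices to letters. Its exact layout is not shown in this README; the sketch below assumes a `{"0": "A", ..., "25": "Z"}` shape, which is one common convention:

```python
import json
import string

# Assumed layout of backend/model/label_map.json: class index (as a
# string key) mapped to its letter, e.g. {"0": "A", "1": "B", ..., "25": "Z"}.
label_map = {str(i): letter for i, letter in enumerate(string.ascii_uppercase)}

def class_to_letter(class_id: int, mapping: dict) -> str:
    """Translate a classifier output index into its ASL letter."""
    return mapping[str(class_id)]

# The map survives a JSON round-trip unchanged, as it would on disk.
restored = json.loads(json.dumps(label_map))
```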

## ⚡ Quick Start

### 1. Prerequisites

- Python 3.10 or 3.11 (required by MediaPipe)
- A working webcam

### 2. Install Dependencies

```shell
# Create a virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1   # Windows PowerShell
# source venv/bin/activate    # macOS / Linux

# Install packages
pip install -r requirements.txt
```

### 3. Download the Dataset (for training)

Download the ASL Alphabet dataset from Kaggle, or use the Kaggle API:

```shell
pip install kaggle
# Place your kaggle.json in ~/.kaggle/
kaggle datasets download -d grassknoted/asl-alphabet
Expand-Archive asl-alphabet.zip -DestinationPath dataset/   # Windows PowerShell
# unzip asl-alphabet.zip -d dataset/                        # macOS / Linux
```

### 4. Train the Model

```shell
python backend/model/train.py
```

This scans the dataset, extracts MediaPipe landmarks, trains an MLP classifier, and saves `backend/model/classifier.pkl`. Training takes roughly 2–5 minutes on CPU.
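`train.py` itself is not shown in this README. As a rough sketch of the assumed pipeline (scan dataset → extract 63-dimensional landmark vectors → fit an MLP), with a random stand-in for the MediaPipe extraction step:

```python
import random
from sklearn.neural_network import MLPClassifier  # listed in requirements.txt

def extract_features(image_path, class_id, rng):
    # Stand-in for the real step: running MediaPipe Hands on the image
    # and flattening the 21 landmarks x 3 coords into 63 features.
    # The class_id offset just makes this toy data separable.
    return [rng.random() + class_id for _ in range(63)]

rng = random.Random(0)
X, y = [], []
for class_id in range(3):          # e.g. classes for letters A, B, C
    for i in range(20):            # a few samples per class
        X.append(extract_features(f"sample_{i}.jpg", class_id, rng))
        y.append(class_id)

# Same hidden-layer shape the README quotes: 256 -> 128 -> 64.
clf = MLPClassifier(hidden_layer_sizes=(256, 128, 64),
                    max_iter=300, random_state=0)
clf.fit(X, y)
```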

### 5. Start the Backend

```shell
uvicorn backend.main:app --reload --port 8000
```

### 6. Open the Frontend

```shell
# In a new terminal, serve the frontend:
python -m http.server 3000 --directory frontend
```

Then open http://localhost:3000 in your browser.


## 🎮 Usage

1. Allow webcam access when prompted
2. Show your hand and sign any ASL letter (A–Z)
3. Hold the gesture for ~1 second to confirm the letter
4. Watch letters build into words in the word panel
5. Press Spacebar to complete a word
6. Press Enter to complete a sentence and hear it spoken aloud
7. Switch to Reverse Mode to type text and see ASL GIFs
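The "hold for ~1 second" confirmation in step 3 amounts to debouncing the per-frame predictions. A minimal sketch, assuming a hold-timer approach (the real debouncing lives in the frontend's `app.js`):

```python
import time

class LetterConfirmer:
    """Emits a letter only after the same prediction is held for hold_s seconds.

    Sketch of the 'hold the gesture for ~1 second' behaviour; illustrative,
    not the project's actual implementation.
    """

    def __init__(self, hold_s=1.0):
        self.hold_s = hold_s
        self.current = None
        self.since = 0.0

    def update(self, letter, now=None):
        # Call once per predicted frame; returns the letter when confirmed.
        now = time.monotonic() if now is None else now
        if letter != self.current:
            # Prediction changed: restart the hold timer.
            self.current = letter
            self.since = now
            return None
        if now - self.since >= self.hold_s:
            self.since = now  # re-arm so a held letter can repeat
            return letter
        return None
```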

## 📊 Model Performance

| Metric | Value |
| --- | --- |
| Model | scikit-learn MLP (3 hidden layers) |
| Features | 63 (21 MediaPipe hand landmarks × 3 coords) |
| Training accuracy | ~99% |
| Test accuracy | ~95–98% |
| Inference time | <5 ms per frame |
| Real-time throughput | ~10–15 FPS |

## 🔬 ML Architecture

```
Webcam Frame
    ↓
MediaPipe Hands (21 landmarks × 3 coords = 63 features)
    ↓
Normalize & flatten landmark vector
    ↓
MLP Classifier (hidden: 256 → 128 → 64, ReLU, Adam)
    ↓
Top-3 predictions + confidence scores
```
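The "normalize & flatten" step is not spelled out here. A common scheme, assumed below and not necessarily what `inference.py` does, is wrist-relative translation plus scale normalization:

```python
def normalize_landmarks(landmarks):
    """Flatten 21 (x, y, z) MediaPipe landmarks into a 63-dim feature vector.

    Assumed scheme: translate so the wrist (landmark 0) sits at the
    origin, then divide by the largest coordinate magnitude so the
    vector is invariant to hand size and position in the frame.
    """
    wx, wy, wz = landmarks[0]
    shifted = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    scale = max(abs(c) for pt in shifted for c in pt) or 1.0
    return [c / scale for pt in shifted for c in pt]
```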

## 🛠️ API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| `/health` | GET | Health check + model status |
| `/predict` | POST | Send base64 JPEG → get predictions |
| `/model/info` | GET | Model metadata |

### Example `/predict` Request

```
POST /predict
{
  "image": "<base64-encoded JPEG string>"
}
```

### Example Response

```json
{
  "predictions": [
    {"label": "A", "confidence": 0.97},
    {"label": "S", "confidence": 0.02},
    {"label": "T", "confidence": 0.01}
  ],
  "annotated_frame": "<base64-encoded annotated JPEG>",
  "hand_detected": true,
  "landmark_count": 21
}
```

## 📦 Dependencies

| Package | Purpose |
| --- | --- |
| `mediapipe` | Hand landmark detection |
| `opencv-python` | Image processing |
| `scikit-learn` | MLP classifier |
| `fastapi` + `uvicorn` | REST API server |
| `tensorflow` | CNN training (notebook) |
| `gtts` | Server-side TTS fallback |

## 🔭 Future Scope

- Full sentence-level NLP post-processing
- LSTM model for dynamic gestures (J, Z)
- Real-time video call plugin (Zoom/Meet)
- Indian Sign Language (ISL) support
- Federated learning for privacy

## 👨‍🎓 Academic Submission

Built as an academic project, demonstrating real-world accessibility impact, computer vision, and ML model deployment.
