👉 Try the live application here!
A real-time American Sign Language (ASL) translator that uses your webcam to detect hand gestures and convert them into text and speech.
| Feature | Status |
|---|---|
| Real-time hand landmark detection (MediaPipe) | ✅ |
| ASL A–Z letter classification (MLP) | ✅ |
| Letter → Word → Sentence builder | ✅ |
| Confidence score bar | ✅ |
| Text-to-Speech output | ✅ |
| Session history log + export | ✅ |
| Reverse mode (text → ASL GIFs) | ✅ |
| Accessibility mode (high contrast) | ✅ |
| Mobile responsive | ✅ |
| Screenshot / download | ✅ |
```
SignLanguage/
├── backend/
│   ├── main.py              # FastAPI server
│   ├── mediapipe_utils.py   # Hand landmark extraction
│   ├── inference.py         # ML prediction logic
│   └── model/
│       ├── train.py         # Training script
│       ├── label_map.json   # Class → letter map
│       └── classifier.pkl   # Trained model (after training)
├── frontend/
│   ├── index.html           # UI
│   ├── style.css            # Dark-mode design
│   ├── app.js               # Webcam + prediction logic
│   ├── tts.js               # Text-to-Speech
│   └── reverse.js           # Text → ASL GIF mode
├── notebooks/
│   └── EDA_and_Training.ipynb
├── dataset/                 # Place Kaggle dataset here
├── requirements.txt
└── README.md
```
- Python 3.10 or 3.11 (MediaPipe requirement)
- A working webcam
```powershell
# Create virtual environment
python -m venv venv
.\venv\Scripts\Activate.ps1

# Install packages
pip install -r requirements.txt
```

Download the ASL Alphabet dataset from Kaggle:
- URL: https://www.kaggle.com/datasets/grassknoted/asl-alphabet
- Extract to: `dataset/asl_alphabet_train/` (so you have `dataset/asl_alphabet_train/A/`, `dataset/asl_alphabet_train/B/`, etc.)
Or via the Kaggle API:

```powershell
pip install kaggle
# Place your kaggle.json in ~/.kaggle/
kaggle datasets download -d grassknoted/asl-alphabet
Expand-Archive asl-alphabet.zip -DestinationPath dataset/
```

Then train the model:

```powershell
python backend/model/train.py
```

This scans the dataset, extracts MediaPipe landmarks, trains an MLP classifier, and saves `backend/model/classifier.pkl`.
Training takes ~2–5 minutes on CPU.
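The training script itself is not reproduced here, but the core of what it does (fit an MLP on 63-value landmark vectors, then pickle it) can be sketched as below. The random arrays stand in for real MediaPipe landmark features, and the hidden-layer sizes follow the architecture described later in this README; treat this as an illustration, not the actual `train.py`.

```python
import pickle

import numpy as np
from sklearn.neural_network import MLPClassifier

# Illustrative stand-in for real extracted landmark features:
# 200 samples of 63 values (21 landmarks x 3 coords), 26 letter classes.
rng = np.random.default_rng(0)
X = rng.random((200, 63))
y = rng.integers(0, 26, size=200)

# Hidden-layer sizes mirror the architecture table (256 -> 128 -> 64).
clf = MLPClassifier(
    hidden_layer_sizes=(256, 128, 64),
    activation="relu",
    solver="adam",
    max_iter=50,
    random_state=0,
)
clf.fit(X, y)

# Persist the trained model the same way the project does.
with open("classifier.pkl", "wb") as f:
    pickle.dump(clf, f)
```

On the real dataset, `X` would come from running MediaPipe over each training image rather than from a random generator.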
```powershell
uvicorn backend.main:app --reload --port 8000
```

In a new terminal, serve the frontend:

```powershell
python -m http.server 3000 --directory frontend
```

Then open http://localhost:3000 in your browser.
- Allow webcam access when prompted
- Show your hand and sign any ASL letter (A–Z)
- Hold the gesture for ~1 second to confirm the letter
- Watch letters build into words in the word panel
- Press Spacebar to complete a word
- Press Enter to complete a sentence and hear it spoken aloud
- Switch to Reverse Mode to type text and see ASL GIFs
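The "hold for ~1 second" confirmation is essentially a debounce over consecutive predictions. The real logic lives in `app.js`; the idea, sketched in Python with an assumed `HOLD_TIME` threshold, looks roughly like this:

```python
import time

HOLD_TIME = 1.0  # seconds a letter must be held before it is accepted


class LetterConfirmer:
    """Accept a letter only after it is predicted continuously for hold_time."""

    def __init__(self, hold_time: float = HOLD_TIME):
        self.hold_time = hold_time
        self.current = None   # letter currently being held
        self.since = 0.0      # timestamp when that letter first appeared

    def update(self, letter, now=None):
        """Feed one per-frame prediction; return the letter once confirmed."""
        now = time.monotonic() if now is None else now
        if letter != self.current:
            self.current, self.since = letter, now  # new letter: restart timer
            return None
        if now - self.since >= self.hold_time:
            self.since = now        # reset so the letter is not emitted twice
            return letter           # confirmed
        return None


c = LetterConfirmer()
c.update("A", now=0.0)
print(c.update("A", now=1.1))  # prints A (held long enough)
```

Resetting `since` after a confirmation keeps a continuously held letter from being appended every frame.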
| Metric | Value |
|---|---|
| Model | scikit-learn MLP (3 hidden layers) |
| Features | 63 (21 MediaPipe hand landmarks × 3 coords) |
| Training accuracy | ~99% |
| Test accuracy | ~95–98% |
| Inference time | <5ms per frame |
| Real-time FPS | ~10–15 FPS |
```
Webcam Frame
      ↓
MediaPipe Hands (21 landmarks × 3 coords = 63 features)
      ↓
Normalize & flatten landmark vector
      ↓
MLP Classifier (hidden: 256→128→64, ReLU, Adam)
      ↓
Top-3 predictions + confidence scores
```
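The normalize-and-flatten step in the pipeline above can be sketched in pure NumPy. The exact scheme in `mediapipe_utils.py` may differ (e.g. in its choice of scale factor); this shows one common approach, wrist-relative and scale-normalized:

```python
import numpy as np


def landmarks_to_features(landmarks: np.ndarray) -> np.ndarray:
    """Turn a (21, 3) array of hand landmarks into a 63-value feature vector.

    Translation-invariant: coordinates are taken relative to the wrist
    (landmark 0). Roughly scale-invariant: divided by the largest distance
    from the wrist, so hand size and camera distance matter less.
    """
    rel = landmarks - landmarks[0]           # wrist-relative coordinates
    scale = np.linalg.norm(rel, axis=1).max()
    if scale > 0:
        rel = rel / scale
    return rel.flatten()                     # shape (63,)


# Example: a fake hand with 21 random landmark positions.
hand = np.random.default_rng(1).random((21, 3))
features = landmarks_to_features(hand)
print(features.shape)  # (63,)
```

After this transform, the wrist always maps to the origin and every coordinate lies in [-1, 1], which is what lets a simple MLP generalize across hand positions in the frame.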
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check + model status |
| `/predict` | POST | Send base64 JPEG → get predictions |
| `/model/info` | GET | Model metadata |
`POST /predict` request body:

```json
{
  "image": "<base64-encoded JPEG string>"
}
```

Response:

```json
{
  "predictions": [
    {"label": "A", "confidence": 0.97},
    {"label": "S", "confidence": 0.02},
    {"label": "T", "confidence": 0.01}
  ],
  "annotated_frame": "<base64-encoded annotated JPEG>",
  "hand_detected": true,
  "landmark_count": 21
}
```

| Package | Purpose |
|---|---|
| `mediapipe` | Hand landmark detection |
| `opencv-python` | Image processing |
| `scikit-learn` | MLP classifier |
| `fastapi` + `uvicorn` | REST API server |
| `tensorflow` | CNN training (notebook) |
| `gtts` | Server-side TTS fallback |
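A client calling `POST /predict` only needs to base64-encode a JPEG frame and send it as JSON. A stdlib-only sketch (the URL and port assume the `uvicorn` command shown earlier; `build_payload` and `predict` are illustrative helper names, not part of the project):

```python
import base64
import json
import urllib.request


def build_payload(jpeg_bytes: bytes) -> bytes:
    """Encode a JPEG frame into the JSON body expected by /predict."""
    return json.dumps(
        {"image": base64.b64encode(jpeg_bytes).decode("ascii")}
    ).encode("utf-8")


def predict(jpeg_bytes: bytes, url: str = "http://localhost:8000/predict"):
    """POST a frame to the running backend and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=build_payload(jpeg_bytes),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# With the backend running:
# predict(open("frame.jpg", "rb").read())["predictions"]
```

The browser frontend does the same thing with `canvas.toDataURL("image/jpeg")` plus `fetch`; only the encoding contract matters.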
- Full sentence-level NLP post-processing
- LSTM model for dynamic gestures (J, Z)
- Real-time video call plugin (Zoom/Meet)
- Indian Sign Language (ISL) support
- Federated learning for privacy
Built for academic credit; the project demonstrates real-world accessibility impact, computer vision, and ML model deployment.
