A real-time American Sign Language (ASL) recognition system that uses computer vision and deep learning to translate hand gestures into text. The system supports 24 ASL letters (A-Y, excluding J and Z) and includes a "NOTHING" class for non-gesture frames.
- Real-time Recognition: Live webcam-based sign language recognition with smooth prediction display
- Multi-Model Architecture: Implements both CNN (Keras/TensorFlow) and Transformer-based (PyTorch) models
- Advanced Hand Detection: Uses MediaPipe for robust hand landmark extraction
- Multiple Preprocessing Techniques: Various image enhancement methods for better accuracy
- Comprehensive Testing: Batch testing capabilities with confidence scoring
- Model Visualization: Training curves and accuracy metrics visualization
- Python 3.8+
- OpenCV - Computer vision and image processing
- MediaPipe - Hand landmark detection and tracking
- PyTorch - Deep learning framework for transformer model
- TensorFlow/Keras - CNN model implementation
- NumPy - Numerical computations
- Matplotlib - Visualization and plotting
- Pandas - Data manipulation
- Scikit-learn - Data preprocessing utilities
The CNN model processes 28x28 grayscale images with the following architecture:
- Conv2D (128 filters, 5x5 kernel) + MaxPool2D
- Conv2D (64 filters, 2x2 kernel) + MaxPool2D
- Conv2D (32 filters, 2x2 kernel) + MaxPool2D
- Flatten
- Dense (512 units) + Dropout (0.25)
- Dense (24 units, softmax) - Output layer
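The stack above can be sketched in Keras as follows. Layer sizes are taken from this README; the activations, padding, and pool sizes are assumptions, and `CNN/model.py` remains the authoritative implementation:

```python
# Sketch of the CNN described above; hyperparameters not listed in this
# README (activations, padding, pool size) are assumed, not confirmed.
from tensorflow.keras import layers, models

def build_cnn(num_classes: int = 24) -> models.Sequential:
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),                  # 28x28 grayscale input
        layers.Conv2D(128, (5, 5), activation="relu", padding="same"),
        layers.MaxPool2D((2, 2)),
        layers.Conv2D(64, (2, 2), activation="relu", padding="same"),
        layers.MaxPool2D((2, 2)),
        layers.Conv2D(32, (2, 2), activation="relu", padding="same"),
        layers.MaxPool2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(num_classes, activation="softmax"),  # one unit per letter
    ])
    return model

model = build_cnn()
```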
Advanced transformer-based architecture using hand landmarks:
- Input Projection (3D landmarks → 256D)
- Multi-Head Attention (4 heads)
- Feed-Forward Networks with Dropout
- Layer Normalization
- Global Average Pooling
- Classification Head (24 classes)
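A minimal single-block PyTorch sketch of that architecture. The 256-D projection, 4 attention heads, and 24 classes come from this README; the dropout rate, feed-forward width, and block count are assumptions, so `scripts/train_model.py` may differ:

```python
# Minimal sketch of the landmark transformer described above.
import torch
import torch.nn as nn

class LandmarkTransformer(nn.Module):
    def __init__(self, num_classes: int = 24, d_model: int = 256, num_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(3, d_model)      # (x, y, z) per landmark -> 256-D
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(              # feed-forward network with dropout
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Dropout(0.1),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 21, 3) -- the 21 MediaPipe hand landmarks
        h = self.proj(x)
        a, _ = self.attn(h, h, h)
        h = self.norm1(h + a)                  # residual + layer norm
        h = self.norm2(h + self.ffn(h))
        return self.head(h.mean(dim=1))        # global average pooling over landmarks

logits = LandmarkTransformer()(torch.randn(2, 21, 3))
```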
1. Clone the repository:

   git clone https://github.com/your-username/SignLanguageRecognition.git
   cd SignLanguageRecognition

2. Create a virtual environment:

   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate

3. Install dependencies:

   pip install -r requirements.txt

4. Install the additional PyTorch dependencies:

   pip install torch torchvision torchaudio

The system works with ASL letter datasets containing:
- Training Data: Images organized in folders (A-Y, excluding J and Z)
- Test Data: Individual test images for validation
- Format: JPG/PNG images with hand gestures
dataset/
├── train_set/
│ ├── A/
│ ├── B/
│ └── ...
└── test_set/
├── A_test.jpg
├── B_test.jpg
└── ...
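A hypothetical helper for walking this layout; the 24-letter class list simply drops J and Z from the alphabet:

```python
# Enumerate the class folders in the train_set/ layout shown above.
# (Illustrative helper; the project uses scripts/load_files.py for this.)
import string
from pathlib import Path

LETTERS = [c for c in string.ascii_uppercase if c not in ("J", "Z")]

def list_training_images(root="dataset/train_set"):
    """Map each letter to the JPG/PNG image files found in its folder."""
    root_path = Path(root)
    return {
        letter: sorted((root_path / letter).glob("*.[jp][pn]g"))
        for letter in LETTERS
        if (root_path / letter).is_dir()
    }

print(len(LETTERS))  # 24 classes
```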
Train the CNN model:

cd CNN
python model.py

Train the transformer model:

cd scripts
python train_model.py

Training configuration:

- Epochs: 100 (with early stopping)
- Batch Size: 64
- Learning Rate: 0.0005
- Optimizer: AdamW with weight decay
- Validation Split: 80/20
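The optimizer and split portion of that configuration can be sketched in PyTorch as follows. The weight-decay value and the placeholder model/dataset are assumptions, and early stopping is omitted:

```python
# Optimizer and 80/20 split from the training configuration above.
# `model` and the random dataset are placeholders for the real ones.
import torch

model = torch.nn.Linear(63, 24)  # stand-in: 21 landmarks x 3 coords -> 24 classes
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=0.0005,          # learning rate from this README
    weight_decay=1e-4,  # assumed value; README only says "with weight decay"
)

full = torch.utils.data.TensorDataset(
    torch.randn(100, 63), torch.randint(0, 24, (100,))
)
n_train = int(0.8 * len(full))  # 80/20 train/validation split
train_set, val_set = torch.utils.data.random_split(full, [n_train, len(full) - n_train])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
```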
Run the real-time recognition system:
cd scripts
python real_time_test.py

- Press 'q' to quit the application
- Webcam: Uses default camera (index 0)
- Resolution: 1280x720 for optimal performance
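The smooth prediction display can be approximated by majority-voting over recent frames. This is a hypothetical helper; `real_time_test.py` may smooth its output differently:

```python
# Majority vote over the last N per-frame predictions, so a single
# misclassified frame does not flicker the displayed letter.
from collections import Counter, deque

class PredictionSmoother:
    def __init__(self, window: int = 15):
        self.history = deque(maxlen=window)  # rolling window of recent labels

    def update(self, label: str) -> str:
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

smoother = PredictionSmoother(window=5)
for frame_label in ["A", "A", "B", "A", "A"]:
    shown = smoother.update(frame_label)
print(shown)  # "A" -- the stray "B" frame is voted out
```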
The system employs multiple preprocessing techniques:
1. Image Enhancement:
   - Bilateral filtering for noise reduction
   - CLAHE (Contrast Limited Adaptive Histogram Equalization)
   - Gaussian blur for smoothing
2. Hand Detection:
   - Multiple MediaPipe configurations
   - Different confidence thresholds
   - Various image preprocessing variations
3. Data Augmentation:
   - Random brightness/contrast adjustment
   - Small rotation angles (-10° to +10°)
   - Enhanced preprocessing pipeline
Run batch testing:

cd scripts
python test_model.py

Run model evaluation:

cd scripts
python evaluate_model.py

The system provides comprehensive evaluation metrics:
- Accuracy: Overall classification accuracy
- Confidence Scores: Prediction confidence levels
- Training Curves: Loss and accuracy visualization
- Per-Class Performance: Individual letter recognition rates
Typical performance:

- Training Accuracy: >95%
- Validation Accuracy: >90%
- Real-time Performance: 10+ FPS
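A confidence score of this kind is simply the softmax probability of the predicted class. An illustrative NumPy version (the project scripts may compute it with `torch.softmax` instead):

```python
# Turn raw model logits into (predicted class, confidence).
import numpy as np

def confidence(logits: np.ndarray):
    """Return the argmax class index and its softmax probability."""
    z = logits - logits.max()           # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    idx = int(probs.argmax())
    return idx, float(probs[idx])

idx, conf = confidence(np.array([0.1, 2.5, 0.3]))
print(idx, round(conf, 2))  # prints: 1 0.83
```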
Sign-Language-Recognition/
├── CNN/ # CNN model implementation
│ ├── model.py # Keras CNN model
│ └── cnn_data.zip # CNN training data
├── scripts/ # Main application scripts
│ ├── train_model.py # PyTorch model training
│ ├── real_time_test.py # Real-time recognition
│ ├── test_model.py # Batch testing
│ ├── evaluate_model.py # Model evaluation
│ ├── dataconvertor.py # Data preprocessing
│ └── load_files.py # Utility functions
├── dataset/ # Training and test data
│ ├── train_set/ # Training images
│ ├── test_set/ # Test images
│ └── landmarks_dataset.csv # Processed landmark data
├── models/ # Saved model files
│ ├── sign_language_model.pth # Trained PyTorch model
│ └── landmarks_dataset.npz # Processed dataset
├── images/ # Visualization outputs
│ ├── model_accuracy.png # Training accuracy curves
│ ├── model_loss.png # Training loss curves
│ └── training_curves.png # Combined training metrics
├── docs/ # Documentation
│ └── README.pdf # Detailed documentation
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
└── README.md # This file
The system recognizes 24 ASL letters:
A, B, C, D, E, F, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y
Note: Letters J and Z are excluded as they require motion for proper recognition.
- Motion Recognition: Add support for dynamic gestures (J, Z)
- Multi-Hand Support: Recognition of two-handed signs
- Sentence Recognition: Complete word and phrase recognition
- Mobile Deployment: iOS/Android app development
- Improved Accuracy: Advanced data augmentation and model architectures
- Real-time Translation: Text-to-speech integration
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- MediaPipe team for the excellent hand tracking solution
- PyTorch and TensorFlow communities for deep learning frameworks
- OpenCV contributors for computer vision tools
Author: Rehaan Khatri
Email: rehaankh7@gmail.com
GitHub: @rk-python5
For detailed technical documentation, please refer to the README.pdf file in the docs directory.