This project predicts 21 hand keypoints (x, y) from RGB images using a Convolutional Neural Network (CNN) trained on the FreiHAND_pub_v2 dataset. It lays the foundation for real-time gesture-based control systems (an upcoming project, HOLOCONTROL).
- Source: FreiHAND dataset
- Type: RGB images with annotated 2D keypoints
- Usage: the public subset of ~32,000 images with corresponding joint labels
Images:
- Total: 32,560
- Size: Resized to 128x128
- Format: .jpg
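A minimal loading sketch for the image side of the pipeline. The 128x128 resize comes from the README; the pixel scaling to [0, 1] and the CHW channel order are assumptions made so the arrays match a standard PyTorch input, and `load_image` is a hypothetical helper name.

```python
from PIL import Image
import numpy as np

def load_image(path, size=128):
    # Resize the RGB frame to the 128x128 network input (per the README).
    # Pixel scaling to [0, 1] and HWC -> CHW transposition are assumed
    # conventions for feeding a PyTorch model, not stated in the README.
    img = Image.open(path).convert("RGB").resize((size, size))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return arr.transpose(2, 0, 1)  # (H, W, C) -> (C, H, W)
```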
Keypoints:
- 21 keypoints per hand, flattened to 42 values (x1, y1, x2, y2, ..., x21, y21)
- Normalized to the range [0, 1] using image width and height
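The keypoint preprocessing above can be sketched as a small helper: divide x by image width and y by image height, then flatten the 21 (x, y) pairs into 42 values. `normalize_keypoints` is a hypothetical name for illustration.

```python
import numpy as np

def normalize_keypoints(kps_xy, img_w, img_h):
    """Scale pixel-space (x, y) keypoints to [0, 1] and flatten.

    kps_xy: array-like of shape (21, 2) holding pixel coordinates.
    Returns a flat float32 vector of 42 values (x1, y1, ..., x21, y21).
    """
    kps = np.asarray(kps_xy, dtype=np.float32).copy()
    kps[:, 0] /= img_w  # x coordinates scaled by image width
    kps[:, 1] /= img_h  # y coordinates scaled by image height
    return kps.reshape(-1)
```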
Built using PyTorch:
Input: (3, 128, 128)
→ Conv2D(3 → 32) + ReLU + MaxPool2d
→ Conv2D(32 → 64) + ReLU + MaxPool2d
→ Conv2D(64 → 128) + ReLU + MaxPool2d
→ Conv2D(128 → 256) + ReLU + MaxPool2d
→ Flatten
→ Linear(256*8*8 → 512) + ReLU
→ Linear(512 → 42)
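A PyTorch sketch of the architecture above. Channel widths, the pooling schedule, and the 256*8*8 → 512 → 42 head come from the README; the 3x3 kernels with padding 1 are an assumption chosen so that only the four MaxPool2d layers shrink the 128x128 input, giving the 8x8 feature map the flatten size implies.

```python
import torch
import torch.nn as nn

class HandKeypointNet(nn.Module):
    # Four conv blocks halve 128x128 down to 8x8; two linear layers then
    # regress the 42 flattened keypoint values. Kernel size 3 / padding 1
    # is assumed (the README does not specify it).
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 8 * 8, 512), nn.ReLU(),
            nn.Linear(512, 42),
        )

    def forward(self, x):
        return self.head(self.features(x))
```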
- Output: 42 values representing (x, y) coordinates of 21 keypoints
- Loss Function: Mean Squared Error (MSE)
- Optimizer: Adam (learning rate = 1e-4)
- Batch Size: 32
- Early Stopping: Patience of 5 epochs
- Best Model Saved as: best_hand_model.pth
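The training setup above can be sketched as a loop, assuming a standard train/validation split (the README does not describe one). MSE loss, Adam at lr 1e-4, patience-5 early stopping, and the checkpoint filename mirror the README; `train` and its signature are hypothetical.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=100,
          lr=1e-4, patience=5, ckpt="best_hand_model.pth"):
    # MSE loss, Adam at 1e-4, and patience-5 early stopping per the README.
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for imgs, kps in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(imgs), kps)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(i), k).item()
                           for i, k in val_loader) / len(val_loader)
        if val_loss < best_val:
            best_val, stale = val_loss, 0
            torch.save(model.state_dict(), ckpt)  # keep the best checkpoint
        else:
            stale += 1
            if stale >= patience:  # early stopping
                break
    return best_val
```

In practice the loaders would be `torch.utils.data.DataLoader` instances with `batch_size=32` as listed above.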