A sophisticated webcam-based air-writing system that recognizes handwritten digits (0-9) using MediaPipe hand tracking, Kalman filtering, and a hybrid recognition approach combining structural rules with machine learning.
- Real-time hand tracking using MediaPipe Hands
- Strict Index-finger air-writing - Write digits with only your index finger raised
- Kalman filtering - Smooth, jitter-free cursor tracking
- 5-Layer Intent Filter - Eliminates noise, accidental dots, and wild swipes
- Hybrid recognition system - Combines structural validation with CNN
- Continuous stroke rendering - No gaps or broken lines
- Strict finger detection - Pen ON only when index finger is raised (all others closed)
- Fist-to-stop - Make a fist to immediately stop writing
- Temporal gating - 50ms buffer to prevent micro-jitter
- Spatial filtering - 3px minimum movement for smooth curves
- Velocity guards - Rejects movements faster than 1500 px/s
- Directional filtering - Allows up to 150° angle changes for natural writing
- Structural validation - Loop count, aspect ratio, stroke direction analysis
- Confidence-based rejection - Only accepts predictions above 60% confidence
- Visual debugging - Auto-saves processed images for inspection
- Python 3.8+
- Webcam (30 FPS recommended)
- Windows/Linux/macOS
```bash
# Clone the repository
cd HandFree

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Windows:
.\venv\Scripts\activate
# Linux/Mac:
source venv/bin/activate

# Install dependencies
pip install opencv-python mediapipe tensorflow numpy scikit-learn joblib
```

Run the application:

```bash
python main.py
```

- Raise ONLY index finger → Pen ON (start writing)
- Thumb, middle, ring, pinky MUST be closed
- Raise any other finger → Pen OFF (stop writing)
- Make a fist → Force Pen OFF (explicit stop)
- Wait 0.8 seconds with pen OFF → Auto-recognize digit
- Press 'c' → Clear canvas
- Press 'q' → Quit
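One way to implement the strict index-only rule is sketched below. It assumes MediaPipe Hands' 21-landmark layout (0 = wrist, 4/8/12/16/20 = fingertips) and uses a heuristic that is an assumption, not taken from hand_tracker.py: a finger counts as raised when its tip is farther from the wrist than a lower joint on the same finger. The `PEN_ON`/`PEN_OFF`/`FIST` labels and function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

WRIST = 0
# MediaPipe Hands landmark indices: (fingertip, lower joint used for comparison).
# Comparing their distances from the wrist is an assumed heuristic.
FINGERS = {
    "thumb": (4, 2),    # THUMB_TIP vs THUMB_MCP
    "index": (8, 6),    # INDEX_FINGER_TIP vs INDEX_FINGER_PIP
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def finger_states(landmarks: Sequence[Point]) -> Dict[str, bool]:
    """Map each finger to True (raised) or False (closed)."""
    wrist = landmarks[WRIST]
    return {
        name: _dist(landmarks[tip], wrist) > _dist(landmarks[joint], wrist)
        for name, (tip, joint) in FINGERS.items()
    }


def pen_state(landmarks: Sequence[Point]) -> str:
    """Hypothetical labels: PEN_ON only when the index finger alone is raised."""
    raised = finger_states(landmarks)
    if not any(raised.values()):
        return "FIST"  # explicit stop
    others = ("thumb", "middle", "ring", "pinky")
    if raised["index"] and not any(raised[f] for f in others):
        return "PEN_ON"
    return "PEN_OFF"
```

With the legacy `mediapipe.solutions.hands` API, the input could be built as `[(lm.x, lm.y) for lm in hand.landmark]` for each entry of `results.multi_hand_landmarks`. The distance-from-wrist test tolerates hand rotation but is crude for the thumb, so a real tracker may need a more careful thumb rule.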
- Keep only index finger raised - Any other finger will stop writing
- Write larger digits (use most of the screen)
- Write at natural speed - System handles smoothing automatically
- The Kalman filter smooths jitter - No need to hold perfectly still
- Follow canonical digit forms (see guide below)
The system recognizes standard digit shapes:
| Digit | Key Features | Loop Count |
|---|---|---|
| 0 | Circular/oval loop | 1 |
| 1 | Tall vertical line | 0 |
| 2 | Curved top, diagonal, flat base | 0 |
| 3 | S-curve | 0 |
| 4 | Vertical + diagonal + optional bar | 0 |
| 5 | Top horizontal bar + bottom curve | 0 |
| 6 | Loop at bottom | 1 |
| 7 | Top horizontal + diagonal | 0 |
| 8 | Two stacked loops | 2 |
| 9 | Loop at top + tail | 1 |
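The loop-count column alone already narrows the candidate set. The minimal sketch below uses only the table above (the function name is hypothetical); for a single loop it yields the same `[0, 6, 9]` list that appears as `Structural candidates` in the example recognition output further down. The real recognizer also uses aspect ratio, stroke direction, and other features, so this is only the first cut.

```python
from typing import Dict, List

# Loop count per digit, copied from the table above.
LOOP_COUNT: Dict[int, int] = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 1, 7: 0, 8: 2, 9: 1}


def loop_candidates(loops: int) -> List[int]:
    """Digits whose canonical form has exactly `loops` closed loops."""
    return [digit for digit, n in LOOP_COUNT.items() if n == loops]
```

For example, `loop_candidates(1)` returns `[0, 6, 9]` and `loop_candidates(2)` returns `[8]`.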
See Canonical_Digit_Recognition_Guide_0_to_9.md for detailed specifications.
```
┌────────────────────────────────────────────────────────────────────┐
│                          Main Application                          │
│                             (main.py)                              │
└──────────────────────────────────┬─────────────────────────────────┘
                                   │
       ┌─────────────────┬─────────┴───────┬─────────────────┐
       ▼                 ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│ Hand Tracker │  │  Stabilizer  │  │Curve Renderer│  │    Hybrid    │
│ (MediaPipe)  │  │  (Kalman +   │  │  (Adaptive   │  │  Recognizer  │
│              │  │   Intent)    │  │Interpolation)│  │(Rules + CNN) │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Feature    │  │    Stroke    │  │   Gesture    │
│  Extractor   │  │   Analysis   │  │  Detection   │
└──────────────┘  └──────────────┘  └──────────────┘
```
- main.py - Main application loop, UI, and recognition pipeline
- hand_tracker.py - MediaPipe hand tracking and strict finger state detection
- stabilizer.py - Kalman filtering and 5-layer intent noise filtering
- curve_renderer.py - Adaptive interpolation for curve-aware rendering
- hybrid_recognizer.py - Hybrid recognition (structural rules + CNN)
- feature_extractor.py - Structural feature extraction (loops, aspect ratio, etc.)
- stroke_analysis.py - Stroke statistics and validation
- Hand Tracking - MediaPipe detects hand and finger states
- Gesture Recognition - Strict index-only detection (Pen ON/OFF)
- Kalman Filtering - Smooth, jitter-free position tracking
- Intent Filtering - 5-layer noise suppression:
  - Temporal: 50ms minimum stroke duration
  - Spatial: 3px minimum movement
  - Velocity: 20-1500 px/s range
  - Directional: ≤150° angle changes
  - Structural: Minimum stroke length
- Stroke Rendering - Continuous line drawing with adaptive interpolation
- Feature Extraction - Analyze loops, aspect ratio, stroke direction
- Candidate Filtering - Eliminate structurally impossible digits
- CNN Inference - Restricted to valid candidates only
- Confidence Check - Reject predictions below 60% confidence
- Rule Verification - Final validation against digit definitions
- Accept or Reject - Conservative decision making
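The five intent-filter layers can be sketched as a small per-stroke class. This is an illustration under assumptions, not the code in stabilizer.py: the thresholds come from this README, but the order of checks, evaluating the temporal and structural layers only once the pen lifts, and the minimum stroke length (`MIN_STROKE_LEN_PX = 15`, hypothetical because no value is given) are guesses.

```python
import math
from typing import List, Tuple

# Thresholds from the intent-filter list above.
MIN_DURATION_MS = 50
MIN_DIST_PX = 3
MIN_VELOCITY = 20        # px/s
MAX_VELOCITY = 1500      # px/s
MAX_ANGLE_CHANGE = 150   # degrees
# Hypothetical: the README does not state the structural minimum length.
MIN_STROKE_LEN_PX = 15


class IntentFilter:
    """Sketch of a per-stroke noise filter (not the project's implementation)."""

    def __init__(self) -> None:
        self.points: List[Tuple[float, float, float]] = []  # accepted (x, y, t_ms)

    def add(self, x: float, y: float, t_ms: float) -> bool:
        """Return True if the sample is kept as part of the stroke."""
        if not self.points:
            self.points.append((x, y, t_ms))
            return True
        px, py, pt = self.points[-1]
        dist = math.hypot(x - px, y - py)
        if dist < MIN_DIST_PX:                              # spatial layer
            return False
        dt = (t_ms - pt) / 1000.0
        if dt <= 0:
            return False
        if not MIN_VELOCITY <= dist / dt <= MAX_VELOCITY:   # velocity layer
            return False
        if len(self.points) >= 2:                           # directional layer
            qx, qy, _ = self.points[-2]
            prev = math.atan2(py - qy, px - qx)
            cur = math.atan2(y - py, x - px)
            turn = abs(math.degrees(cur - prev)) % 360
            if min(turn, 360 - turn) > MAX_ANGLE_CHANGE:
                return False
        self.points.append((x, y, t_ms))
        return True

    def stroke_is_valid(self) -> bool:
        """Temporal and structural layers, checked once the pen lifts."""
        if len(self.points) < 2:
            return False
        duration = self.points[-1][2] - self.points[0][2]
        length = sum(math.hypot(b[0] - a[0], b[1] - a[1])
                     for a, b in zip(self.points, self.points[1:]))
        return duration >= MIN_DURATION_MS and length >= MIN_STROKE_LEN_PX
```

Rejected samples are simply dropped, so a single wild swipe or a sudden reversal does not reset the stroke; the next plausible sample is compared against the last accepted point.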
- Kalman Filter - Process noise: 0.03, Measurement noise: 8.0
- Intent Filters - Multi-layer noise rejection
- Continuous Strokes - No gaps between points
- Velocity Guards - Automatic spike detection and rejection
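A constant-velocity Kalman filter is a common way to get this kind of cursor smoothing. The sketch below uses plain NumPy with the process and measurement noise values listed above; the state layout `(x, y, vx, vy)`, the time step of one frame, and the class name are assumptions. OpenCV's `cv2.KalmanFilter` exposes the same quantities as `processNoiseCov` and `measurementNoiseCov` for anyone who prefers not to write the matrix algebra by hand.

```python
import numpy as np


class CursorKalman:
    """Constant-velocity Kalman filter over (x, y, vx, vy) (assumed model)."""

    def __init__(self, process_noise: float = 0.03,
                 measurement_noise: float = 8.0, dt: float = 1.0) -> None:
        # State transition: position advances by velocity * dt.
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Only the position is measured.
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_noise
        self.R = np.eye(2) * measurement_noise
        self.P = np.eye(4)
        self.x = None  # state vector, initialised from the first measurement

    def update(self, mx: float, my: float):
        """Feed one fingertip measurement and return the smoothed position."""
        z = np.array([mx, my], dtype=float)
        if self.x is None:
            self.x = np.array([mx, my, 0.0, 0.0])
            return float(mx), float(my)
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new measurement.
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return float(self.x[0]), float(self.x[1])
```

Because the measurement noise (8.0) is much larger than the process noise (0.03), each new sample moves the estimate only part of the way: starting at (100, 100), a sudden 10 px jump in x moves the output by roughly 2 px on the first update.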
```python
# Recognition
CANVAS_SIZE = 28             # 28x28 to match sklearn model
STROKE_THICKNESS = 6         # Stroke width in pixels (increased for visibility)
CONFIDENCE_THRESHOLD = 0.60  # Minimum confidence to accept

# Kalman Filter
PROCESS_NOISE = 0.03         # Higher = more responsive, less smoothing
MEASUREMENT_NOISE = 8.0      # Higher = stronger jitter smoothing, more lag

# Intent Filter
MIN_DURATION_MS = 50         # Temporal gate (reduced from 100ms)
MIN_DIST_PX = 3              # Spatial threshold (reduced from 5px)
MAX_VELOCITY = 1500          # px/s (increased from 1000)
MAX_ANGLE_CHANGE = 150       # degrees (increased from 110)

# Hand tracking (MediaPipe)
DETECTION_CONFIDENCE = 0.8   # Hand detection threshold
TRACKING_CONFIDENCE = 0.8    # Hand tracking threshold
```

The system auto-saves processed images:
- Location: d:\Programming\Projects\HandFree\debug_digit_XXX.png
- Size: 280x280 (scaled from 28x28 for visibility)
- Format: Binary (white on black)
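The saving code itself is not shown here; a minimal sketch of the scaling step, assuming a NumPy canvas and nearest-neighbour enlargement by a factor of 10 (the function name is hypothetical), could look like this:

```python
import numpy as np


def upscale_for_debug(canvas: np.ndarray, factor: int = 10) -> np.ndarray:
    """Nearest-neighbour upscale of a small canvas to a viewable binary image.

    Any non-zero pixel becomes white (255) on a black (0) background, so a
    28x28 canvas becomes 280x280 with factor=10.
    """
    binary = (np.asarray(canvas) > 0).astype(np.uint8) * 255
    # Repeat every row, then every column, `factor` times.
    return np.repeat(np.repeat(binary, factor, axis=0), factor, axis=1)
```

`cv2.imwrite(path, image)` could then write the result to a path following the debug_digit_XXX.png pattern above. Nearest-neighbour scaling keeps the image strictly binary, so what you inspect is exactly what the classifier saw.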
Console output while drawing:

```
(640,360)→(680,400) [d:56.6px, θ:35°, pts:57] → Curve detected
(680,400)→(685,405) [d:7.1px, θ:5°, pts:5] → Line detected
```
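The log reports, per segment, its length `d`, turn angle `θ`, and number of rendered points. The exact point-count rule is not documented here, so the following is only a sketch of curve-aware gap filling. It uses a Catmull-Rom spline (a stand-in technique, not confirmed as the one curve_renderer.py uses) with a sample count proportional to segment length; `step_px` is a hypothetical parameter, and the sketch will not reproduce the `pts` values above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def catmull_rom_segment(p0: Point, p1: Point, p2: Point, p3: Point,
                        step_px: float = 2.0) -> List[Point]:
    """Points on the Catmull-Rom curve from p1 to p2 (p0 and p3 are neighbours).

    The sample count grows with the p1-p2 distance, so long fast segments get
    more points and short ones fewer. step_px is a hypothetical spacing.
    """
    dist = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    n = max(1, math.ceil(dist / step_px))
    out: List[Point] = []
    for i in range(n + 1):
        t = i / n
        t2, t3 = t * t, t * t * t
        # Standard uniform Catmull-Rom basis, applied to x and y separately.
        x, y = (
            0.5 * (2 * b + (c - a) * t + (2 * a - 5 * b + 4 * c - d) * t2
                   + (3 * b - a - 3 * c + d) * t3)
            for a, b, c, d in zip(p0, p1, p2, p3)
        )
        out.append((x, y))
    return out
```

When chaining segments, drop the first point of every segment after the first, since it repeats the previous segment's last point; the resulting polyline can then be drawn with `cv2.polylines`.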
Console output for a recognition:

```
============================================================
Recognition #1
============================================================
Structural Features:
  Loop count: 1
  Aspect ratio: 1.85
  Vertical ratio: 0.68
  Diagonal: False
Structural candidates: [0, 6, 9]
Method: hybrid
Recognized: 0 (confidence: 0.882)
Result: ✓ 0 (conf: 0.88, candidates: [0, 6, 9])
============================================================
```
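The candidate filtering, restricted CNN inference, and confidence check from the recognition pipeline can be sketched as one function. This is an assumption-laden illustration, not hybrid_recognizer.py: `probs` stands for the classifier's per-digit probabilities, scores are compared as-is without renormalizing over the candidates, and the final rule-verification step is left out.

```python
from typing import Optional, Sequence, Tuple

CONFIDENCE_THRESHOLD = 0.60  # from the configuration values above


def hybrid_decision(
    probs: Sequence[float],
    candidates: Sequence[int],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[Optional[int], float]:
    """Pick the most probable structurally valid digit, or reject (None).

    probs[d] is the classifier's probability for digit d. Assumption: scores
    are compared as-is, without renormalizing over the candidate subset.
    """
    if not candidates:
        return None, 0.0          # nothing survived structural filtering
    best = max(candidates, key=lambda d: probs[d])
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence   # conservative rejection
    return best, confidence
```

If the classifier's overall favourite were 8 but the drawing has one loop, 8 is never considered; when the best remaining candidate scores below 60%, the function rejects instead of guessing.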
Issue: Digits not recognized
- Solution: Write larger, slower, and follow canonical forms
Issue: Curves appear broken
- Solution: Curve rendering now uses adaptive interpolation to fill gaps; restart the app to pick up the change
Issue: Too many rejections
- Solution: The confidence threshold has been lowered to 60%, which should reduce rejections
Issue: Wrong digit recognized
- Solution: Check saved images - if they look correct, model may need retraining
See recognition_diagnostic.md for detailed troubleshooting.
- Canonical_Digit_Recognition_Guide_0_to_9.md - Digit specifications
- walkthrough.md - Complete system walkthrough
- hybrid_test_guide.md - Testing the hybrid system
- canonical_updates.md - Recent rule updates
- recognition_diagnostic.md - Troubleshooting guide
This system is designed for teacher-safe operation:
- ✓ Conservative rejection over incorrect guessing
- ✓ Interpretable decisions (shows why digits were accepted/rejected)
- ✓ Follows canonical educational digit forms
- ✓ Visual feedback for learning
Structural features extracted:
- Loop count (0, 1, or 2)
- Stroke count
- Aspect ratio (height/width)
- Vertical motion ratio
- Horizontal motion ratio
- Diagonal presence
- Horizontal segment positions (top/middle/bottom)
- Loop position (top/middle/bottom)
- Curvature patterns (S-curve, C-curve)
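A few of these features can be computed directly from the recorded stroke points. The sketch below covers stroke count, aspect ratio, and vertical/horizontal motion ratios; defining the motion ratios as the share of total absolute movement along each axis is an assumption, and loop counting is omitted (it typically needs contour analysis on the rendered image, for example `cv2.findContours` with a hierarchy-returning mode such as `RETR_CCOMP`).

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def motion_features(strokes: Sequence[Sequence[Point]]) -> Dict[str, float]:
    """Simple structural features from a list of strokes (each a list of points).

    Assumes at least one point. Motion ratios are an assumed definition:
    the share of total |dx| + |dy| movement along each axis.
    """
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    width = (max(xs) - min(xs)) or 1.0    # guard against zero width, e.g. a "1"
    height = (max(ys) - min(ys)) or 1.0
    dx_total = dy_total = 0.0
    for s in strokes:
        for (x0, y0), (x1, y1) in zip(s, s[1:]):
            dx_total += abs(x1 - x0)
            dy_total += abs(y1 - y0)
    motion = (dx_total + dy_total) or 1.0
    return {
        "stroke_count": len(strokes),
        "aspect_ratio": height / width,   # height/width, as in the list above
        "vertical_ratio": dy_total / motion,
        "horizontal_ratio": dx_total / motion,
    }
```

A tall vertical line gives a vertical ratio of 1.0 and a large aspect ratio, which is the kind of signal that separates a "1" from the other digits.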
Minimum drawing thresholds:
- Minimum pixel density: 2%
- Minimum stroke area: 1% of canvas
- Minimum bounding box: 2% of canvas
- Minimum perimeter: 30% of canvas dimension
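Two of these thresholds are easy to check on the binary canvas before recognition. The sketch below implements pixel density and bounding-box size only; treating "bounding box: 2% of canvas" as an area fraction is an assumption, and the stroke-area and perimeter checks are left out because their exact definitions are not given here. The function name is hypothetical.

```python
from typing import Sequence

# Thresholds from the list above. Interpreting the bounding-box minimum
# as a fraction of canvas area is an assumption.
MIN_PIXEL_DENSITY = 0.02
MIN_BBOX_FRACTION = 0.02


def passes_minimums(canvas: Sequence[Sequence[int]]) -> bool:
    """Reject near-empty drawings before they reach the recognizer."""
    h, w = len(canvas), len(canvas[0])
    on = [(r, c) for r in range(h) for c in range(w) if canvas[r][c] > 0]
    if not on:
        return False
    density = len(on) / (h * w)
    rows = [r for r, _ in on]
    cols = [c for _, c in on]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return density >= MIN_PIXEL_DENSITY and bbox_area / (h * w) >= MIN_BBOX_FRACTION
```

On a 28x28 canvas, 2% is about 16 pixels, so a single accidental dot fails both checks while a 20-pixel vertical stroke passes.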
Known limitations:
- CNN Model: Trained on EMNIST (digits handwritten on paper), so it may not perfectly match air-writing style
- Lighting: Requires good lighting for hand tracking
- Camera: 30 FPS minimum recommended for smooth curves
- Hand Size: Works best with hand filling ~50% of frame
Possible future improvements:
- Train CNN on air-written digits specifically
- Add letter recognition (A-Z)
- Multi-digit number recognition
- Gesture-based commands (undo, redo)
- Save/load written text
- Export to text file