Production-ready platform for training custom wakeword detection models with GPU acceleration, advanced optimizations, and a modern web interface. Features an enterprise-grade Distributed Cascade Architecture for real-time deployment.
🚀 Current Version: v4.0 - Production Release
🔧 GPU Support: CUDA 11.8+ with Mixed Precision
🌐 Deployment: ONNX, TensorFlow Lite, Raspberry Pi
| 📖 Documentation | 🔧 Configuration | 🎯 Usage |
|---|---|---|
| 📘 Complete Guide | ⚙️ Presets | 🚀 Quick Start |
| User Guide & Reference | GPU/RPi Optimization | Training & Deployment |
🔍 Need help? Check our Technical Features Guide for CMVN, EMA, and FAH metrics.
- Python: 3.10+
- CUDA: 11.8+ (for GPU acceleration)
- GPU: NVIDIA GPU with 6GB+ VRAM recommended
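To verify that your environment meets these requirements before training, a quick check with PyTorch (assuming it is already installed) could look like the following sketch:

```python
# Quick environment check (assumes PyTorch is installed; see DOCUMENTATION.md for CUDA builds)
import sys
import torch

assert sys.version_info >= (3, 10), "Python 3.10+ is required"

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB, CUDA: {torch.version.cuda}")
    if vram_gb < 6:
        print("Warning: less than 6 GB VRAM; training may need smaller batch sizes.")
else:
    print("No CUDA device found; training will fall back to CPU.")
```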
- **Clone the Repository**

  ```bash
  git clone https://github.com/sarpel/wakeword-training-platform.git
  cd wakeword-training-platform
  ```

- **Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```

  Note: For PyTorch with CUDA 11.8, see DOCUMENTATION.md.

- **Launch the Application**

  ```bash
  python run.py
  ```

  The application will open at http://localhost:7860.
For a consistent environment across Windows and Linux:
- **Configure Environment**

  ```bash
  cp .env.example .env
  # Edit .env to set your QUANTIZATION_BACKEND (fbgemm for Windows, qnnpack for Linux)
  ```

  See the backend-selection sketch after this list for what this setting controls.

- **Launch via Docker Compose**

  ```bash
  docker-compose up -d
  ```

- **Access Services**
  - Dashboard: http://localhost:7860
  - Inference Server: http://localhost:8000
  - Jupyter Lab: http://localhost:8888
  - TensorBoard: http://localhost:6006
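As a rough illustration of what the `QUANTIZATION_BACKEND` setting controls, the sketch below maps it onto PyTorch's quantized engine. The variable name comes from `.env.example`, but the fallback logic shown here is an assumption, not the platform's documented behavior:

```python
# Illustrative only: how a QUANTIZATION_BACKEND value could select PyTorch's quantized engine.
import os
import platform
import torch

# Per the .env note: fbgemm on Windows (x86), qnnpack on Linux (also the engine used on ARM/Raspberry Pi)
default_backend = "fbgemm" if platform.system() == "Windows" else "qnnpack"
backend = os.getenv("QUANTIZATION_BACKEND", default_backend)

if backend in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = backend
    print(f"Quantized engine set to {backend}")
else:
    raise ValueError(f"Unsupported quantization backend: {backend}")
```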
The platform expects audio files in the following structure:
- `data/raw/positive/`: Put your wakeword audio files here (`.wav`, `.flac`, `.mp3`).
- `data/raw/negative/`: Put background noise and non-wakeword speech here.
The system will automatically create these directories on first run.
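If you prefer to lay out and sanity-check the dataset yourself, a minimal sketch using only the standard library might look like this (the paths match the structure above; the file-count check is purely illustrative):

```python
# Minimal sketch for preparing and checking the expected dataset layout.
from pathlib import Path

DATA_ROOT = Path("data/raw")
AUDIO_EXTS = {".wav", ".flac", ".mp3"}

for split in ("positive", "negative"):
    split_dir = DATA_ROOT / split
    split_dir.mkdir(parents=True, exist_ok=True)  # the platform also creates these on first run
    count = sum(1 for f in split_dir.iterdir() if f.suffix.lower() in AUDIO_EXTS)
    print(f"{split_dir}: {count} audio files")
```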
Production-Ready 3-Stage Pipeline for real-time wakeword detection:
| ⚡ Stage | 🎯 Purpose | 🧠 Model | 📊 Metrics |
|---|---|---|---|
| Sentry (Edge) | Always-On Detection | MobileNetV3 + QAT | <1% FNR, <0.1% Energy |
| Judge (Local) | False Positive Filtering | Wav2Vec 2.0 | >99% Accuracy |
| Teacher (Cloud) | Knowledge Distillation | Teacher-Student | 10x Faster Training |
🔬 Advanced Features: CMVN, EMA, Mixed Precision, FAH Metrics
📖 Architecture Deep Dive
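As a conceptual illustration of how the cascade trades compute for accuracy, the sketch below shows the always-on Sentry gating the heavier Judge. The function names, thresholds, and model interfaces are placeholders, not the platform's actual API:

```python
# Conceptual sketch of the cascade decision flow; names and thresholds are illustrative.
import numpy as np

SENTRY_THRESHOLD = 0.5   # tuned for a low false-negative rate on the edge
JUDGE_THRESHOLD = 0.9    # tuned for high precision when filtering false positives

def detect_wakeword(audio: np.ndarray, sentry_model, judge_model) -> bool:
    """Run the always-on Sentry first; only wake the heavier Judge on a candidate hit."""
    sentry_score = sentry_model(audio)   # small quantized model, always running
    if sentry_score < SENTRY_THRESHOLD:
        return False                     # most frames exit here, keeping energy use low
    judge_score = judge_model(audio)     # larger model, invoked only on candidates
    return judge_score >= JUDGE_THRESHOLD
```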
- 📉 New: Focal Loss implementation for superior hard-negative handling (see the sketch after this list)
- ⚡ New: QAT Accuracy Recovery pipeline (FP32 baseline to INT8 fine-tuning)
- 📏 New: Model Size Insight & Platform Constraints validation for Edge deployment
- ✨ New: Advanced GPU acceleration with Mixed Precision training
- 🚀 New: Comprehensive HPO (Hyperparameter Optimization) system
- 📦 New: Production-ready ONNX and TFLite export
- 🎯 New: Knowledge Distillation for 10x faster edge deployment
- 🔧 New: Raspberry Pi optimized models and configs
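For reference, the Focal Loss mentioned above is typically implemented along these lines. This is a generic sketch of the standard binary formulation; the platform's defaults for `alpha` and `gamma` may differ:

```python
# Minimal focal loss sketch (standard binary formulation), not the platform's exact code.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Down-weights easy examples so training focuses on hard negatives."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                    # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```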
MIT License - see the LICENSE file for details
🚀 Happy Training! ⭐ Star us on GitHub!