Production-ready platform for training custom wakeword detection models with GPU acceleration, advanced optimizations, and a modern web interface. Features an enterprise-grade Distributed Cascade Architecture for real-time deployment.
🚀 Current Version: v4.0 - Production Release
🔧 GPU Support: CUDA 11.8+ with Mixed Precision
🌐 Deployment: ONNX, TensorFlow Lite, Raspberry Pi
| 📖 Documentation | 🔧 Configuration | 🎯 Usage |
|---|---|---|
| 📘 Complete Guide | ⚙️ Presets | 🚀 Quick Start |
| User Guide & Reference | GPU/RPi Optimization | Training & Deployment |
🔍 Need help? Check our Technical Features Guide for CMVN, EMA, and FAH metrics.
- Python: 3.10+
- CUDA: 11.8+ (for GPU acceleration)
- GPU: NVIDIA GPU with 6GB+ VRAM recommended
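To verify that your environment meets these requirements before training, a quick check with PyTorch (assuming it is already installed) could look like the following sketch:

```python
# Quick environment check (assumes PyTorch is installed; see DOCUMENTATION.md for CUDA builds)
import sys
import torch

assert sys.version_info >= (3, 10), "Python 3.10+ is required"

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB, CUDA: {torch.version.cuda}")
    if vram_gb < 6:
        print("Warning: less than 6 GB VRAM; training may need smaller batch sizes.")
else:
    print("No CUDA device found; training will fall back to CPU.")
```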
- **Clone the Repository**

  ```bash
  git clone https://github.com/sarpel/wakeword-training-platform.git
  cd wakeword-training-platform
  ```

- **Install Dependencies**

  ```bash
  pip install -r requirements.txt
  ```

  Note: For PyTorch with CUDA 11.8, see DOCUMENTATION.md.

- **Launch the Application**

  ```bash
  python run.py
  ```

  The application will open at http://localhost:7860.
For a consistent environment across Windows and Linux:
- **Configure Environment**

  ```bash
  cp .env.example .env
  # Edit .env to set your QUANTIZATION_BACKEND (fbgemm for Windows, qnnpack for Linux)
  ```

  See the backend-selection sketch after this list for what this setting controls.

- **Launch via Docker Compose**

  ```bash
  docker-compose up -d
  ```

- **Access Services**
  - Dashboard: http://localhost:7860
  - Inference Server: http://localhost:8000
  - Jupyter Lab: http://localhost:8888
  - TensorBoard: http://localhost:6006
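As a rough illustration of what the `QUANTIZATION_BACKEND` setting controls, the sketch below maps it onto PyTorch's quantized engine. The variable name comes from `.env.example`, but the fallback logic shown here is an assumption, not the platform's documented behavior:

```python
# Illustrative only: how a QUANTIZATION_BACKEND value could select PyTorch's quantized engine.
import os
import platform
import torch

# Per the .env note: fbgemm on Windows (x86), qnnpack on Linux (also the engine used on ARM/Raspberry Pi)
default_backend = "fbgemm" if platform.system() == "Windows" else "qnnpack"
backend = os.getenv("QUANTIZATION_BACKEND", default_backend)

if backend in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = backend
    print(f"Quantized engine set to {backend}")
else:
    raise ValueError(f"Unsupported quantization backend: {backend}")
```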
The platform expects audio files in the following structure:
- `data/raw/positive/`: Put your wakeword audio files here (`.wav`, `.flac`, `.mp3`).
- `data/raw/negative/`: Put background noise and non-wakeword speech here.
The system will automatically create these directories on first run.
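If you prefer to lay out and sanity-check the dataset yourself, a minimal sketch using only the standard library might look like this (the paths match the structure above; the file-count check is purely illustrative):

```python
# Minimal sketch for preparing and checking the expected dataset layout.
from pathlib import Path

DATA_ROOT = Path("data/raw")
AUDIO_EXTS = {".wav", ".flac", ".mp3"}

for split in ("positive", "negative"):
    split_dir = DATA_ROOT / split
    split_dir.mkdir(parents=True, exist_ok=True)  # the platform also creates these on first run
    count = sum(1 for f in split_dir.iterdir() if f.suffix.lower() in AUDIO_EXTS)
    print(f"{split_dir}: {count} audio files")
```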
Production-Ready 3-Stage Pipeline for real-time wakeword detection:
| ⚡ Stage | 🎯 Purpose | 🧠 Model | 📊 Metrics |
|---|---|---|---|
| Sentry (Edge) | Always-On Detection | MobileNetV3 + QAT | <1% FNR, <0.1% Energy |
| Judge (Local) | False Positive Filtering | Wav2Vec 2.0 | >99% Accuracy |
| Teacher (Cloud) | Knowledge Distillation | Teacher-Student | 10x Faster Training |
🔬 Advanced Features: CMVN, EMA, Mixed Precision, FAH Metrics
📖 Architecture Deep Dive
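As a conceptual illustration of how the cascade trades compute for accuracy, the sketch below shows the always-on Sentry gating the heavier Judge. The function names, thresholds, and model interfaces are placeholders, not the platform's actual API:

```python
# Conceptual sketch of the cascade decision flow; names and thresholds are illustrative.
import numpy as np

SENTRY_THRESHOLD = 0.5   # tuned for a low false-negative rate on the edge
JUDGE_THRESHOLD = 0.9    # tuned for high precision when filtering false positives

def detect_wakeword(audio: np.ndarray, sentry_model, judge_model) -> bool:
    """Run the always-on Sentry first; only wake the heavier Judge on a candidate hit."""
    sentry_score = sentry_model(audio)   # small quantized model, always running
    if sentry_score < SENTRY_THRESHOLD:
        return False                     # most frames exit here, keeping energy use low
    judge_score = judge_model(audio)     # larger model, invoked only on candidates
    return judge_score >= JUDGE_THRESHOLD
```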
- 📉 New: Focal Loss implementation for superior hard-negative handling (see the sketch after this list)
- ⚡ New: QAT Accuracy Recovery pipeline (FP32 baseline to INT8 fine-tuning)
- 📏 New: Model Size Insight & Platform Constraints validation for Edge deployment
- ✨ New: Advanced GPU acceleration with Mixed Precision training
- 🚀 New: Comprehensive HPO (Hyperparameter Optimization) system
- 📦 New: Production-ready ONNX and TFLite export
- 🎯 New: Knowledge Distillation for 10x faster edge deployment
- 🔧 New: Raspberry Pi optimized models and configs
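For reference, the Focal Loss mentioned above is typically implemented along these lines. This is a generic sketch of the standard binary formulation; the platform's defaults for `alpha` and `gamma` may differ:

```python
# Minimal focal loss sketch (standard binary formulation), not the platform's exact code.
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Down-weights easy examples so training focuses on hard negatives."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                    # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```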
MIT License - see the LICENSE file for details
🚀 Happy Training! ⭐ Star us on GitHub!