A machine learning system for detecting and classifying air leaks using multi-accelerometer arrays and spectral analysis. The system uses a two-stage classification approach to first identify the optimal sensor position and then classify leak severity.
This project implements a complete ML pipeline for industrial air leak detection. It assigns each recording to one of four classes:
| Class | Description |
|---|---|
| NOLEAK | Normal operation, no leak detected |
| 1/16" | Small hole leak (1/16 inch) |
| 3/32" | Medium hole leak (3/32 inch) |
| 1/8" | Large hole leak (1/8 inch) |
The system achieves 100% accuracy on the test set using Random Forest and SVM classifiers with amplitude-based features extracted from accelerometer signals.
The system uses 3 single-axis accelerometers mounted at different distances along the pipe:
| Accelerometer | Position | WebDAQ Column |
|---|---|---|
| Accelerometer 0 | Closest to leak source | Acceleration 0 |
| Accelerometer 1 | Middle position | Acceleration 1 |
| Accelerometer 2 | Farthest from leak source | Acceleration 2 |
Note: These are 3 separate single-axis accelerometers at different physical positions, NOT a single 3-axis accelerometer. Each sensor measures acceleration in a single direction.
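As a rough illustration of the channel layout in the table above, the following sketch reads the three acceleration columns from CSV text. The column names come from the table. The `Sample` index column, the single header row, and the `read_channels` helper are assumptions made for this example; a real WebDAQ export may carry extra metadata lines, and the project's actual loader lives in src/data/data_loader.py.

```python
import csv
import io

# Column names taken from the sensor table; the exact header layout of a
# real WebDAQ export is an assumption in this sketch.
CHANNEL_COLUMNS = ("Acceleration 0", "Acceleration 1", "Acceleration 2")


def read_channels(csv_text):
    """Return one list of float samples per accelerometer, ordered 0, 1, 2."""
    reader = csv.DictReader(io.StringIO(csv_text))
    channels = [[] for _ in CHANNEL_COLUMNS]
    for row in reader:
        for samples, column in zip(channels, CHANNEL_COLUMNS):
            samples.append(float(row[column]))
    return channels


# Hypothetical two-sample recording, used only to exercise the helper.
example = (
    "Sample,Acceleration 0,Acceleration 1,Acceleration 2\n"
    "0,0.12,0.05,0.01\n"
    "1,-0.10,-0.04,0.00\n"
)
acc0, acc1, acc2 = read_channels(example)
```

Each returned list is one single-axis sensor, so downstream code can treat the three channels as independent signals rather than as the axes of one device.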
Multi-Accelerometer Array (3 sensors recording simultaneously)
│
▼
┌───────────────────────────────┐
│ Stage 1: Position Classifier │
│ Identify which accelerometer │
│ is closest to leak source │
└───────────────────────────────┘
│
▼
Position ID (0, 1, or 2)
│
▼
┌───────────────────────────────┐
│ Stage 2: Hole Size Classifier│
│ Position-specific model for │
│ leak severity classification │
└───────────────────────────────┘
│
▼
Leak Size Prediction (NOLEAK, 1/16", 3/32", 1/8")
- Signal Strength Varies with Distance: The accelerometer closest to a leak will have the strongest signal
- Position-Specific Models: Each position has unique signal characteristics that benefit from specialized classifiers
- Improved Accuracy: By first identifying the optimal sensor, the system can apply the most appropriate classification model
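The routing idea behind the two stages can be sketched in a few lines. This is not the project's trained pipeline: Stage 1 is replaced here by a simple "largest RMS wins" rule (motivated by the first point above), and `toy_size_model` with its 0.05 threshold is a hypothetical placeholder for the position-specific classifiers.

```python
import math


def rms(samples):
    """Root mean square of one channel."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def select_position(channels):
    """Stage 1 stand-in: index of the channel with the largest RMS."""
    strengths = [rms(ch) for ch in channels]
    return strengths.index(max(strengths))


def predict_leak(channels, size_models):
    """Stage 2: route the selected channel to its position-specific model."""
    position = select_position(channels)
    return position, size_models[position](channels[position])


def toy_size_model(samples):
    """Hypothetical threshold model, present only to make the sketch runnable."""
    return "NOLEAK" if rms(samples) < 0.05 else '1/8"'


# One model per sensor position; a real system would train these separately.
size_models = {0: toy_size_model, 1: toy_size_model, 2: toy_size_model}
channels = [[0.01, -0.01, 0.02], [0.4, -0.5, 0.45], [0.1, -0.1, 0.12]]
position, label = predict_leak(channels, size_models)
```

Keeping the Stage 2 models in a dictionary keyed by position makes the dispatch explicit: whichever rule or classifier implements Stage 1, only its integer output decides which hole-size model runs.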
AirLeakDetection/
├── src/ # Main source code
│ ├── data/ # Data loading and preprocessing
│ │ ├── data_loader.py # WebDAQ CSV loading
│ │ ├── fft_processor.py # FFT/Welch PSD processing
│ │ ├── preprocessor.py # Signal preprocessing
│ │ ├── feature_extractor.py # Feature extraction
│ │ └── ...
│ ├── models/ # Model implementations
│ │ ├── two_stage_classifier.py # Main two-stage classifier
│ │ ├── random_forest.py # Random Forest (best performer)
│ │ ├── svm_classifier.py # SVM classifier
│ │ ├── cnn_1d.py # 1D CNN for FFT data
│ │ ├── lstm_model.py # LSTM model
│ │ └── ensemble_model.py # Ensemble methods
│ ├── training/ # Training pipeline
│ ├── evaluation/ # Evaluation and metrics
│ ├── prediction/ # Inference pipeline
│ └── utils/ # Utilities and configuration
├── scripts/ # Executable scripts
│ ├── train_two_stage_classifier_v2.py # Train two-stage system
│ ├── train_accelerometer_classifier.py # Train Stage 1
│ ├── extract_amplitude_features.py # Feature extraction
│ └── ...
├── tests/ # Test suite
├── docs/ # Documentation
├── config.yaml # Configuration file
└── requirements.txt # Dependencies
- Python 3.8+
- See requirements.txt for dependencies
# Clone repository
git clone https://github.com/ilegault/AirLeakDetection.git
cd AirLeakDetection
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Place raw WebDAQ CSV files in data/raw/ organized by class:
data/raw/
├── NOLEAK/
│ ├── sample1.csv
│ └── ...
├── 1_16/
├── 3_32/
└── 1_8/
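One way to turn this folder layout into labelled samples is sketched below. The folder names and class indices follow the `classes` mapping in config.yaml; the `index_raw_dataset` helper is hypothetical and is not necessarily how scripts/prepare_data.py walks the directory.

```python
import tempfile
from pathlib import Path

# Folder name -> class index, following the `classes` mapping in config.yaml.
CLASS_INDEX = {"NOLEAK": 0, "1_16": 1, "3_32": 2, "1_8": 3}


def index_raw_dataset(root):
    """Return (csv_path, class_index) pairs found under `root`."""
    root = Path(root)
    pairs = []
    for folder, label in CLASS_INDEX.items():
        class_dir = root / folder
        if not class_dir.is_dir():
            continue  # tolerate classes that have not been recorded yet
        for csv_path in sorted(class_dir.glob("*.csv")):
            pairs.append((csv_path, label))
    return pairs


# Demonstration on a throwaway directory with two of the four classes.
with tempfile.TemporaryDirectory() as tmp:
    for folder in ("NOLEAK", "1_8"):
        (Path(tmp) / folder).mkdir()
        (Path(tmp) / folder / "sample1.csv").write_text("")
    labels = [label for _, label in index_raw_dataset(tmp)]
```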
# Prepare processed data
python scripts/prepare_data.py --input-dir data/raw/ --output-dir data/processed/
# Extract amplitude-based features for accelerometer classification
python scripts/extract_amplitude_features.py \
--input-dir data/processed/ \
--output-dir data/accelerometer_classifier_v2/
# Train Stage 1: Accelerometer position classifier
python scripts/train_accelerometer_classifier.py \
--data-path data/accelerometer_classifier_v2/ \
--model-type random_forest
# Train Stage 2: Two-stage classifier with hole size models
python scripts/train_two_stage_classifier_v2.py \
--accelerometer-data data/accelerometer_classifier_v2/ \
--accelerometer-classifier models/accelerometer_classifier/model_*/random_forest_accelerometer.pkl \
--hole-size-data data/processed/ \
--output-dir models/two_stage_classifier_v2/
# Evaluate the two-stage classifier
python src/evaluation/evaluate_two_stage_classifier_v2.py \
--config models/two_stage_classifier_v2/model_*/two_stage_config.json \
--hole-size-data data/processed/ \
--output-dir results/two_stage_classifier_v2/
The system extracts amplitude-based features that capture signal strength differences between accelerometers:
| Feature | Description |
|---|---|
| RMS | Root Mean Square - overall signal strength |
| Standard Deviation | Signal variability |
| Peak Amplitude | Maximum excursion |
| Signal Energy | Total power |
| Peak-to-Peak | Maximum range |
| Crest Factor | Peak / RMS ratio |
| Kurtosis | Tail behavior (spikiness) |
| Skewness | Signal asymmetry |
| FFT Statistics | Mean, max, std of FFT magnitude |
| Band Power | Power in frequency bands (50-500 Hz, 500-1500 Hz, 1500-4000 Hz) |
| Welch PSD | Power spectral density statistics |
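The time-domain rows of the table can be computed with textbook formulas, sketched below. The definitions are assumptions and may differ from scripts/extract_amplitude_features.py: energy is taken as the sum of squared samples, standard deviation uses the population form, and kurtosis is the non-excess (Pearson) value, whereas `scipy.stats.kurtosis` returns the excess value by default.

```python
import math


def amplitude_features(samples):
    """Compute basic amplitude statistics for one accelerometer channel."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n  # population variance
    std = math.sqrt(variance)
    energy = sum(x * x for x in samples)  # sum of squared samples
    rms = math.sqrt(energy / n)
    peak = max(abs(x) for x in samples)
    return {
        "rms": rms,
        "std": std,
        "peak": peak,
        "energy": energy,
        "peak_to_peak": max(samples) - min(samples),
        "crest_factor": peak / rms,
        # Third and fourth standardized moments; undefined for a constant signal.
        "skewness": sum(c ** 3 for c in centered) / n / std ** 3,
        "kurtosis": sum(c ** 4 for c in centered) / n / variance ** 2,
    }


# A unit square wave: every statistic has a simple closed-form value.
features = amplitude_features([1.0, -1.0, 1.0, -1.0])
```

A constant signal has zero variance, so the skewness and kurtosis lines would raise `ZeroDivisionError`; a production extractor would need to guard that case.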
| Model | Type | Best Use Case |
|---|---|---|
| Random Forest | Traditional ML | Best overall performance (100% accuracy) |
| SVM | Traditional ML | Excellent performance (100% accuracy) |
| CNN-1D | Deep Learning | FFT magnitude classification |
| LSTM | Deep Learning | Sequential signal data |
| Ensemble | Combined | Model combination strategies |
| Model | Test Accuracy |
|---|---|
| Random Forest | 100% |
| SVM | 100% |
| CNN-1D | 26% |
| LSTM | 26% |
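Traditional ML models significantly outperform deep learning models on this task: CNN-1D and LSTM both score 26%, which is close to the 25% chance level for four classes (assuming the classes are balanced). The FFT and amplitude features are highly discriminative, making Random Forest and SVM the better choices here.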
Configuration is managed via config.yaml:
data:
raw_data_path: "data/raw"
processed_data_path: "data/processed"
sample_rate: 17066 # WebDAQ sample rate (Hz)
duration: 10 # Recording duration (seconds)
n_channels: 3 # Number of accelerometers
preprocessing:
fft_size: 2048
window: "hanning"
freq_min: 30 # Minimum frequency (Hz)
freq_max: 2000 # Maximum frequency (Hz)
# Welch's method parameters
welch:
num_segments: 16
window_type: "hamming"
overlap_ratio: 0.5
bandpower_freq_min: 50
bandpower_freq_max: 4000
training:
batch_size: 64
epochs: 100
learning_rate: 0.001
validation_split: 0.15
test_split: 0.15
classes:
0: "NOLEAK"
1: "1_16"
2: "3_32"
3: "1_8"
The system supports multiple FFT computation methods:
| Method | Description |
|---|---|
| Welch PSD | Power Spectral Density with averaging (recommended) |
| SciPy FFT | Standard FFT with windowing |
| NumPy FFT | Basic FFT implementation |
| MATLAB Import | Load pre-computed FFT from .mat files |
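To make the recommended Welch approach concrete, here is a minimal NumPy sketch that averages Hamming-windowed periodograms and integrates the result over the three band-power ranges from the feature table. It is not the code in src/data/fft_processor.py. In particular, reading `num_segments` and `overlap_ratio` as "choose the segment length so that 16 half-overlapping segments span the recording" is an assumption, no detrending is applied, and the `welch_psd` and `band_powers` helpers are hypothetical names.

```python
import numpy as np

SAMPLE_RATE = 17066  # Hz, from config.yaml
BANDS_HZ = ((50, 500), (500, 1500), (1500, 4000))  # from the feature table


def welch_psd(signal, sample_rate, num_segments=16, overlap_ratio=0.5):
    """Average windowed periodograms over overlapping segments (one-sided PSD)."""
    signal = np.asarray(signal, dtype=float)
    # Segment length chosen so that num_segments overlapping segments fit the signal.
    seg_len = int(len(signal) / (1 + (num_segments - 1) * (1 - overlap_ratio)))
    step = int(seg_len * (1 - overlap_ratio))
    window = np.hamming(seg_len)
    scale = sample_rate * np.sum(window ** 2)
    psd = np.zeros(seg_len // 2 + 1)
    for k in range(num_segments):
        segment = signal[k * step : k * step + seg_len]
        psd += np.abs(np.fft.rfft(segment * window)) ** 2 / scale
    psd /= num_segments
    # Fold negative frequencies into the one-sided spectrum (DC and Nyquist stay single).
    if seg_len % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / sample_rate)
    return freqs, psd


def band_powers(freqs, psd, bands=BANDS_HZ):
    """Integrate the PSD over each [low, high) band in Hz."""
    df = freqs[1] - freqs[0]
    return [float(np.sum(psd[(freqs >= lo) & (freqs < hi)]) * df) for lo, hi in bands]


# Synthetic check: one second of a unit-amplitude 1 kHz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000.0 * t)
freqs, psd = welch_psd(tone, SAMPLE_RATE)
powers = band_powers(freqs, psd)
```

For the synthetic tone, nearly all of the power (about 0.5, the mean square of a unit sine) should land in the 500-1500 Hz band. For real use, `scipy.signal.welch` offers a tested implementation with explicit `nperseg` and `noverlap` arguments.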
The following directories are excluded from Git (see .gitignore):
- data/raw/ - Raw accelerometer CSV files
- data/processed/ - Processed NPZ files
- models/ - Trained model weights
- results/ - Evaluation results
- logs/ - Log files
# Run all tests
pytest tests/
# Run specific test module
pytest tests/test_evaluation.py -v
Additional documentation is available in the docs/ directory:
- PHASES.md - Development phase breakdown
- TRAINING_GUIDE.md - Detailed training instructions
- BENCHMARKING_GUIDE.md - Performance benchmarking guide
- WELCH_METHOD.md - Welch's method implementation details
- ACCELEROMETER_SETUP.md - Sensor configuration details
MIT License - see LICENSE for details.
Isaac Legault - ilegault004@gmail.com
- FFT Processing: src/data/fft_processor.py
- WebDAQ Data Format: src/data/data_loader.py
- Two-Stage Classifier: src/models/two_stage_classifier.py
- Feature Extraction: scripts/extract_amplitude_features.py