Submitted to Interspeech 2026
NanoMamba is an ultra-compact keyword spotting (KWS) architecture built on Spectral-Aware State Space Models (SA-SSM), which modulate the SSM dynamics based on per-band SNR estimates. SA-SSM adjusts the discretization step Delta and the input matrix B in response to estimated noise conditions, enabling inherent noise adaptation without a separate denoising front-end.
| Model | Params | Clean Acc (%) | 0dB Factory (%) |
|---|---|---|---|
| NanoMamba-Tiny | 4,634 | TBD | TBD |
| NanoMamba-Small | 12,032 | TBD | TBD |
| BC-ResNet-1 | 7,464 | TBD | TBD |
| BC-ResNet-3 | 43,200 | TBD | TBD |
| DS-CNN-S | 23,756 | TBD | TBD |
- Input: 1s raw audio (16kHz) -> STFT (512 FFT, 160 hop) -> 40-band log-mel
- SNR Estimator: noise floor from first K=5 frames, per-band SNR
- SA-SSM Blocks: Delta-modulation + B-gating conditioned on SNR
- Output: 12-class classifier (10 keywords + silence + unknown)
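The SNR estimator above can be sketched in a few lines. This is a minimal numpy sketch, not the repo's implementation (`src/nanomamba.py` is not shown here): it assumes log-mel features of shape `(T, n_bands)`, takes the noise floor as the mean log-energy of the first K=5 frames, and treats the log-domain difference as a dB-like per-band SNR.

```python
import numpy as np

def estimate_band_snr(log_mel, k=5):
    """Per-band SNR proxy from a log-mel spectrogram of shape (T, n_bands).

    The noise floor is the mean log-energy of the first k frames (assumed
    to be noise-only); subtracting it in the log domain yields a dB-like
    SNR estimate for every frame and band.
    """
    noise_floor = log_mel[:k].mean(axis=0)   # (n_bands,)
    return log_mel - noise_floor             # (T, n_bands)

# toy example: band 0 gains energy after the noise-only frames
feats = np.zeros((10, 40))
feats[5:, 0] = 3.0
snr = estimate_band_snr(feats)               # snr[5:, 0] reads ~3 dB above floor
```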
NanoMamba-Interspeech2026/
├── paper/ # Interspeech 2026 LaTeX paper
│ ├── interspeech2026.tex # Main paper (4p + refs)
│ ├── refs.bib # References
│ ├── Interspeech.cls # Official Interspeech 2026 style
│ └── IEEEtran.bst # Bibliography style
├── src/ # Source code
│ ├── nanomamba.py # NanoMamba SA-SSM model
│ ├── train_all_models.py # Training pipeline (9 models + noise eval)
│ └── model.py # Baseline models (BC-ResNet, DS-CNN)
├── colab/ # Google Colab notebooks
│ └── SmartEar_KWS_Colab.ipynb # GPU training notebook
└── README.md
For reviewers: a single-command script re-evaluates all shipped checkpoints against the numbers reported in the paper.
```bash
# smoke test (~1-2 min on GPU): clean + factory @ 0 dB, all models
python reproduce_taslp.py --quick

# full reproduction (~30-60 min on GPU): 5 noise types x 7 SNRs
python reproduce_taslp.py --full

# pick a specific condition / model
python reproduce_taslp.py --noise-type babble --snr -5
python reproduce_taslp.py --models NC-SSM,BC-ResNet-1,DS-CNN-S

# point at a pre-downloaded Google Speech Commands V2 copy
python reproduce_taslp.py --data-dir /path/to/gsc_v2
```

The script:
- checks the Python environment (torch, torchaudio, CUDA),
- auto-locates Google Speech Commands V2 under `./data/` or `~/data/` (otherwise prints download instructions),
- loads the pre-trained checkpoints from `checkpoints_full/`,
- evaluates each model on clean and noise-mixed test audio, and
- prints a measured-vs-paper table, exiting `0` when all comparable rows fall within `--tol` (default 2.0 percentage points).
Relevant flags: `--data-dir`, `--noise-type`, `--snr`, `--full`, `--tol`, `--quick`, `--models`, `--seed`. See `python reproduce_taslp.py --help`.
If you don't have a local GPU, use `colab/NanoMamba_Test.ipynb`, which downloads the dataset automatically on Colab.
One-click reviewer notebooks (auto-downloads Google Speech Commands V2):
- `NanoMamba_Test` -- evaluate shipped checkpoints on the clean + noise grid.
- `SmartEar_KWS_Colab` -- full training of all 9 models + noise evaluation.
- `NanoMamba_Train` -- canonical end-to-end training notebook.
Manual steps:
- Open `colab/SmartEar_KWS_Colab.ipynb` in Google Colab
- Select a GPU runtime (T4 or better)
- Run Cell 1 (setup + dataset download)
- Run Cell 2 (train all 9 models + noise evaluation)
- Run Cell 3 (download results)
Delta-modulation: Delta_t = softplus(W_delta * x_t + W_s * s_hat_t)
- High SNR -> larger Delta -> faster state dynamics
- Low SNR -> smaller Delta -> suppresses noise transients
B-gating: B_tilde = B_t * (1 - alpha + alpha * sigma(W_g * s_hat_t))
- Low SNR -> gate closes -> reduces noisy input contribution
- alpha (learnable, init 0.5) controls gating strength
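The two formulas above can be sketched as a single modulation step. This is a hedged numpy sketch under assumed shapes and weight names (`W_delta`, `W_s`, `W_g` as dense matrices, `B_t` as a vector); the actual model in `src/nanomamba.py` may organize these differently.

```python
import numpy as np

def softplus(x):
    return np.log1p(np.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sa_ssm_modulation(x_t, s_hat_t, W_delta, W_s, W_g, B_t, alpha=0.5):
    """One SNR-conditioned modulation step (toy shapes, not the repo's API).

    x_t:     (d,)  input features at time t
    s_hat_t: (m,)  per-band SNR estimate at time t
    """
    # Delta-modulation: higher estimated SNR -> larger Delta -> faster dynamics
    delta_t = softplus(x_t @ W_delta + s_hat_t @ W_s)
    # B-gating: gate lies in [1 - alpha, 1]; low SNR closes it toward 1 - alpha
    gate = (1.0 - alpha) + alpha * sigmoid(s_hat_t @ W_g)
    return delta_t, B_t * gate
```

With `alpha` initialized at 0.5, the gate can attenuate the noisy input's contribution by at most half at initialization; learning `alpha` lets the model choose how aggressively to gate.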
Google Speech Commands V2 (12-class):
- 10 keywords: yes, no, up, down, left, right, on, off, stop, go
- 2 additional: silence, unknown
- 86,843 train / 10,481 val / 11,005 test
- Optimizer: AdamW (beta1=0.9, beta2=0.999)
- LR: 3e-3 (<20K params), 1e-3 (default)
- Scheduler: Cosine annealing
- Label smoothing: 0.1
- Epochs: 30, Batch size: 64
- Augmentation: time shift (+-100ms), volume (+-20%), Gaussian noise (p=0.3)
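The LR rule and schedule above can be written down directly. A minimal sketch (the helper names `select_lr` and `cosine_lr` are ours, not the repo's): 3e-3 for sub-20K-parameter models, 1e-3 otherwise, annealed with a cosine over 30 epochs.

```python
import math

def select_lr(n_params):
    # per the recipe above: higher LR for models under 20K parameters
    return 3e-3 if n_params < 20_000 else 1e-3

def cosine_lr(base_lr, epoch, total_epochs=30, min_lr=0.0):
    # cosine annealing from base_lr at epoch 0 down to min_lr at the end
    t = min(epoch / total_epochs, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

For example, NanoMamba-Tiny (4,634 params) trains at 3e-3 while BC-ResNet-3 (43,200 params) trains at 1e-3, each decayed to zero by epoch 30.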
- Types: Factory, White (Gaussian), Babble (5-9 talkers)
- SNR levels: {-15, -10, -5, 0, 5, 10, 15} dB
- Audio-domain mixing with RMS-based scaling
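RMS-based mixing at a target SNR works by scaling the noise so the speech-to-noise RMS ratio matches the requested level. A minimal numpy sketch (the function name `mix_at_snr` is an assumption, not the repo's API):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so rms(speech)/rms(noise) matches snr_db, then mix.

    Returns (mixed, scaled_noise); the epsilon guards against silent noise.
    """
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    target_noise_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    scaled = noise * (target_noise_rms / (rms(noise) + 1e-12))
    return speech + scaled, scaled
```

At -15 dB the scaled noise carries roughly 5.6x the speech RMS, which is why the factory and babble conditions at the low end of the grid are the hardest rows in the evaluation.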
MIT License