This repository contains the code accompanying the research project “Bridging Speech Therapy and Deep Learning: Automatic Sigmatism Detection Using Mel Spectrograms.”
The project investigates how deep learning models can detect sigmatism (the misarticulation of sibilants and related fricatives such as [s], [z], and [x]) from speech recordings. Multiple acoustic representations and model architectures are evaluated, with a focus on interpretable models for speech therapy support.
The best-performing model combines Mel spectrograms with an attention mechanism, achieving high detection performance and interpretable predictions via Grad-CAM.
Sigmatism is a common articulation disorder that affects speech intelligibility and often requires long-term therapy. Traditional speech therapy relies on expert feedback during supervised sessions, making effective self-practice difficult.
This project explores whether deep learning models trained on acoustic speech features can automatically detect sigmatism and provide objective feedback for pronunciation training and therapy support.
- Detect sigmatism from recorded speech
- Compare different acoustic feature representations
- Evaluate attention mechanisms for phoneme-focused learning
- Provide interpretable predictions using visualization techniques
**`Data/`**
Contains datasets and related resources used for training and evaluation.

**`DeeplearningPaper/`**
Includes implementations and notes from relevant deep learning research papers referenced during the project.

**`graphics/`**
Contains visual assets such as plots, graphs, and images generated or used in the project.

**`old_code/`**
Archive of previous versions and deprecated scripts from earlier stages of development.
**`audiodataloader.py`**
Loads and preprocesses raw audio recordings. The script extracts individual words from full recordings and creates structured word lists with metadata, which are then used during training.

**`Dataloader_fixedlist.py`**
Loads datasets from a predefined list of samples. Used in `train_CNN.py`.

**`Dataloader_gradcam.py`**
Handles data loading tailored for Grad-CAM analysis. Used in `train_gmm.py`.

**`create_fixed_list.py`**
Generates the fixed list of data samples used during training.

**`resample_data.py`**
Resamples the audio files in a folder to match the input requirements of the Speech-to-Text model.
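For reference, resampling to a fixed rate can be approximated in pure NumPy with linear interpolation. This is only an illustrative sketch: the actual script may use a band-limited resampler, and the 16 kHz target and the name `resample_linear` are assumptions, not the STT model's documented requirement.

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample a mono signal by linear interpolation.

    A band-limited (polyphase/sinc) resampler is preferable in practice;
    linear interpolation is only a simple approximation.
    """
    n_out = int(round(len(audio) * target_sr / orig_sr))
    # Time stamps of the output samples, expressed in input-sample units
    t_out = np.arange(n_out) * (orig_sr / target_sr)
    return np.interp(t_out, np.arange(len(audio)), audio)

# Example: downsample a 1-second 44.1 kHz sine tone to 16 kHz
sr_in, sr_out = 44100, 16000
t = np.arange(sr_in) / sr_in
tone = np.sin(2 * np.pi * 440.0 * t)
resampled = resample_linear(tone, sr_in, sr_out)
print(resampled.shape)  # (16000,)
```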
**`SpeechToText.py`**
Generates and visualizes Speech-to-Text (STT) probability heatmaps. Also implements the bimodal AUC evaluation approach.

**`cpp.py`**
Implements additional evaluation metrics used to distinguish the two classes, including CPP and FID.

**`model.py`**
Defines the architectures of the deep learning models used in the project.

**`train_CNN.py`**
Trains the convolutional neural networks on the prepared datasets.

**`train_gmm.py`**
Training script used by `paperimplementation.py`.

**`paperimplementation.py`**
Contains the implementation of the Valentini et al. baseline method used for comparison with the deep learning models.

**`hyperparametertuning.py`**
Performs hyperparameter optimization using Optuna.

**`data_augmentation.py`**
Contains methods for augmenting audio data, such as adding noise or altering pitch.
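As an illustration of the noise-based augmentation, the sketch below adds white Gaussian noise at a target signal-to-noise ratio and applies a random circular time shift. The function names and parameter values are assumptions for illustration, not the script's actual API.

```python
import numpy as np

def add_noise_snr(audio: np.ndarray, snr_db: float, rng=None) -> np.ndarray:
    """Add white Gaussian noise so the result has roughly the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)
    return audio + noise

def time_shift(audio: np.ndarray, max_shift: int, rng=None) -> np.ndarray:
    """Shift the waveform circularly in time by up to max_shift samples."""
    rng = np.random.default_rng() if rng is None else rng
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(audio, shift)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220.0 * np.arange(16000) / 16000)  # 1 s test tone
noisy = add_noise_snr(clean, snr_db=20.0, rng=rng)
shifted = time_shift(clean, max_shift=800, rng=rng)
```

Augmentations like these are typically applied on the fly during training so each epoch sees slightly different inputs.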
**`gradcam.py`**
Implements the Grad-CAM algorithm to visualize which regions of the input spectrogram influence model predictions.
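The heart of Grad-CAM is a simple weighting of convolutional feature maps by their spatially pooled gradients. A framework-agnostic NumPy sketch of that core step is below; the real script additionally needs a trained model and backpropagated gradients, which a deep learning framework provides.

```python
import numpy as np

def grad_cam_map(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Core Grad-CAM weighting.

    activations: feature maps of one conv layer, shape (K, H, W)
    gradients:   d(score)/d(activations), same shape
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel weights alpha_k: global-average-pooled gradients
    alphas = gradients.mean(axis=(1, 2))                              # (K,)
    # Weighted sum of the feature maps, then ReLU
    cam = np.maximum((alphas[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize for visualization (guard against an all-zero map)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

# Toy example with random activations/gradients standing in for a real network
rng = np.random.default_rng(1)
acts = rng.standard_normal((8, 16, 16))
grads = rng.standard_normal((8, 16, 16))
heatmap = grad_cam_map(acts, grads)
```

The resulting heatmap is upsampled to the input spectrogram's size and overlaid on it to show which time-frequency regions drove the prediction.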
**`plotting.py`**
Provides functions for generating plots and visualizations used for data analysis and result interpretation.

**`config.json`**
Configuration file storing parameters and settings used across the different scripts in the project.

**`jobscript.sh`**
Shell script for submitting jobs to a computing cluster or managing batch processing tasks.
Clone the repository:

```bash
git clone https://github.com/ankilab/sigmatism.git
cd sigmatism
```

Create a Python environment:

```bash
conda create -n sigmatism python=3.10
conda activate sigmatism
```

Install dependencies:

```bash
pip install -r requirements.txt
```

The workflow consists of the following steps:
- Audio preprocessing
- Feature extraction
- Model training
- Evaluation
- Model interpretation
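The feature-extraction step rests on Mel spectrograms. A self-contained NumPy sketch of that computation is given below; the sample rate, frame size, hop, and filter count are illustrative defaults, not the project's exact settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters mapping an FFT power spectrum to n_mels bands."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):                  # rising slope
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                  # falling slope
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(audio, sr=16000, n_fft=512, hop=160, n_mels=64):
    """Log-mel spectrogram: frame, window, FFT, mel projection, log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2    # (frames, n_fft//2+1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T    # (frames, n_mels)
    return np.log(mel + 1e-10)

audio = np.sin(2 * np.pi * 5000.0 * np.arange(16000) / 16000)  # 1 s test tone
spec = mel_spectrogram(audio)
print(spec.shape)  # (97, 64)
```

In practice a library routine (e.g. an off-the-shelf mel spectrogram implementation) would be used; the sketch only makes the steps explicit.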
The best-performing model (Mel spectrogram + Gaussian attention) achieved:
- Recognition rate: ~91%
- AUC: ~0.966
Grad-CAM visualizations confirm that the model focuses on high-frequency regions associated with sibilant articulation, supporting the interpretability of the learned representations.
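The exact form of the Gaussian attention is not spelled out here; one plausible reading, sketched below, is a Gaussian window over the frequency axis that re-weights the Mel bands before classification. In the sketch `mu` and `sigma` are fixed for illustration, whereas in the model they would be learned.

```python
import numpy as np

def gaussian_attention(spec: np.ndarray, mu: float, sigma: float) -> np.ndarray:
    """Weight each frequency band of a (frames, bands) spectrogram with a
    normalized Gaussian window centred on band index mu.

    Illustrative re-reading of 'Gaussian attention'; mu and sigma are
    assumptions here, not the project's learned parameters.
    """
    bands = np.arange(spec.shape[1])
    w = np.exp(-0.5 * ((bands - mu) / sigma) ** 2)
    w = w / w.sum()        # normalized attention profile over frequency
    return spec * w        # broadcast across all time frames

spec = np.ones((10, 64))                                  # dummy spectrogram
attended = gaussian_attention(spec, mu=48.0, sigma=6.0)   # emphasize high bands
```

Centering the window on the upper Mel bands is consistent with the Grad-CAM finding that sibilant cues concentrate in high frequencies.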
pending
This project is released under the MIT License.