
Automatic Sigmatism Detection Using Deep Learning


This repository contains the code accompanying the research project “Bridging Speech Therapy and Deep Learning: Automatic Sigmatism Detection Using Mel Spectrograms.”

The project investigates how deep learning models can detect sigmatism (misarticulation of sibilant sounds such as [s], [z], and [x]) from speech recordings. Multiple acoustic representations and model architectures are evaluated, with a focus on interpretable models for speech therapy support.

The best-performing model combines Mel spectrograms with an attention mechanism, achieving high detection performance and interpretable predictions via Grad-CAM.
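For readers unfamiliar with the representation: a log-Mel spectrogram can be computed from a raw waveform roughly as below. This is a minimal NumPy/SciPy sketch for illustration only, not the project's actual extraction code; parameter values such as `n_fft=512` and `n_mels=40` are placeholders.

```python
import numpy as np
from scipy.signal import stft

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(waveform, sr=16000, n_fft=512, n_mels=40):
    """Compute a log-Mel spectrogram from a mono waveform."""
    # Power spectrogram via short-time Fourier transform
    _, _, Z = stft(waveform, fs=sr, nperseg=n_fft)
    power = np.abs(Z) ** 2                      # (n_fft//2+1, frames)

    # Triangular Mel filterbank spanning 0 Hz .. Nyquist
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)

    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)

    mel = fbank @ power
    return np.log(mel + 1e-10)                  # (n_mels, frames)
```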


Overview

Sigmatism is a common articulation disorder that affects speech intelligibility and often requires long-term therapy. Traditional speech therapy relies on expert feedback during supervised sessions, making effective self-practice difficult.

This project explores whether deep learning models trained on acoustic speech features can automatically detect sigmatism and provide objective feedback for pronunciation training and therapy support.

Key goals

  • Detect sigmatism from recorded speech
  • Compare different acoustic feature representations
  • Evaluate attention mechanisms for phoneme-focused learning
  • Provide interpretable predictions using visualization techniques
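The attention idea behind the best model can be sketched as a small PyTorch module that learns where along the time axis to focus, weighting spectrogram frames with a Gaussian window. This is an illustrative sketch under assumed shapes — the project's actual architecture is defined in model.py and may differ in every detail.

```python
import torch
import torch.nn as nn

class GaussianAttention(nn.Module):
    """Weights spectrogram time frames with a learnable Gaussian window,
    so the model can focus on the frames around the sibilant (sketch)."""

    def __init__(self, feat_dim):
        super().__init__()
        self.loc = nn.Linear(feat_dim, 2)  # predicts (centre, log-width)

    def forward(self, x):
        # x: (batch, time, feat) — e.g. Mel-spectrogram frames
        pooled = x.mean(dim=1)                       # (batch, feat)
        mu, log_sigma = self.loc(pooled).unbind(-1)  # (batch,), (batch,)
        t = torch.linspace(0, 1, x.size(1), device=x.device)
        mu = torch.sigmoid(mu).unsqueeze(1)          # centre in [0, 1]
        sigma = torch.exp(log_sigma).unsqueeze(1) + 1e-3
        w = torch.exp(-0.5 * ((t - mu) / sigma) ** 2)
        w = w / w.sum(dim=1, keepdim=True)           # normalise weights
        return (x * w.unsqueeze(-1)).sum(dim=1)      # (batch, feat)
```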

Repository Structure

Data and Resources

Data/
Contains datasets and related resources used for training and evaluation.

DeeplearningPaper/
Includes implementations and notes from relevant deep learning research papers referenced during the project.

graphics/
Contains visual assets such as plots, graphs, and images generated or used in the project.

old_code/
Archive of previous versions or deprecated scripts from earlier stages of development.

Data Processing and Loading

audiodataloader.py
Loads and preprocesses raw audio recordings. The script extracts individual words from full recordings and creates structured word lists with metadata, which are then used during training.

Dataloader_fixedlist.py
Script for loading datasets using a predefined list of samples. Used in train_CNN.py.

Dataloader_gradcam.py
Handles data loading tailored for Grad-CAM analysis. Used in train_gmm.py.

create_fixed_list.py
Generates a fixed list of data samples used during training.

resample_data.py
Resamples audio files in a folder to match the input requirements of the Speech-to-Text model.
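The resampling step itself is standard; one common approach is polyphase resampling with SciPy, as in the sketch below. The 16 kHz target is an assumption here — check resample_data.py and the STT model for the actual rate.

```python
import numpy as np
from scipy.signal import resample_poly

def resample_audio(waveform, orig_sr, target_sr=16000):
    """Resample a mono waveform, e.g. to the 16 kHz many STT models expect."""
    g = np.gcd(orig_sr, target_sr)
    # Polyphase filtering: upsample by target/g, downsample by orig/g
    return resample_poly(waveform, target_sr // g, orig_sr // g)
```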

Feature Extraction and Analysis

SpeechToText.py
Generates and visualizes Speech-to-Text (STT) probability heatmaps. Also implements the bimodal AUC evaluation approach.
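The AUC part of such an evaluation can be computed directly from the per-class scores via the Mann-Whitney statistic. Below is a dependency-free sketch, not the repository's implementation:

```python
import numpy as np

def auc_from_scores(pos_scores, neg_scores):
    """AUC as the probability that a positive sample outscores a negative
    one (Mann-Whitney U statistic), without needing sklearn."""
    pos = np.asarray(pos_scores)
    neg = np.asarray(neg_scores)
    # Count wins and half-count ties over all positive/negative pairs
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))
```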

cpp.py
Implements additional evaluation metrics used to distinguish the two classes, including cepstral peak prominence (CPP) and FID.
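As a rough illustration of CPP: the real cepstrum is taken from the log magnitude spectrum, and the prominence is the height of the cepstral peak (in the plausible pitch-period range) above a linear trend. This single-frame NumPy sketch uses hypothetical parameters and is not cpp.py's implementation.

```python
import numpy as np

def cepstral_peak_prominence(frame, sr, f0_min=60.0, f0_max=300.0):
    """Minimal CPP sketch for one windowed frame: the height of the
    cepstral peak above a linear trend fitted to the cepstrum."""
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-10
    cepstrum = np.fft.irfft(np.log(spectrum))
    q = np.arange(len(cepstrum)) / sr            # quefrency axis (seconds)
    # Search for the peak in the plausible pitch-period range
    lo, hi = int(sr / f0_max), int(sr / f0_min)
    peak_idx = lo + np.argmax(cepstrum[lo:hi])
    # Linear trend over the searched region
    a, b = np.polyfit(q[lo:hi], cepstrum[lo:hi], 1)
    return cepstrum[peak_idx] - (a * q[peak_idx] + b)
```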

Model and Training

model.py
Defines the architecture of the deep learning models used in the project.

train_CNN.py
Script for training convolutional neural networks on the prepared datasets.

train_gmm.py
Training script used by paperimplementation.py.

paperimplementation.py
Contains the implementation of the Valentini et al. baseline method used for comparison with the deep learning models.

Optimization and Augmentation

hyperparametertuning.py
Performs hyperparameter optimization using Optuna.

data_augmentation.py
Contains methods for augmenting audio data, such as adding noise or altering pitch.
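Typical versions of these two augmentations can be sketched as follows; the repository's actual transforms in data_augmentation.py may differ.

```python
import numpy as np
from scipy.signal import resample_poly

def add_noise(waveform, snr_db=20.0, rng=None):
    """Add white noise at a target signal-to-noise ratio (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    sig_power = np.mean(waveform ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    return waveform + rng.normal(0.0, np.sqrt(noise_power), len(waveform))

def shift_pitch(waveform, semitones):
    """Crude pitch shift by resampling; note this also changes duration."""
    factor = 2 ** (semitones / 12)
    return resample_poly(waveform, 1000, int(round(1000 * factor)))
```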

Visualization and Interpretation

gradcam.py
Implements the Grad-CAM algorithm to visualize which regions of the input spectrogram influence model predictions.
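The core of Grad-CAM fits in a short function: capture a layer's activations and gradients with hooks, weight each activation map by its spatially averaged gradient, and keep the positive part. This is a minimal sketch, not gradcam.py's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def grad_cam(model, target_layer, x, class_idx):
    """Minimal Grad-CAM sketch: weight the target layer's activation maps
    by the spatially averaged gradient of the chosen class score."""
    acts, grads = {}, {}

    def fwd_hook(_, __, output):
        acts["a"] = output

    def bwd_hook(_, __, grad_output):
        grads["g"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        score = model(x)[:, class_idx].sum()
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()

    # Channel weights: global-average-pooled gradients
    w = grads["g"].mean(dim=(2, 3), keepdim=True)    # (B, C, 1, 1)
    cam = F.relu((w * acts["a"]).sum(dim=1))         # (B, H, W)
    # Normalise each map to [0, 1] for overlaying on the spectrogram
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)
```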

plotting.py
Provides functions for generating plots and visualizations used for data analysis and result interpretation.

Configuration and Execution

config.json
Configuration file storing parameters and settings used across different scripts in the project.

jobscript.sh
Shell script used for submitting jobs to a computing cluster or managing batch processing tasks.

Installation

Clone the repository:

git clone https://github.com/ankilab/sigmatism.git
cd sigmatism

Create a Python environment:

conda create -n sigmatism python=3.10
conda activate sigmatism

Install dependencies:

pip install -r requirements.txt

Training Pipeline

The workflow consists of the following steps:

  1. Audio preprocessing
  2. Feature extraction
  3. Model training
  4. Evaluation
  5. Model interpretation
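The steps above map onto the repository's scripts roughly as follows. This ordering is an assumption and the scripts may require arguments not shown here — check each script (and config.json) before running.

```shell
# Hypothetical end-to-end run — verify each script's arguments before use
python audiodataloader.py       # 1. cut recordings into words + metadata
python create_fixed_list.py     # 2. freeze the training sample list
python train_CNN.py             # 3. train the CNN on Mel spectrograms
python SpeechToText.py          # 4. evaluate (STT heatmaps, bimodal AUC)
python gradcam.py               # 5. inspect predictions with Grad-CAM
```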

Example Results

The best-performing model (Mel spectrogram + Gaussian attention) achieved:

  • Recognition rate: ~91%
  • AUC: ~0.966

Grad-CAM visualizations confirm that the model focuses on high-frequency regions associated with sibilant articulation, supporting the interpretability of the learned representations.


Citation

    pending

License

This project is released under the MIT License.
