This project focuses on extracting audio features for Speech Emotion Recognition (SER) using the librosa library in Python. The dataset comprises audio files, and features such as MFCC, ZCR, spectral roll-off, spectral flux, chroma features, and pitch are extracted for model training.
The trained model is then used to predict the emotion expressed in each recording.
Speech Emotion Recognition involves analyzing the emotional tone conveyed through audio. This project provides tools for preprocessing audio files and extracting relevant features necessary for training machine learning models.
The following features are extracted for each audio file (a short extraction sketch follows this list):
- MFCC (Mel-Frequency Cepstral Coefficients): Represents the short-term power spectrum of sound.
- ZCR (Zero Crossing Rate): Measures the rate of sign changes in the waveform.
- SRF (Spectral Roll-Off): Frequency below which a specified percentage of the total spectral energy is contained.
- Flux (Spectral Flux): Measures the change in spectral content between consecutive frames.
- Chroma Features: Represents the intensity of the 12 different pitch classes.
- Pitch: The perceived frequency of the sound.
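The notebook defines its own extractor functions; the snippet below is only a minimal sketch of how these features can be computed with librosa and numpy. The parameter choices (`n_mfcc=13`, `roll_percent=0.85`) are illustrative, `onset_strength` stands in for spectral flux, and the `piptrack`-based pitch estimate is a rough approximation rather than the project's exact method.

```python
import numpy as np
import librosa

def extract_features(path):
    """Sketch: load one audio file and summarize each feature by its mean."""
    y, sr = librosa.load(path, sr=None)

    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13), axis=1)    # short-term power spectrum
    zcr = np.mean(librosa.feature.zero_crossing_rate(y))                   # rate of sign changes
    srf = np.mean(librosa.feature.spectral_rolloff(y=y, sr=sr, roll_percent=0.85))  # roll-off frequency
    flux = np.mean(librosa.onset.onset_strength(y=y, sr=sr))               # frame-to-frame spectral change (proxy)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)      # 12 pitch classes
    pitches, mags = librosa.piptrack(y=y, sr=sr)
    pitch = float(np.mean(pitches[mags > 0])) if np.any(mags > 0) else 0.0  # crude average over voiced bins

    return mfcc, zcr, srf, flux, chroma, pitch
```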
The project requires the following Python libraries:
- librosa
- numpy
- pandas
- IPython
Install the required packages using:
```bash
pip install librosa numpy pandas
```

The feature extraction process involves:
- Loading audio files.
- Extracting features using the predefined feature extractor functions (see the sketch after this list).
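Below is a hedged sketch of what this loop might look like, reusing the hypothetical `extract_features` helper from the earlier sketch. The `data/<Emotion>/<file>.wav` directory layout is an assumption; adapt the label lookup to the actual dataset.

```python
import os
import pandas as pd

rows = []
# Assumed layout: one subfolder per emotion label, e.g. data/Happy/clip01.wav
for label in os.listdir('data'):
    label_dir = os.path.join('data', label)
    for fname in os.listdir(label_dir):
        mfcc, zcr, srf, flux, chroma, pitch = extract_features(os.path.join(label_dir, fname))
        rows.append({'MFCC': mfcc.tolist(), 'ZCR': zcr, 'SRF': srf, 'Flux': flux,
                     'Chroma': chroma.tolist(), 'Pitch': pitch, 'Emotion': label})

df = pd.DataFrame(rows)
```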
Features and corresponding labels are saved to a CSV file for further use:
```python
df.to_csv('extracted_features.csv', index=False)
```

For Kaggle or Jupyter Notebook environments, use:
```python
from IPython.display import FileLink
FileLink('extracted_features.csv')
```

The workflow is documented in the CS412_final.ipynb file, detailing step-by-step feature extraction and saving.
The main output is a CSV file containing extracted features and labels. Example format:
| MFCC | ZCR | SRF | Flux | Chroma | Pitch | Emotion |
|---|---|---|---|---|---|---|
| [...] | 0.03 | 0.85 | 0.92 | [...] | 220.5 | Happy |
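When the CSV is read back for model training, the list-valued columns (MFCC, Chroma) come out as strings; the snippet below is one way to parse them, assuming they were written as Python-style lists (as in the loop sketch above).

```python
import ast
import pandas as pd

df = pd.read_csv('extracted_features.csv')

# Parse list-valued columns back into Python lists (assumes list-style string storage).
for col in ('MFCC', 'Chroma'):
    df[col] = df[col].apply(ast.literal_eval)

X = df.drop(columns=['Emotion'])  # feature columns
y = df['Emotion']                 # emotion labels
```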
- CS412_final.ipynb: The main notebook containing the code for feature extraction.
- extracted_features.csv: The generated CSV file with features and labels.
- thirdmodel.h5: The saved audio emotion recognition model.
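The `.h5` extension suggests a Keras model, so loading and prediction would look roughly like the sketch below; the input shape and label encoding are placeholders that depend on how thirdmodel.h5 was actually trained.

```python
import numpy as np
from tensorflow.keras.models import load_model

model = load_model('thirdmodel.h5')  # assumes a Keras model saved in HDF5 format

# Hypothetical single feature vector; the real shape must match the training data.
feature_vector = np.zeros((1, 40), dtype='float32')
probs = model.predict(feature_vector)
predicted_class = int(np.argmax(probs, axis=1)[0])  # index into the training-time label encoding
print(predicted_class)
```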