This project demonstrates how raw speech audio is transformed into meaningful features used in Speech and Audio AI systems.
The goal is to understand the signal processing pipeline, not just to use existing libraries.
- Raw waveform (time domain)
- Fast Fourier Transform (frequency domain)
- Spectrogram (time–frequency representation)
- Mel Spectrogram (human auditory scale)
- MFCCs (compact speech representation)
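The middle of the pipeline above (waveform to spectrogram) can be sketched in plain NumPy. The frame length, hop size, and sample rate below are assumed typical values for 16 kHz speech, not settings taken from this project:

```python
import numpy as np

# Assumed, typical parameters for 16 kHz speech (not from this project)
SR = 16000        # sample rate (Hz)
FRAME_LEN = 400   # 25 ms analysis window
HOP = 160         # 10 ms hop between windows

def spectrogram(x, frame_len=FRAME_LEN, hop=HOP):
    """Magnitude spectrogram: slice the signal into overlapping frames,
    apply a Hann window to each, and take the FFT of every frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, freq bins)

# One second of a 440 Hz tone as a stand-in for recorded speech
t = np.arange(SR) / SR
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
# Each row of spec is the spectrum of one 25 ms slice of the signal
```

Stacking the per-frame spectra side by side is exactly what the time–frequency spectrogram stage produces.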
The Fast Fourier Transform (FFT) converts a time-domain signal into its frequency components. Speech is a mixture of many frequencies: a fundamental produced by the vibrating vocal cords, shaped by the resonances of the vocal tract.
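A minimal illustration of this decomposition, using a two-tone signal as a stand-in for voiced speech (the 120 Hz and 240 Hz components and the sample rate are arbitrary choices for the example):

```python
import numpy as np

sr = 8000                      # assumed sample rate (Hz)
t = np.arange(sr) / sr         # one second of samples
# Two-tone signal standing in for voiced speech:
# a 120 Hz fundamental plus a weaker 240 Hz harmonic.
x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)

spectrum = np.abs(np.fft.rfft(x))                 # magnitude per frequency bin
freqs = np.arange(spectrum.size) * sr / x.size    # bin index -> Hz (1 Hz resolution)

# The two strongest bins fall exactly at the component frequencies
top2 = sorted(freqs[np.argsort(spectrum)[-2:]].tolist())  # [120.0, 240.0]
```

The FFT recovers the ingredients of the mixture: the two largest magnitude bins sit at 120 Hz and 240 Hz, the frequencies the signal was built from.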
Human hearing is nonlinear. We perceive low frequencies more precisely than high frequencies. The Mel scale models this perception, making features more meaningful for speech models.
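The mapping itself is a simple logarithmic formula. A sketch of the widely used HTK-style conversion (one common variant of the Mel scale, not necessarily the one this project uses):

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale: roughly linear below ~1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse mapping, used when placing mel filterbank edges back in Hz
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Equal 1 kHz steps in Hz shrink on the mel axis as frequency rises,
# mirroring our coarser perception of high frequencies.
low_gap = hz_to_mel(1000) - hz_to_mel(0)       # roughly 1000 mel
high_gap = hz_to_mel(4000) - hz_to_mel(3000)   # roughly 270 mel
```

The same 1 kHz step spans far fewer mels at high frequencies, which is exactly the perceptual compression the Mel scale is meant to capture.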
MFCCs represent the spectral envelope of speech, which contains phonetic information. They are compact, robust to noise, and widely used in speech recognition and emotion detection.
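The final compression step can be sketched as a discrete cosine transform (DCT) over log-mel energies. The mel filterbank itself is omitted here, and the energies below are random stand-ins rather than real speech features:

```python
import numpy as np

def dct2_ortho(x):
    """Orthonormal DCT-II along the last axis -- the step that turns
    log-mel energies into cepstral coefficients."""
    n = x.shape[-1]
    k = np.arange(n)
    basis = np.cos(np.pi / n * (np.arange(n)[:, None] + 0.5) * k[None, :])
    y = x @ basis
    y[..., 0] *= np.sqrt(1.0 / n)   # orthonormal scaling for the DC term
    y[..., 1:] *= np.sqrt(2.0 / n)  # and for the remaining coefficients
    return y

# Stand-in log-mel energies: 5 frames x 40 mel bands (not real audio)
rng = np.random.default_rng(0)
log_mel = np.log(rng.uniform(0.1, 1.0, size=(5, 40)))

# Keep only the first 13 coefficients: the smooth spectral envelope
mfcc = dct2_ortho(log_mel)[:, :13]
```

Truncating to the first coefficients discards fine spectral detail and keeps the envelope, which is why MFCCs are both compact and phonetically informative.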
- Automatic Speech Recognition (ASR)
- Speech Emotion Recognition
- Speaker Identification
- Healthcare and Assistive Technologies
Shivanshu Pal
MSc Data Science
Aspiring PhD Researcher — Speech & Audio AI
Email: contactshiva7@gmail.com