# Self-Supervised Speech Representation Learning

Self-supervised speech representation learning from raw waveform input with Transformer-based contextual modeling.

This project explores learning meaningful speech representations without labeled transcripts. The system operates directly on raw audio waveforms, using a convolutional encoder for feature extraction and a Transformer for contextual modeling.
The implementation focuses on:
- Raw waveform processing
- CNN-based feature encoding
- Transformer context modeling
- Masked contrastive learning
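As a rough illustration of how the four components above fit together, here is a miniature end-to-end sketch. It is not the project's implementation: the strided-framing "encoder", the zeroed mask vector, the single-layer "context network", and the InfoNCE-style loss are all toy NumPy stand-ins, with every name and dimension chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_frames(waveform, frame_len=160, hop=160, dim=32):
    """Toy feature encoder: strided framing + a fixed random linear
    projection (a real encoder would be a stack of 1-D convolutions)."""
    n_frames = 1 + (len(waveform) - frame_len) // hop
    frames = np.stack([waveform[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    W = rng.standard_normal((frame_len, dim)) / np.sqrt(frame_len)
    return frames @ W                                  # (n_frames, dim)

def mask_frames(latents, mask_prob=0.3):
    """Zero out a random subset of latent frames (a stand-in for a
    learnable mask embedding)."""
    mask = rng.random(len(latents)) < mask_prob
    masked = latents.copy()
    masked[mask] = 0.0
    return masked, mask

def info_nce(context, targets, mask, temperature=0.1):
    """Contrastive loss: at each masked position, identify the true
    latent among all latents of the same utterance (the negatives)."""
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sims = (c @ t.T) / temperature                     # cosine similarities
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    idx = np.where(mask)[0]
    return -log_probs[idx, idx].mean()                 # NLL of true latents

waveform = rng.standard_normal(16000)      # 1 s of fake 16 kHz audio
latents = encode_frames(waveform)          # CNN-style feature encoding
masked, mask = mask_frames(latents)        # masking for self-supervision

# Single random layer as a stand-in for the Transformer context network.
W_ctx = rng.standard_normal((32, 32)) / np.sqrt(32)
b_ctx = 0.1 * rng.standard_normal(32)
context = np.tanh(masked @ W_ctx + b_ctx)

loss = info_nce(context, latents, mask)
print(f"frames: {len(latents)}, masked: {mask.sum()}, loss: {loss:.3f}")
```

In training, the encoder, mask embedding, and context network would all be learned jointly by minimizing this loss, so that context vectors at masked positions become predictive of the latents they cover.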
## Project Structure

```
speech-ssl/
├── src/
├── notebooks/
├── data/
├── graphs/
├── checkpoints/
├── scripts/
├── tests/
├── requirements.txt
└── README.md
```
## Installation

```bash
git clone <repo-url>
cd speech-ssl
python -m venv venv
venv\Scripts\activate      # Windows
source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
```
## Status

- Project initialization complete
- Model development in progress
## Author

Shivanshu Pal, MSc Data Science. Focus: Speech & Audio AI.