
Shivanshu-DataNerd/speech-ssl


Speech SSL

Self-Supervised Speech Representation Learning using raw waveform input and Transformer-based contextual modeling.


Overview

This project explores learning meaningful speech representations without using labeled transcripts. The system is designed to operate directly on raw audio waveforms and leverage deep neural networks for feature extraction and contextual modeling.

The implementation focuses on:

  • Raw waveform processing
  • CNN-based feature encoding
  • Transformer context modeling
  • Masked contrastive learning
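The masked contrastive step listed above can be sketched in plain Python. This is a minimal illustration, not this repository's code: the function names `sample_span_mask` and `info_nce` are hypothetical, and the defaults (start probability 0.065, span length 10, temperature 0.1) are the values published for wav2vec 2.0, assumed here only because the project cites that model as inspiration.

```python
import math
import random


def sample_span_mask(num_frames, mask_prob=0.065, span=10, rng=None):
    """Choose each frame as a span start with probability mask_prob,
    then mask `span` consecutive frames from every start (spans may overlap)."""
    rng = rng or random.Random()
    mask = [False] * num_frames
    for start in range(num_frames):
        if rng.random() < mask_prob:
            for t in range(start, min(start + span, num_frames)):
                mask[t] = True
    return mask


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(context, positive, negatives, temperature=0.1):
    """Contrastive loss for one masked frame: negative log-softmax of the
    true target's similarity among the target plus the distractors."""
    logits = [cosine(context, positive) / temperature]
    logits += [cosine(context, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    mask = sample_span_mask(49, rng=random.Random(0))
    print(sum(mask), "of", len(mask), "frames masked")
    # Toy 2-D vectors: the context output matches the positive target exactly.
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]]))
```

The loss approaches zero when the context vector at a masked position is closer to its true target than to any distractor, and grows as a distractor becomes the better match. A real training loop would compute this over batched tensors rather than Python lists.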

Project Structure

speech-ssl/
├── src/
├── notebooks/
├── data/
├── graphs/
├── checkpoints/
├── scripts/
├── tests/
├── requirements.txt
└── README.md


Setup

git clone <repo-url>
cd speech-ssl
python -m venv venv
venv\Scripts\activate  # Windows
# macOS/Linux: run `source venv/bin/activate` instead of the line above
pip install -r requirements.txt

Status

Project initialization complete.
Model development in progress.


Author

Shivanshu Pal
MSc Data Science
Focus: Speech & Audio AI

About

This repository implements a research-oriented framework for self-supervised speech representation learning, inspired by modern architectures such as wav2vec 2.0. It learns directly from raw audio waveforms using convolutional neural networks and Transformer-based contextual modeling.
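To make the raw-waveform-to-frames step concrete, the sketch below computes how many feature frames a stack of unpadded 1-D convolutions produces. It assumes the kernel sizes and strides published for the wav2vec 2.0 feature encoder and a 16 kHz sample rate; this repository's encoder may use a different configuration, and the names `CONV_LAYERS`, `num_frames`, `total_stride`, and `receptive_field` are illustrative only.

```python
# (kernel_size, stride) per layer, as published for wav2vec 2.0.
# Assumption: the encoder in this repository may differ.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples, layers=CONV_LAYERS):
    """Output length after a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in layers:
        if length < kernel:
            return 0  # input too short to produce any frame
        length = (length - kernel) // stride + 1
    return length


def total_stride(layers=CONV_LAYERS):
    """Hop between consecutive output frames, in input samples."""
    hop = 1
    for _, stride in layers:
        hop *= stride
    return hop


def receptive_field(layers=CONV_LAYERS):
    """Number of input samples that influence one output frame."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


if __name__ == "__main__":
    print(num_frames(16000))   # 49 frames for one second at 16 kHz
    print(total_stride())      # 320 samples, i.e. 20 ms at 16 kHz
    print(receptive_field())   # 400 samples, i.e. 25 ms at 16 kHz
```

Under these assumptions, one second of audio becomes 49 frames, each summarizing 25 ms of signal with a 20 ms hop, and these frames are the sequence the Transformer would contextualize.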
