titouanp22/Mini-Project-ML-Audio-Super-Resolution

Audio Super-Resolution: Deep Learning Approach

Authors: Maxime Muhlethaler, Titouan Pottier

Overview

This project implements deep learning models for audio super-resolution, upsampling low-resolution audio signals from 4 kHz to 8 kHz.
Two approaches are explored: Audio U-Net and GAN-based generation, evaluated on audio reconstruction metrics.


Key Idea

Low-resolution signals are upsampled with three approaches:

  • Baseline: Simple linear interpolation for reference
  • U-Net: Encoder-decoder architecture with skip connections
  • GAN: Adversarial training with multi-band discriminator for perceptually realistic outputs

The models learn to reconstruct high-frequency content lost during downsampling.
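
For reference, the linear-interpolation baseline from the list above can be written in a few lines (the function name is illustrative, not the repo's):

```python
import numpy as np

def upsample_baseline(x_low, factor=2):
    """Linear-interpolation baseline: stretch a 1-D signal to factor x its length."""
    n = len(x_low)
    t_low = np.arange(n)                       # original sample positions
    t_high = np.linspace(0, n - 1, n * factor) # target positions, factor x denser
    return np.interp(t_high, t_low, x_low)
```

Any learned model should beat this baseline on the metrics below to justify its cost.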


Method

Data Pipeline

  • Paired 4 kHz (low-res) and 8 kHz (high-res) audio files
  • 80/20 train/validation split
  • Normalization to [-1, 1] range
  • ~2100 training samples, ~780 test samples
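
The normalization step can be sketched as peak scaling (an assumed scheme; the notebook may normalize differently, e.g. per-dataset statistics):

```python
import numpy as np

def normalize(x, eps=1e-8):
    """Scale a waveform into [-1, 1] by its peak absolute value."""
    return x / (np.max(np.abs(x)) + eps)  # eps guards against all-zero clips
```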

Audio U-Net

  • Encoder-decoder with skip connections
  • Configurable depth and channel count
  • Transposed convolutions for upsampling
  • LeakyReLU activations
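
A minimal two-level version of such an encoder-decoder (layer sizes and class name are illustrative, not the repo's exact AudioUNet; the input is assumed pre-upsampled, e.g. by the interpolation baseline, so input and output lengths match) could look like:

```python
import torch
import torch.nn as nn

class TinyAudioUNet(nn.Module):
    """Two-level 1-D encoder-decoder with one skip connection."""
    def __init__(self, base_channels=16):
        super().__init__()
        self.enc1 = nn.Sequential(
            nn.Conv1d(1, base_channels, 9, stride=2, padding=4),
            nn.LeakyReLU(0.2))
        self.enc2 = nn.Sequential(
            nn.Conv1d(base_channels, base_channels * 2, 9, stride=2, padding=4),
            nn.LeakyReLU(0.2))
        self.dec2 = nn.Sequential(
            nn.ConvTranspose1d(base_channels * 2, base_channels, 9, stride=2,
                               padding=4, output_padding=1),
            nn.LeakyReLU(0.2))
        # skip connection: encoder features are concatenated channel-wise below
        self.dec1 = nn.ConvTranspose1d(base_channels * 2, 1, 9, stride=2,
                                       padding=4, output_padding=1)

    def forward(self, x):
        e1 = self.enc1(x)                           # (B, C, T/2)
        e2 = self.enc2(e1)                          # (B, 2C, T/4)
        d2 = self.dec2(e2)                          # (B, C, T/2)
        return self.dec1(torch.cat([d2, e1], dim=1))  # (B, 1, T)
```

The repo's upscale_factor=2 variant instead doubles the length inside the network.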

GAN Architecture

  • Generator: U-Net-style with transposed convolutions
  • Discriminator: Multi-band analysis of frequency content
  • Loss: Binary cross-entropy (adversarial) + optional perceptual losses
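
The adversarial part of the loss can be sketched with the standard non-saturating BCE objectives (helper name and logit shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def gan_losses(disc_real, disc_fake):
    """BCE objectives: discriminator pushes real->1, fake->0; generator pushes fake->1.

    disc_real / disc_fake are raw discriminator logits on real 8 kHz audio
    and on generator output, respectively.
    """
    d_loss = (F.binary_cross_entropy_with_logits(disc_real, torch.ones_like(disc_real))
              + F.binary_cross_entropy_with_logits(disc_fake, torch.zeros_like(disc_fake)))
    g_loss = F.binary_cross_entropy_with_logits(disc_fake, torch.ones_like(disc_fake))
    return d_loss, g_loss
```

Perceptual terms, when enabled, would be added to g_loss with a weighting factor.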

Metrics

Models are evaluated using:

  • RMSD (Root Mean Square Deviation)
  • LSD (Log-Spectral Distance)
  • SNR (Signal-to-Noise Ratio)
  • STOI (Short-Time Objective Intelligibility)
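
SNR and LSD, for instance, admit short NumPy implementations (one common definition of each; the notebook's versions may differ in framing or windowing, and STOI comes from pystoi):

```python
import numpy as np

def snr_db(ref, est):
    """Signal-to-noise ratio in dB between reference and estimate."""
    noise = ref - est
    return 10 * np.log10(np.sum(ref ** 2) / (np.sum(noise ** 2) + 1e-12))

def lsd(ref, est, n_fft=256, hop=128, eps=1e-8):
    """Log-spectral distance, averaged over Hann-windowed frames."""
    def logspec(x):
        frames = [x[i:i + n_fft] for i in range(0, len(x) - n_fft + 1, hop)]
        spec = np.abs(np.fft.rfft(np.array(frames) * np.hanning(n_fft), axis=1))
        return np.log10(spec ** 2 + eps)
    A, B = logspec(ref), logspec(est)
    return np.mean(np.sqrt(np.mean((A - B) ** 2, axis=1)))
```

Higher SNR and STOI are better; lower RMSD and LSD are better.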

Experiments

Training

  • Batch size: 8
  • Optimizer: Adam (lr=1e-4)
  • Device: GPU (CUDA) when available
  • Early results show convergence within 2 epochs for small models
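
With these settings, one training epoch reduces to a loop like the following (the MSE reconstruction loss is an assumption; the notebook's train_model may combine several terms):

```python
import torch
import torch.nn as nn

def train_epoch(model, loader, device="cpu", lr=1e-4):
    """One epoch of Adam at lr=1e-4 over (low-res, high-res) batches."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    total = 0.0
    for low, high in loader:
        low, high = low.to(device), high.to(device)
        opt.zero_grad()
        loss = loss_fn(model(low), high)  # reconstruction error vs. 8 kHz target
        loss.backward()
        opt.step()
        total += loss.item()
    return total / max(len(loader), 1)   # mean loss over the epoch
```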

Evaluation

  • Visual comparison: waveforms and spectrograms
  • Quantitative metrics on test set
  • Audio playback for perceptual quality assessment

Dependencies

  • Python ≥ 3.8
  • PyTorch
  • NumPy, SciPy, Matplotlib
  • pystoi for STOI metric
  • tqdm for progress bars

Usage

Run the main notebook:

jupyter notebook superes.ipynb

Quick start

# Load data (assumes the classes defined in superes.ipynb are in scope)
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_dataset = AudioSuperResDataset(
    path_4k="4k/train",
    path_8k="8k/train",
    mode='train'
)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

# Train U-Net
model = AudioUNet(upscale_factor=2, base_channels=16).to(device)
history = train_model(model, train_loader, val_loader, num_epochs=50)

# Or train GAN
generator = Generator(upscale_factor=2).to(device)
discriminator = MultiBandDiscriminator().to(device)
history = train_gan(generator, discriminator, train_loader, val_loader, num_epochs=100)

File Structure

project/
├── superes.ipynb       # Main notebook
├── 4k/                 # Low-res audio
│   ├── train/
│   └── test/
├── 8k/                 # High-res audio (targets)
│   ├── train/
│   └── test/
└── README.md
