Authors: Maxime Muhlethaler, Titouan Pottier
This project implements deep learning models for audio super-resolution, upsampling low-resolution audio signals from a 4 kHz to an 8 kHz sampling rate.
Two approaches are explored: Audio U-Net and GAN-based generation, evaluated on audio reconstruction metrics.
The low-resolution signal is upsampled using the following approaches:
- Baseline: Simple linear interpolation for reference
- U-Net: Encoder-decoder architecture with skip connections
- GAN: Adversarial training with multi-band discriminator for perceptually realistic outputs
The models learn to reconstruct high-frequency content lost during downsampling.
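As a rough illustration, the linear-interpolation baseline amounts to a few lines of NumPy (the function name `upsample_linear` is a hypothetical stand-in, not part of the project's API):

```python
import numpy as np

def upsample_linear(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Baseline: linearly interpolate a 4 kHz signal up to 8 kHz (factor 2)."""
    n = len(x)
    old_t = np.arange(n)
    new_t = np.linspace(0, n - 1, n * factor)
    return np.interp(new_t, old_t, x)
```

Linear interpolation cannot recreate lost high-frequency content, which is exactly why it serves as the reference the learned models must beat.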
- Paired 4 kHz (low-res) and 8 kHz (high-res) audio files
- 80/20 train/validation split
- Normalization to [-1, 1] range
- ~2100 training samples, ~780 test samples
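The normalization step can be sketched as peak normalization (an assumption; the notebook may normalize differently):

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    # Peak-normalize the waveform into [-1, 1]; guard against silent clips.
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```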
- Encoder-decoder with skip connections
- Configurable depth and channel count
- Transposed convolutions for upsampling
- LeakyReLU activations
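A minimal sketch of such an encoder-decoder, assuming a depth of one and a hypothetical class name `TinyAudioUNet` (the project's `AudioUNet` has configurable depth and channel count):

```python
import torch
import torch.nn as nn

class TinyAudioUNet(nn.Module):
    """Depth-1 1-D U-Net sketch: one strided encoder level, a bottleneck,
    a decoder level fed through a skip connection, then a final transposed
    convolution that doubles the sequence length (4 kHz -> 8 kHz)."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, channels, 9, stride=2, padding=4), nn.LeakyReLU(0.2))
        self.bottleneck = nn.Sequential(
            nn.Conv1d(channels, channels, 9, padding=4), nn.LeakyReLU(0.2))
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(2 * channels, channels, 9, stride=2,
                               padding=4, output_padding=1), nn.LeakyReLU(0.2))
        self.up = nn.ConvTranspose1d(channels, 1, 9, stride=2,
                                     padding=4, output_padding=1)

    def forward(self, x):
        e = self.enc(x)                          # (B, C, T/2)
        b = self.bottleneck(e)
        d = self.dec(torch.cat([b, e], dim=1))   # skip connection -> (B, C, T)
        return torch.tanh(self.up(d))            # (B, 1, 2T), kept in [-1, 1]
```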
- Generator: U-Net-style with transposed convolutions
- Discriminator: Multi-band analysis of frequency content
- Loss: Binary cross-entropy (adversarial) + optional perceptual losses
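The adversarial BCE losses can be sketched as follows (a hypothetical helper operating on raw discriminator logits; the notebook's exact loss wiring, including the perceptual terms, may differ):

```python
import torch
import torch.nn.functional as F

def adversarial_losses(d_real, d_fake):
    """Standard BCE GAN losses from discriminator logits.
    d_real: logits on true 8 kHz audio; d_fake: logits on generator output."""
    real_labels = torch.ones_like(d_real)
    fake_labels = torch.zeros_like(d_fake)
    # Discriminator: push real toward 1 and fake toward 0.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, real_labels)
              + F.binary_cross_entropy_with_logits(d_fake, fake_labels))
    # Generator: fool the discriminator into outputting 1 on fakes.
    g_loss = F.binary_cross_entropy_with_logits(d_fake, real_labels)
    return d_loss, g_loss
```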
Models are evaluated using:
- RMSD (Root Mean Square Deviation)
- LSD (Log-Spectral Distance)
- SNR (Signal-to-Noise Ratio)
- STOI (Short-Time Objective Intelligibility)
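Two of these metrics are simple enough to sketch in NumPy (a simplification: this LSD uses non-overlapping rectangular frames, whereas typical implementations use windowed, overlapping STFTs):

```python
import numpy as np

def snr_db(ref: np.ndarray, est: np.ndarray) -> float:
    """Signal-to-noise ratio in dB of an estimate against the reference."""
    noise = ref - est
    return 10 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

def lsd(ref: np.ndarray, est: np.ndarray, n_fft: int = 512) -> float:
    """Log-spectral distance, averaged over non-overlapping frames."""
    def logspec(x):
        frames = x[: len(x) // n_fft * n_fft].reshape(-1, n_fft)
        mag = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        return np.log10(mag + 1e-10)   # epsilon avoids log of zero
    d = logspec(ref) - logspec(est)
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))
```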
- Batch size: 8
- Optimizer: Adam (lr=1e-4)
- Device: GPU (CUDA) when available
- Early results show convergence within 2 epochs for small models
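The configuration above might look like this in PyTorch (the one-layer `Conv1d` model is a stand-in for `AudioUNet`, and `train_step` is a hypothetical helper, not the notebook's training loop):

```python
import torch

# Batch size 8, Adam at lr=1e-4, CUDA when available (per the settings above).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Conv1d(1, 1, 9, padding=4).to(device)  # stand-in for AudioUNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

def train_step(lo: torch.Tensor, hi: torch.Tensor) -> float:
    """One optimization step on a (low-res, high-res) batch."""
    optimizer.zero_grad()
    loss = loss_fn(model(lo.to(device)), hi.to(device))
    loss.backward()
    optimizer.step()
    return loss.item()
```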
- Visual comparison: waveforms and spectrograms
- Quantitative metrics on test set
- Audio playback for perceptual quality assessment
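The spectrogram comparison could be produced with Matplotlib along these lines (`compare_spectrograms` is a hypothetical helper, not part of the project):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the figure can be saved to disk
import matplotlib.pyplot as plt

def compare_spectrograms(signals: dict, sr: int = 8000,
                         out_path: str = "comparison.png") -> str:
    """Plot spectrograms side by side; signals maps label -> 1-D waveform."""
    fig, axes = plt.subplots(1, len(signals),
                             figsize=(4 * len(signals), 3), squeeze=False)
    for ax, (label, x) in zip(axes[0], signals.items()):
        ax.specgram(x, NFFT=256, Fs=sr, noverlap=128)
        ax.set_title(label)
        ax.set_xlabel("Time [s]")
    axes[0][0].set_ylabel("Frequency [Hz]")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Plotting the baseline, U-Net, and GAN outputs next to the 8 kHz target makes the reconstructed high-frequency band directly visible.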
- Python ≥ 3.8
- PyTorch
- NumPy, SciPy, Matplotlib
- pystoi for the STOI metric
- tqdm for progress bars
Run the main notebook:
```bash
jupyter notebook superes.ipynb
```

```python
# Load data
train_dataset = AudioSuperResDataset(
    path_4k="4k/train",
    path_8k="8k/train",
    mode='train'
)

# Train U-Net
model = AudioUNet(upscale_factor=2, base_channels=16).to(device)
history = train_model(model, train_loader, val_loader, num_epochs=50)

# Or train GAN
generator = Generator(upscale_factor=2).to(device)
discriminator = MultiBandDiscriminator().to(device)
history = train_gan(generator, discriminator, train_loader, val_loader, num_epochs=100)
```

```text
project/
├── superes.ipynb   # Main notebook
├── 4k/             # Low-res audio
│   ├── train/
│   └── test/
├── 8k/             # High-res audio (targets)
│   ├── train/
│   └── test/
└── README.md
```