Authors: Maxime Muhlethaler, Titouan Pottier
This project implements deep learning models for audio super-resolution, upsampling low-resolution audio signals from a 4 kHz to an 8 kHz sampling rate.
Two approaches are explored: Audio U-Net and GAN-based generation, evaluated on audio reconstruction metrics.
The low-resolution signal is upsampled using the following approaches:
- Baseline: Simple linear interpolation for reference
- U-Net: Encoder-decoder architecture with skip connections
- GAN: Adversarial training with multi-band discriminator for perceptually realistic outputs
The models learn to reconstruct high-frequency content lost during downsampling.
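As a rough illustration, the linear-interpolation baseline amounts to a few lines of NumPy (the function name `upsample_linear` is a hypothetical stand-in, not part of the project's API):

```python
import numpy as np

def upsample_linear(x: np.ndarray, factor: int = 2) -> np.ndarray:
    """Baseline: linearly interpolate a 4 kHz signal up to 8 kHz (factor 2)."""
    n = len(x)
    old_t = np.arange(n)
    new_t = np.linspace(0, n - 1, n * factor)
    return np.interp(new_t, old_t, x)
```

Linear interpolation cannot recreate lost high-frequency content, which is exactly why it serves as the reference the learned models must beat.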
- Paired 4 kHz (low-res) and 8 kHz (high-res) audio files
- 80/20 train/validation split
- Normalization to [-1, 1] range
- ~2100 training samples, ~780 test samples
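The normalization step can be sketched as peak normalization (an assumption; the notebook may normalize differently):

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    # Peak-normalize the waveform into [-1, 1]; guard against silent clips.
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```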
- Encoder-decoder with skip connections
- Configurable depth and channel count
- Transposed convolutions for upsampling
- LeakyReLU activations
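A minimal sketch of such an encoder-decoder, assuming a depth of one and a hypothetical class name `TinyAudioUNet` (the project's `AudioUNet` has configurable depth and channel count):

```python
import torch
import torch.nn as nn

class TinyAudioUNet(nn.Module):
    """Depth-1 1-D U-Net sketch: one strided encoder level, a bottleneck,
    a decoder level fed through a skip connection, then a final transposed
    convolution that doubles the sequence length (4 kHz -> 8 kHz)."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv1d(1, channels, 9, stride=2, padding=4), nn.LeakyReLU(0.2))
        self.bottleneck = nn.Sequential(
            nn.Conv1d(channels, channels, 9, padding=4), nn.LeakyReLU(0.2))
        self.dec = nn.Sequential(
            nn.ConvTranspose1d(2 * channels, channels, 9, stride=2,
                               padding=4, output_padding=1), nn.LeakyReLU(0.2))
        self.up = nn.ConvTranspose1d(channels, 1, 9, stride=2,
                                     padding=4, output_padding=1)

    def forward(self, x):
        e = self.enc(x)                          # (B, C, T/2)
        b = self.bottleneck(e)
        d = self.dec(torch.cat([b, e], dim=1))   # skip connection -> (B, C, T)
        return torch.tanh(self.up(d))            # (B, 1, 2T), kept in [-1, 1]
```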
- Generator: U-Net-style with transposed convolutions
- Discriminator: Multi-band analysis of frequency content
- Loss: Binary cross-entropy (adversarial) + optional perceptual losses
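The adversarial BCE losses can be sketched as follows (a hypothetical helper operating on raw discriminator logits; the notebook's exact loss wiring, including the perceptual terms, may differ):

```python
import torch
import torch.nn.functional as F

def adversarial_losses(d_real, d_fake):
    """Standard BCE GAN losses from discriminator logits.
    d_real: logits on true 8 kHz audio; d_fake: logits on generator output."""
    real_labels = torch.ones_like(d_real)
    fake_labels = torch.zeros_like(d_fake)
    # Discriminator: push real toward 1 and fake toward 0.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, real_labels)
              + F.binary_cross_entropy_with_logits(d_fake, fake_labels))
    # Generator: fool the discriminator into outputting 1 on fakes.
    g_loss = F.binary_cross_entropy_with_logits(d_fake, real_labels)
    return d_loss, g_loss
```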
Models are evaluated using:
- RMSD (Root Mean Square Deviation)
- LSD (Log-Spectral Distance)
- SNR (Signal-to-Noise Ratio)
- STOI (Short-Time Objective Intelligibility)
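Two of these metrics are simple enough to sketch in NumPy (a simplification: this LSD uses non-overlapping rectangular frames, whereas typical implementations use windowed, overlapping STFTs):

```python
import numpy as np

def snr_db(ref: np.ndarray, est: np.ndarray) -> float:
    """Signal-to-noise ratio in dB of an estimate against the reference."""
    noise = ref - est
    return 10 * np.log10(np.sum(ref ** 2) / np.sum(noise ** 2))

def lsd(ref: np.ndarray, est: np.ndarray, n_fft: int = 512) -> float:
    """Log-spectral distance, averaged over non-overlapping frames."""
    def logspec(x):
        frames = x[: len(x) // n_fft * n_fft].reshape(-1, n_fft)
        mag = np.abs(np.fft.rfft(frames, axis=1)) ** 2
        return np.log10(mag + 1e-10)   # epsilon avoids log of zero
    d = logspec(ref) - logspec(est)
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))
```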
- Batch size: 8
- Optimizer: Adam (lr=1e-4)
- Device: GPU (CUDA) when available
- Early results show convergence within 2 epochs for small models
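The configuration above might look like this in PyTorch (the one-layer `Conv1d` model is a stand-in for `AudioUNet`, and `train_step` is a hypothetical helper, not the notebook's training loop):

```python
import torch

# Batch size 8, Adam at lr=1e-4, CUDA when available (per the settings above).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Conv1d(1, 1, 9, padding=4).to(device)  # stand-in for AudioUNet
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

def train_step(lo: torch.Tensor, hi: torch.Tensor) -> float:
    """One optimization step on a (low-res, high-res) batch."""
    optimizer.zero_grad()
    loss = loss_fn(model(lo.to(device)), hi.to(device))
    loss.backward()
    optimizer.step()
    return loss.item()
```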
- Visual comparison: waveforms and spectrograms
- Quantitative metrics on test set
- Audio playback for perceptual quality assessment
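The spectrogram comparison could be produced with Matplotlib along these lines (`compare_spectrograms` is a hypothetical helper, not part of the project):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the figure can be saved to disk
import matplotlib.pyplot as plt

def compare_spectrograms(signals: dict, sr: int = 8000,
                         out_path: str = "comparison.png") -> str:
    """Plot spectrograms side by side; signals maps label -> 1-D waveform."""
    fig, axes = plt.subplots(1, len(signals),
                             figsize=(4 * len(signals), 3), squeeze=False)
    for ax, (label, x) in zip(axes[0], signals.items()):
        ax.specgram(x, NFFT=256, Fs=sr, noverlap=128)
        ax.set_title(label)
        ax.set_xlabel("Time [s]")
    axes[0][0].set_ylabel("Frequency [Hz]")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Plotting the baseline, U-Net, and GAN outputs next to the 8 kHz target makes the reconstructed high-frequency band directly visible.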
- Python ≥ 3.8
- PyTorch
- NumPy, SciPy, Matplotlib
- pystoi for the STOI metric
- tqdm for progress bars
Run the main notebook:
```bash
jupyter notebook superes.ipynb
```

```python
# Load data
train_dataset = AudioSuperResDataset(
    path_4k="4k/train",
    path_8k="8k/train",
    mode='train'
)

# Train U-Net
model = AudioUNet(upscale_factor=2, base_channels=16).to(device)
history = train_model(model, train_loader, val_loader, num_epochs=50)

# Or train GAN
generator = Generator(upscale_factor=2).to(device)
discriminator = MultiBandDiscriminator().to(device)
history = train_gan(generator, discriminator, train_loader, val_loader, num_epochs=100)
```

```text
project/
├── superes.ipynb   # Main notebook
├── 4k/             # Low-res audio
│   ├── train/
│   └── test/
├── 8k/             # High-res audio (targets)
│   ├── train/
│   └── test/
└── README.md
```