Audio samples for our paper: "Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem"
- Blog: https://aferro.dynu.net/research_engineering/real_time_spectrogram_inversion/
- Audio samples: https://andres-fr.github.io/efficientspecinv
- Git repository: https://github.com/andres-fr/efficientspecinv
This webpage provides representative audio samples of clean speech in WAV format. Each row is one random fragment from the LibriSpeech test-clean split. Each column corresponds to a method used to generate the WAV directly from the STFT magnitude spectrogram (the ground truth column is the exception, since it also uses the true phases):
- Ground truth: A perfect reconstruction via inverse STFT using ground truth magnitudes and phases
- Proposed: Our proposed method with efficient first and second stages
- Prev. + Thomas: The result of applying our proposed second stage to the previously proposed CNN
- Prev. + direct: The result of applying a direct solver to the previously proposed CNN
- VOCOS: "Copy-synthesis" function using the VOCOS API and pretrained model, as prescribed in the official repository
- RTISI (50 iter.): 50-iteration RTISI (implementation)
- RTISI (5 iter.): 5-iteration RTISI (implementation)
- Strided + LA: The strided variation of our proposed method, with one frame of lookahead
- Strided: The strided variation of our proposed method, without lookahead
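As a rough illustration of the "Ground truth" column, the sketch below runs an STFT/inverse-STFT round trip with SciPy on a synthetic signal. It is not the code used for the paper: the sampling rate, window length, overlap and the helper name `stft_roundtrip` are assumptions chosen only for this example. It also reconstructs once with the phase discarded, to show why a magnitude-only input needs a dedicated inversion method.

```python
import numpy as np
from scipy.signal import stft, istft


def stft_roundtrip(x, fs=16000, nperseg=512, noverlap=384):
    """Hypothetical helper: STFT followed by inverse STFT.

    Returns the magnitude spectrogram, the waveform rebuilt from the true
    magnitudes and phases, and the waveform rebuilt from magnitudes alone
    (all phases set to zero).
    """
    _, _, spec = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    mag, phase = np.abs(spec), np.angle(spec)

    # "Ground truth" style reconstruction: true magnitudes and true phases.
    _, y_true = istft(mag * np.exp(1j * phase), fs=fs,
                      nperseg=nperseg, noverlap=noverlap)

    # Naive reconstruction: same magnitudes, phase information thrown away.
    _, y_zero = istft(mag.astype(complex), fs=fs,
                      nperseg=nperseg, noverlap=noverlap)

    return mag, y_true[:len(x)], y_zero[:len(x)]


# Synthetic stand-in for one second of audio (assumed 16 kHz sampling rate).
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)

mag, y_true, y_zero = stft_roundtrip(x)
err_true = np.max(np.abs(x - y_true))   # tiny: limited by floating point
err_zero = np.max(np.abs(x - y_zero))   # large: the phase matters
print(f"true phase: {err_true:.2e}, zero phase: {err_zero:.2e}")
```

All other columns start from `mag` only and have to estimate the missing phase, which is the task the listed methods address.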
See our paper for more details:
@inproceedings{fernandez25_interspeech,
title = {{Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem}},
author = {Fernandez, Andres and Azcarreta Ortiz, Juan and Bilen, Çağdaş and Monge Alvarez, Jesus},
year = {2025},
booktitle = {{Interspeech 2025}},
pages = {3449--3453},
doi = {10.21437/Interspeech.2025-439},
issn = {2958-1796},
}