Audio samples for our paper: "Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem"

Blog: https://aferro.dynu.net/research_engineering/real_time_spectrogram_inversion/
Audio samples: https://andres-fr.github.io/efficientspecinv
Git repository: https://github.com/andres-fr/efficientspecinv

This webpage provides representative audio samples of clean speech in WAV format. Each row is one random fragment from the LibriSpeech clean test split. Each column is a method used to generate the WAV from the STFT magnitude spectrogram (the ground truth column additionally uses the true phases):

Ground truth: A perfect reconstruction via inverse STFT using ground truth magnitudes and phases
Proposed: Our proposed method with efficient first and second stage
Prev. + Thomas: The result of applying our proposed second stage to the previously proposed CNN
Prev. + direct: The result of applying a direct solver to the previously proposed CNN
VOCOS: "Copy-synthesis" function using the VOCOS API and pretrained model, as prescribed in the official repository
RTISI (50 iter.): 50-iteration RTISI (implementation)
RTISI (5 iter.): 5-iteration RTISI (implementation)
Strided + LA: The strided variation of our proposed method, with one frame of lookahead
Strided: The strided variation of our proposed method, without lookahead
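
To illustrate what the "Ground truth" column means, here is a minimal sketch of an STFT round trip. It is not the code used in the paper. It assumes SciPy's `stft`/`istft`, a synthetic noise signal instead of speech, and illustrative parameters (`fs`, `nperseg`, `noverlap`) that are not taken from the paper. It also shows that discarding the phase gives a poor reconstruction, which is the problem the other columns address.

```python
import numpy as np
from scipy.signal import stft, istft

# Illustrative parameters (assumptions, not the paper's settings)
FS, NPERSEG, NOVERLAP = 16000, 512, 384


def reconstruct(signal, keep_phase=True):
    """Invert the STFT of `signal`, with or without the true phase."""
    # Forward STFT: complex array of shape (freq_bins, frames)
    _, _, spec = stft(signal, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    magnitude, phase = np.abs(spec), np.angle(spec)
    if keep_phase:
        # "Ground truth" case: true magnitudes and true phases
        spec_in = magnitude * np.exp(1j * phase)
    else:
        # Magnitude only: phase replaced with zeros
        spec_in = magnitude.astype(complex)
    _, recon = istft(spec_in, fs=FS, nperseg=NPERSEG, noverlap=NOVERLAP)
    # Trim the padding added by the forward transform
    return recon[: len(signal)]


rng = np.random.default_rng(0)
x = rng.standard_normal(FS)  # one second of synthetic noise

err_true_phase = float(np.max(np.abs(x - reconstruct(x, keep_phase=True))))
err_zero_phase = float(np.max(np.abs(x - reconstruct(x, keep_phase=False))))
print(f"max error with true phase: {err_true_phase:.2e}")
print(f"max error with zero phase: {err_zero_phase:.2e}")
```

With the true phase the error should be at the level of floating-point rounding, while the zero-phase reconstruction differs substantially from the input.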

See our paper for more details:

@inproceedings{fernandez25_interspeech,
  title     = {{Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem}},
  author    = {Fernandez, Andres and Azcarreta Ortiz, Juan and Bilen, Çağdaş and Monge Alvarez, Jesus},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {3449--3453},
  doi       = {10.21437/Interspeech.2025-439},
  issn      = {2958-1796},
}
