Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem [InterSpeech2025]

andres-fr/efficientspecinv

Audio samples for our paper: "Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem"

This page provides representative audio samples of clean speech in WAV format. Each row is one randomly chosen fragment from the LibriSpeech clean test split. Each column is a method used to generate the WAV from the STFT magnitude spectrogram (the ground truth column additionally uses the true phases):

  • Ground truth: A perfect reconstruction via inverse STFT using ground truth magnitudes and phases
  • Proposed: Our proposed method with efficient first and second stage
  • Prev. + Thomas: The result of applying our proposed second stage to the previously proposed CNN
  • Prev. + direct: The result of applying a direct solver to the previously proposed CNN
  • VOCOS: "Copy-synthesis" function using the VOCOS API and pretrained model, as prescribed in the official repository
  • RTISI (50 iter.): 50-iteration RTISI (implementation)
  • RTISI (5 iter.): 5-iteration RTISI (implementation)
  • Strided + LA: The strided variation of our proposed method, with one frame of lookahead
  • Strided: The strided variation of our proposed method, without lookahead
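As an illustration of the "Ground truth" column, the following minimal sketch shows an STFT round trip in which the true magnitudes and phases are recombined and inverted, next to a naive zero-phase inversion that uses magnitudes only. It relies on `scipy.signal.stft` and `scipy.signal.istft`; the function name `roundtrip`, the window length, and the overlap are assumptions chosen for illustration and are not taken from the paper or this repository.

```python
import numpy as np
from scipy.signal import stft, istft


def roundtrip(wave, sr=16000, nperseg=512, noverlap=384):
    """Invert an STFT with true phases and, for contrast, with zero phases.

    All parameter values are hypothetical defaults, not the paper's settings.
    """
    # Forward STFT: complex array of shape (freq_bins, frames), Hann window.
    _, _, spec = stft(wave, fs=sr, nperseg=nperseg, noverlap=noverlap)
    magnitude = np.abs(spec)
    phase = np.angle(spec)

    # "Ground truth": recombine true magnitudes and phases, then invert.
    _, recon = istft(magnitude * np.exp(1j * phase),
                     fs=sr, nperseg=nperseg, noverlap=noverlap)

    # Naive baseline: discard the phases entirely (all phases set to zero).
    _, zero_phase = istft(magnitude.astype(complex),
                          fs=sr, nperseg=nperseg, noverlap=noverlap)

    n = len(wave)
    return magnitude, recon[:n], zero_phase[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wave = rng.standard_normal(16000)  # one second of noise at 16 kHz
    magnitude, recon, zero_phase = roundtrip(wave)
    print("max error with true phases:", np.max(np.abs(recon - wave)))
    print("max error with zero phases:", np.max(np.abs(zero_phase - wave)))
```

With the true phases the waveform is recovered up to floating-point error, whereas the zero-phase version differs substantially. The remaining columns correspond to methods that estimate the missing phase information from the magnitudes alone.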

See our paper for more details:

@inproceedings{fernandez25_interspeech,
  title     = {{Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem}},
  author    = {Fernandez, Andres and Azcarreta Ortiz, Juan and Bilen, Çağdaş and Monge Alvarez, Jesus},
  year      = {2025},
  booktitle = {{Interspeech 2025}},
  pages     = {3449--3453},
  doi       = {10.21437/Interspeech.2025-439},
  issn      = {2958-1796},
}
