nina-goes/PhonemeCVAE

PhonemeCVAE

PhonemeCVAE: Contrastive Latent Clustering and Class-Conditioned Priors for Disentangled Phoneme Interpolation

Abstract

To achieve precise pronunciation control, high-quality text-to-speech (TTS) systems rely on structured, disentangled speech representations. We propose PhonemeCVAE, a phoneme-conditioned variational autoencoder (VAE) that learns a structured, continuous latent space for phoneme-level representations. The model introduces a class-conditioned Gaussian prior for each phoneme and employs a contrastive objective to promote compact intra-class clustering and clear separation between phoneme classes. The resulting latent space enables controllable phoneme modifications via interpolation at inference time, allowing smooth transitions between phonological classes and semantically meaningful modifications of synthesized speech. Furthermore, the combination of contrastive regularization and phoneme-conditioned priors yields a structured latent topology that generalizes across English speech datasets and supports consistent phoneme interpolation without compromising synthesis quality.
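
To illustrate what a class-conditioned Gaussian prior means for the VAE objective, here is a minimal sketch of the KL term between a diagonal-Gaussian posterior and a per-phoneme prior. This is not the released implementation. The function name `kl_to_class_prior`, the unit-variance prior N(mu_c, I), and the example values are all assumptions made for illustration; the abstract does not specify the prior's covariance or how the term is weighted against the contrastive objective.

```python
import math


def kl_to_class_prior(mu_q, logvar_q, mu_c):
    """KL( N(mu_q, diag(exp(logvar_q))) || N(mu_c, I) ).

    Assumption: the prior for phoneme class c is a unit-variance Gaussian
    centred at mu_c. Closed form per dimension:
        0.5 * (sigma_q^2 + (mu_q - mu_c)^2 - 1 - log sigma_q^2)
    """
    if not (len(mu_q) == len(logvar_q) == len(mu_c)):
        raise ValueError("all inputs must have the same dimensionality")
    total = 0.0
    for m, lv, c in zip(mu_q, logvar_q, mu_c):
        total += math.exp(lv) + (m - c) ** 2 - 1.0 - lv
    return 0.5 * total


# Hypothetical 2-dimensional example: a posterior that sits on its class
# prior mean incurs no penalty, while one that drifts away is penalised.
prior_mean = [0.0, 0.0]
print(kl_to_class_prior([0.0, 0.0], [0.0, 0.0], prior_mean))
print(kl_to_class_prior([1.0, 0.0], [0.0, 0.0], prior_mean))
```

Under this assumption, the KL term pulls each encoded phoneme toward the mean of its own class prior rather than toward a single shared origin, which is one way the latent space could acquire per-class structure.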

Audio Samples

Audio samples are interpolated between the centroids of two selected phoneme classes. The interpolation factor α controls the weighting between the source and the target class centroid, with α=0.0 sampling from the source class and α=1.0 sampling from the target class.
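
The role of α can be sketched as a weighted combination of two centroids. The snippet below assumes plain linear interpolation, z = (1 - α) * source + α * target, which is consistent with the endpoints described above but is not confirmed as the exact scheme used for the samples. The function name `interpolate_centroids` and the 3-dimensional centroid values are hypothetical.

```python
def interpolate_centroids(source, target, alpha):
    """Linearly interpolate between two latent centroids.

    Assumption: the interpolation is linear in latent space.
    alpha = 0.0 returns the source centroid, alpha = 1.0 the target centroid.
    """
    if len(source) != len(target):
        raise ValueError("centroids must have the same dimensionality")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * s + alpha * t for s, t in zip(source, target)]


# Hypothetical centroids for a source and a target phoneme class.
source_centroid = [0.0, 2.0, -1.0]
target_centroid = [4.0, 0.0, 1.0]

for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, interpolate_centroids(source_centroid, target_centroid, alpha))
```

In a full pipeline, the interpolated latent vector would then be passed to the decoder to synthesize the corresponding audio; that step is omitted here because the model code has not been released.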

Code

The code will be released after the paper is accepted.
