VocalTrax

Note

Code for the NeurIPS Audio Imagination 2024 paper "Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization". You can find examples here.

Articulatory synthesis seeks to replicate the human voice by modeling the physics of the vocal apparatus, offering interpretable and controllable speech production. However, such methods often require careful hand-tuning to invert acoustic signals to their articulatory parameters. We present VocalTrax, a method which performs this inversion automatically via optimizing an accelerated vocal tract model implementation. Experiments on diverse vocal datasets show significant improvements over existing methods in out-of-domain speech reconstruction, while also revealing persistent challenges in matching natural voice quality.

Note

For our baseline comparison, we refer you to Vocal Tract Area Estimation by Gradient Descent and their code.

Installation

You can create the environment as follows:

conda create -n vocaltrax python=3.10
conda activate vocaltrax
pip install -r requirements.txt

By default, we install JAX for CPU. See the JAX documentation for details on using JAX with your accelerators. We also install CREPE (A Convolutional Representation for Pitch Estimation), which requires TensorFlow; it is also installed for CPU by default.

Running

You can resynthesize a target sound as follows:

cd vocaltrax
python synthesize.py general.iters=1000 general.frame_length=1024 general.hop_length=1024 general.target=data/valentine.wav
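If you want to resynthesize several targets with the same settings, the command above can be assembled programmatically. The following is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and the only override keys it uses are the four shown in the command above. It prints the commands rather than running them.

```python
import shlex


def build_command(target, iters=1000, frame_length=1024, hop_length=1024):
    """Return the argv list for one synthesize.py run (hypothetical helper).

    The override keys mirror the ones in the README command; nothing else
    about synthesize.py is assumed.
    """
    return [
        "python",
        "synthesize.py",
        f"general.iters={iters}",
        f"general.frame_length={frame_length}",
        f"general.hop_length={hop_length}",
        f"general.target={target}",
    ]


# Only data/valentine.wav is named in the README; add your own files here.
targets = ["data/valentine.wav"]

for target in targets:
    cmd = build_command(target)
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To execute it instead, run from inside the vocaltrax/ directory, e.g.
    # subprocess.run(cmd, check=True, cwd="vocaltrax")
```

Printing first makes it easy to check the overrides before committing to a long optimization run.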

Configuration

Important

We use Hydra to configure everything. The configuration can be found in vocaltrax/conf/config.yaml, with specific sub-configs in sub-directories of vocaltrax/conf/.

The configs define all parameters (e.g. spectrogram, optimizer, preloss); the defaults are the values used for the paper. This is also where you choose the target file, sample_rate, frame_length, hop_length, upsample_glottis, the number of iterations iters, the learning rate lr, and the initial random seed.
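To get a feel for how frame_length and hop_length interact, the sketch below counts how many full analysis frames fit in a clip. It assumes plain framing without padding, which may differ from what VocalTrax does internally; the sample rate and clip length are hypothetical values, not taken from config.yaml.

```python
def frame_count(n_samples: int, frame_length: int, hop_length: int) -> int:
    """Number of full frames in n_samples, assuming no padding."""
    if n_samples < frame_length:
        return 0
    return 1 + (n_samples - frame_length) // hop_length


sample_rate = 44100          # hypothetical, not read from config.yaml
frame_length = 1024          # value used in the README command
hop_length = 1024            # equal to frame_length, so frames do not overlap
n_samples = 3 * sample_rate  # hypothetical 3-second clip

print("frames:", frame_count(n_samples, frame_length, hop_length))
print("frame duration (ms):", 1000 * frame_length / sample_rate)
```

A hop_length smaller than frame_length yields overlapping frames and therefore more frames for the same clip.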

Acknowledgements & Citing

Please cite this work as follows:

@inproceedings{mo2024articulatory,
  title={Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization},
  author={Mo*, Luke and Cherep*, Manuel and Singh*, Nikhil and Langford, Quinn and Maes, Patricia},
  booktitle={Audio Imagination: NeurIPS 2024 Workshop AI-Driven Speech, Music, and Sound Generation}
}

MC received the support of a fellowship from “la Caixa” Foundation (ID 100010434). The fellowship code is LCF/BQ/EU23/12010079.

A significant part of this codebase is inherited from Vocal Tract Area Estimation by Gradient Descent, for which we are thankful. That project, in turn, adapted code from other projects such as Pink Trombone, which we therefore adapt transitively.
