Note
Code for the ICLR 2025 paper "Contrastive Learning from Synthetic Audio Doppelgängers".
By randomly perturbing the parameters of a sound synthesizer (SynthAX), we generate synthetic positive pairs with causally manipulated variations in timbre, pitch, and temporal envelopes. These variations, difficult to achieve through augmentations of existing audio, provide a rich source of contrastive information. Despite the shift to randomly generated synthetic data, our method produces strong audio representations, outperforming real data on several standard audio classification tasks.
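As a rough illustration of the idea (not the repository's actual training code), here is a minimal JAX sketch of generating one doppelgänger pair. It assumes SynthAX's Flax-style `init`/`apply` interface as shown in the SynthAX README; the `Voice` synthesizer choice, the uniform clipped perturbation rule, and all numeric values are illustrative assumptions, not necessarily the paper's exact setup.

```python
import jax
import jax.numpy as jnp
from synthax.config import SynthConfig
from synthax.synth import Voice  # illustrative choice, mirroring synth=voice in the configs

# Build a batched synthesizer (constructor arguments follow the SynthAX README;
# adjust to your installed version).
config = SynthConfig(batch_size=16, sample_rate=16000, buffer_size_seconds=1.0)
synth = Voice(config=config)

key, init_key, noise_key = jax.random.split(jax.random.PRNGKey(0), 3)
params = synth.init(init_key)  # random synthesizer parameters

def perturb(params, key, delta):
    # Offset every parameter leaf by uniform noise in [-delta, delta], clipping
    # back to a normalized [0, 1] range. This is one plausible perturbation rule;
    # see the paper for the exact formulation.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noisy = [
        jnp.clip(leaf + jax.random.uniform(k, leaf.shape, minval=-delta, maxval=delta), 0.0, 1.0)
        for leaf, k in zip(leaves, keys)
    ]
    return jax.tree_util.tree_unflatten(treedef, noisy)

# Two renders from nearby parameter settings form a synthetic positive pair.
anchor = synth.apply(params)
doppelganger = synth.apply(perturb(params, noise_key, delta=0.25))
```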
(Demo video: Doppelgangers.mov)
Tip
You can hear examples on our website.
Warning
The code for the evaluations in the paper will be released in a separate repository (coming soon).
You can create the environment as follows:

```bash
conda create -n doppelgangers python=3.10
conda activate doppelgangers
pip install -r requirements.txt
```

By default, we use CUDA 12.1, but you can change the requirements.
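Before training, you can verify that JAX sees your GPU with a generic check (this is standard JAX, not a script from this repository):

```bash
python -c "import jax; print(jax.devices())"  # should list a CUDA/GPU device rather than only CPU
```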
Training with audio doppelgängers is simple:

```bash
python train.py embedding=resnet synth=voice data.synthetic.delta=0.25 general.epochs=200
```

This will generate directories containing logs and outputs.
Important
We use Hydra to configure doppelgängers. The configuration can be found in conf/config.yaml, with specific sub-configs in sub-directories of conf/.
The configs define all the parameters (e.g. embedding, synthesizer, transformations); by default, these are the ones used for the paper. The only embedding for now is ResNet, but you can choose a synth architecture and a synthconfig. This is also where you choose the transform if you train with real data. Other important parameters are:

- data.synthetic.delta: the perturbation strength for the doppelgängers
- data.batch_size: the training batch size
- data.apply_transform: whether or not to apply the transform
- data.duration: the duration of the synthetic sounds
- data.sample_rate: the sample rate of the synthetic sounds
- data.temporal_jitter: whether or not to use temporal jitter
- data.n_layers: the number of layers of sounds to stack together
- general.epochs: the number of training epochs
- system.seed: the initial random seed
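Since these are Hydra configs, any of them can be overridden from the command line. For example (the override keys are the ones listed above; the values here are illustrative, not necessarily the paper's defaults):

```bash
python train.py embedding=resnet synth=voice \
    data.synthetic.delta=0.25 data.batch_size=128 \
    data.duration=1.0 data.sample_rate=16000 \
    data.temporal_jitter=true data.n_layers=2 \
    general.epochs=200 system.seed=0
```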
If you use doppelgängers in your research, please cite the following paper:
```bibtex
@inproceedings{cherep2024contrastive,
  title={Contrastive Learning from Synthetic Audio Doppelgängers},
  author={Cherep, Manuel and Singh, Nikhil},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```

For the synthesizer component itself, please cite SynthAX:
```bibtex
@conference{cherep2023synthax,
  title={SynthAX: A Fast Modular Synthesizer in JAX},
  author={Cherep, Manuel and Singh, Nikhil},
  booktitle={Audio Engineering Society Convention 155},
  month={May},
  year={2023},
  url={http://www.aes.org/e-lib/browse.cfm?elib=22261}
}
```

Manuel received the support of a fellowship from “la Caixa” Foundation (ID 100010434). The fellowship code is LCF/BQ/EU23/12010079. The authors acknowledge the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for providing resources that have contributed to the research results reported within this paper.