Note
Code for the ICLR 2025 paper "Contrastive Learning from Synthetic Audio Doppelgängers".
By randomly perturbing the parameters of a sound synthesizer (SynthAX), we generate synthetic positive pairs with causally manipulated variations in timbre, pitch, and temporal envelopes. These variations, difficult to achieve through augmentations of existing audio, provide a rich source of contrastive information. Despite the shift to randomly generated synthetic data, our method produces strong audio representations, outperforming real data on several standard audio classification tasks.
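As a rough illustration of the idea (not the repository's actual training code), here is a minimal JAX sketch of generating one doppelgänger pair. It assumes SynthAX's Flax-style `init`/`apply` interface as shown in the SynthAX README; the `Voice` synthesizer choice, the uniform clipped perturbation rule, and all numeric values are illustrative assumptions, not necessarily the paper's exact setup.

```python
import jax
import jax.numpy as jnp
from synthax.config import SynthConfig
from synthax.synth import Voice  # illustrative choice, mirroring synth=voice in the configs

# Build a batched synthesizer (constructor arguments follow the SynthAX README;
# adjust to your installed version).
config = SynthConfig(batch_size=16, sample_rate=16000, buffer_size_seconds=1.0)
synth = Voice(config=config)

key, init_key, noise_key = jax.random.split(jax.random.PRNGKey(0), 3)
params = synth.init(init_key)  # random synthesizer parameters

def perturb(params, key, delta):
    # Offset every parameter leaf by uniform noise in [-delta, delta], clipping
    # back to a normalized [0, 1] range. This is one plausible perturbation rule;
    # see the paper for the exact formulation.
    leaves, treedef = jax.tree_util.tree_flatten(params)
    keys = jax.random.split(key, len(leaves))
    noisy = [
        jnp.clip(leaf + jax.random.uniform(k, leaf.shape, minval=-delta, maxval=delta), 0.0, 1.0)
        for leaf, k in zip(leaves, keys)
    ]
    return jax.tree_util.tree_unflatten(treedef, noisy)

# Two renders from nearby parameter settings form a synthetic positive pair.
anchor = synth.apply(params)
doppelganger = synth.apply(perturb(params, noise_key, delta=0.25))
```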
(Demo video: Doppelgangers.mov)
Tip
You can hear examples on our website.
Warning
The code for the evaluations in the paper will be released in a separate repository (coming soon).
You can create the environment as follows:

```bash
conda create -n doppelgangers python=3.10
conda activate doppelgangers
pip install -r requirements.txt
```

By default, we use CUDA 12.1, but you can change the requirements.
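Before training, you can verify that JAX sees your GPU with a generic check (this is standard JAX, not a script from this repository):

```bash
python -c "import jax; print(jax.devices())"  # should list a CUDA/GPU device rather than only CPU
```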
Training with audio doppelgängers is simple:

```bash
python train.py embedding=resnet synth=voice data.synthetic.delta=0.25 general.epochs=200
```

This will generate directories containing logs and outputs.
Important
We use Hydra to configure doppelgängers. The configuration can be found in conf/config.yaml, with specific sub-configs in sub-directories of conf/.
The configs define all the parameters (e.g. embedding, synthesizer, transformations); by default, these are the ones used for the paper. The only embedding for now is ResNet, but you can choose a synth architecture and a synthconfig. This is also where you choose the transform if you train with real data. Other important parameters are:

- data.synthetic.delta: the perturbation strength for the doppelgängers
- data.batch_size: the training batch size
- data.apply_transform: whether or not to apply the transform
- data.duration: the duration of the synthetic sounds
- data.sample_rate: the sample rate of the synthetic sounds
- data.temporal_jitter: whether or not to use temporal jitter
- data.n_layers: the number of layers of sounds to stack together
- general.epochs: the number of training epochs
- system.seed: the initial random seed
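Since these are Hydra configs, any of them can be overridden from the command line. For example (the override keys are the ones listed above; the values here are illustrative, not necessarily the paper's defaults):

```bash
python train.py embedding=resnet synth=voice \
    data.synthetic.delta=0.25 data.batch_size=128 \
    data.duration=1.0 data.sample_rate=16000 \
    data.temporal_jitter=true data.n_layers=2 \
    general.epochs=200 system.seed=0
```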
If you use doppelgängers in your research, please cite the following paper:
```bibtex
@inproceedings{cherep2024contrastive,
  title={Contrastive Learning from Synthetic Audio Doppelgängers},
  author={Cherep, Manuel and Singh, Nikhil},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```

For the synthesizer component itself, please cite SynthAX:
```bibtex
@conference{cherep2023synthax,
  title={SynthAX: A Fast Modular Synthesizer in JAX},
  author={Cherep, Manuel and Singh, Nikhil},
  booktitle={Audio Engineering Society Convention 155},
  month={May},
  year={2023},
  url={http://www.aes.org/e-lib/browse.cfm?elib=22261}
}
```

Manuel received the support of a fellowship from “la Caixa” Foundation (ID 100010434). The fellowship code is LCF/BQ/EU23/12010079. The authors acknowledge the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for providing resources that have contributed to the research results reported within this paper.