This is a modular Python package for running music source separation experiments with denoising diffusion probabilistic models (DDPM) and latent diffusion models (LDM). It is also the companion repository for the following papers:
@inproceedings{plaja2025generating,
  author    = {Genís Plaja-Roglans and Yi-Hung Hung and Xavier Serra and Igor Pereira},
  title     = {Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures},
  booktitle = {Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
  year      = {2025},
  address   = {Lake Tahoe, USA},
  publisher = {IEEE}
}
@inproceedings{plaja2025efficient,
  author    = {Genís Plaja-Roglans and Yi-Hung Hung and Xavier Serra and Igor Pereira},
  title     = {Efficient and Fast Generative-Based Singing Voice Separation Using a Latent Diffusion Model},
  booktitle = {Proceedings of the International Joint Conference on Neural Networks (IJCNN)},
  year      = {2025},
  address   = {Rome, Italy},
  publisher = {IEEE}
}
We have created a DiffDMXInference class in diffdmx/inference.py. You may configure the inference experiment in the config/inference.yaml file, and run the inference code like this:
DEVICES=1 NUM_NODES=1 python -m diffdmx.inference
The model weights to load are specified in the config file; the inference script then loads the model and instantiates the classes needed to run it. You may download our latest pre-trained weights from HERE (link TBA!).
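If you prefer to drive the separation from Python rather than from the command line, the snippet below is a minimal sketch of what that could look like. The constructor argument and the separate() method are assumptions made for illustration; check diffdmx/inference.py and config/inference.yaml for the actual interface.

# Hypothetical programmatic use of the inference class. The constructor
# argument and the `separate` method are assumptions, not the documented API;
# see diffdmx/inference.py for the real signatures.
from diffdmx.inference import DiffDMXInference

separator = DiffDMXInference(config="config/inference.yaml")  # assumed: built from the inference config
separator.separate()  # assumed: separates every mixture referenced by files_to_separate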
In the inference config YAML file, make sure to specify the following (a short sketch for setting these programmatically is given below):
- files_to_separate: the folder containing the mixtures to separate. It can also point to a single file, which will be separated individually.
- output_path: the folder where the separated vocals will be stored.
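As a convenience, the two entries above can also be filled in programmatically before launching inference. The sketch below uses OmegaConf (the config files are YAML; whether the project itself resolves them with OmegaConf/Hydra is an assumption), and the paths are placeholders.

from omegaconf import OmegaConf

# Load the inference configuration, point it at your data, and write it back.
# The paths below are placeholders for your own mixtures and output folder.
cfg = OmegaConf.load("config/inference.yaml")
cfg.files_to_separate = "data/mixtures/"       # a folder of mixtures, or a single audio file
cfg.output_path = "outputs/separated_vocals/"  # where the separated vocals will be written
OmegaConf.save(cfg, "config/inference.yaml")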
We wrap the entire diffusion system in the DiffusionSeparator class in diffdmx/model.py. All experiments are run by configuring this class and its components (e.g. the diffusion process, the latent encoder and decoder, the conditioner system, and more). To run a training experiment, specify the corresponding configuration file:
DEVICES=1 NUM_NODES=1 python -m diffdmx.train experiment=path/to/config/file
Note that we run the train module and pass the experiment config file as an argument. The main configuration files for our experiments are stored in config/experiment/; each of these files overrides the default parameters with the values needed for that experiment. See the config files there for example use cases.
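To illustrate the override mechanism, the sketch below merges a default configuration with an experiment file, which is conceptually what the command above does. Whether the project resolves configs exactly this way (e.g. via Hydra) is an assumption, and the file names are placeholders for the actual files in config/.

from omegaconf import OmegaConf

# Conceptual illustration of how an experiment config overrides the defaults.
# The file names are placeholders; the defaults file and the resolution
# mechanism used by diffdmx.train may differ.
defaults = OmegaConf.load("config/train.yaml")
experiment = OmegaConf.load("config/experiment/my_experiment.yaml")
cfg = OmegaConf.merge(defaults, experiment)  # experiment values take precedence over defaults
print(OmegaConf.to_yaml(cfg))                # inspect the final, merged configuration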
In this repo we have adapted and integrated both archinetai/audio-diffusion-pytorch and archinetai/a-unet. For citations and the latest updates, please refer to the original repositories.