This repository contains the inference models for our paper MSLDM: Multi-Source Latent Diffusion for Music Generation: https://arxiv.org/abs/2409.06190. Our demo site is here: https://xzwy.github.io/MSLDMDemo/.
To get started, clone the repository:
```bash
git clone https://github.com/XZWY/MSLDM
```
The environment for running our code can be installed using conda:
```bash
# Install environment
conda env create -f env.yaml
# Activate the environment
conda activate msldm
```
To prepare the data:
1. Download the complete dataset from https://github.com/gladia-research-group/multi-source-diffusion-models/blob/main/data/README.md.
2. Follow data preparation instructions 3 and 4 in https://github.com/gladia-research-group/multi-source-diffusion-models/blob/main/data/README.md, but run them in your msldm/data directory.

Make sure that msldm/data looks like this (a quick sanity check follows the tree):
```
data/
└───bass_22050
│   └───train
│   │   └───Track00001.wav
│   │   ...
│   └───validation
│   └───test
└───drums_22050
└───guitar_22050
└───piano_22050
└───slakh2100/
    └───train/
        └───Track00001/
        │   └───bass.wav
        │   └───drums.wav
        │   └───guitar.wav
        │   └───piano.wav
        ...
    ...
```
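As a sanity check, the snippet below verifies that the expected directories exist. It only assumes the layout shown in the tree above (all four stems share the same train/validation/test structure); adjust it if your layout differs:
```bash
# Verify the expected directory layout under msldm/data
cd data
for stem in bass drums guitar piano; do
  for split in train validation test; do
    [ -d "${stem}_22050/${split}" ] || echo "missing ${stem}_22050/${split}"
  done
done
[ -d slakh2100/train ] || echo "missing slakh2100/train"
```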
To generate the latent dataset for training MSLDM, run:
```bash
cd SourceVAE/data
export PYTHONPATH=../../SourceVAE
python generate_dataset_slakh_latents.py --ckpt_path $ckpt_path --save_dir $save_dir --mode 'train' --device 'cuda:0' --batch_size 4 --n_workers 2
python generate_dataset_slakh_latents.py --ckpt_path $ckpt_path --save_dir $save_dir --mode 'validation' --device 'cuda:0' --batch_size 4 --n_workers 2
```
Specify the checkpoint path and output directory in $ckpt_path and $save_dir. You can download the SourceVAE checkpoint from here: https://uofi.box.com/s/as0yxoua68f5dcathvs8yi34far7k705.
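For example, with placeholder paths (both paths below are hypothetical; substitute your own locations):
```bash
# Hypothetical example paths; replace with your own locations
export ckpt_path=./sourcevae.ckpt   # the downloaded SourceVAE checkpoint
export save_dir=./slakh_latents     # where the generated latents will be written
python generate_dataset_slakh_latents.py --ckpt_path $ckpt_path --save_dir $save_dir \
    --mode 'train' --device 'cuda:0' --batch_size 4 --n_workers 2
```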
If you want to train your own SourceVAE, follow the instructions below. First generate the dataset metadata:
```bash
python generate_slakh_dataset_metadata.py --mode train
python generate_slakh_dataset_metadata.py --mode validation
```
Then start training:
```bash
bash ../start.sh
```
The logs and checkpoints will be saved in SourceVAE/logfiles.
⚠️ NOTE:
Before executing the training script, you have to change the WANDB_* environment variables to match your personal or institutional account.
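For reference, the standard Weights & Biases environment variables are shown below; the exact set used in the training script may differ, so treat this as a sketch:
```bash
# Standard wandb environment variables; adjust to your own account
export WANDB_API_KEY=<your-api-key>         # from https://wandb.ai/authorize
export WANDB_ENTITY=<your-username-or-team>
export WANDB_PROJECT=<your-project-name>
```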
Change into the msldm directory:
```bash
cd msldm
```
To train MSLDM (this should take about 8 GB of GPU memory):
```bash
bash start_msldm.sh # train msldm
```
To train MSLDM-Large (this should take about 33 GB of GPU memory):
```bash
bash start_msldm_large.sh # train msldm_large
```
An inference example is shown in ./inference.ipynb. Download the checkpoints from https://uofi.box.com/s/z2qxbdsxravhdg1n95khz8um3olgeya3 and save them in ./ckpt, so that ./ckpt looks like:
```
ckpt/
└───sourcevae_ckpt
└───msldm.ckpt
└───msldm_large.ckpt
```
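A quick way to confirm the layout before opening the notebook, assuming the tree above:
```bash
# List the expected checkpoints; ls reports an error for any missing path
ls ckpt/sourcevae_ckpt ckpt/msldm.ckpt ckpt/msldm_large.ckpt
```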
If you use our model in your research, please consider citing:
```bibtex
@misc{xu2024multisourcemusicgenerationlatent,
      title={Multi-Source Music Generation with Latent Diffusion},
      author={Zhongweiyang Xu and Debottam Dutta and Yu-Lin Wei and Romit Roy Choudhury},
      year={2024},
      eprint={2409.06190},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2409.06190},
}
```
