This repository contains the inference models for our paper MSLDM: Multi-Source Latent Diffusion for Music Generation: https://arxiv.org/abs/2409.06190. Our demo site is here: https://xzwy.github.io/MSLDMDemo/.
To get started, clone the repository:
```bash
git clone https://github.com/XZWY/MSLDM
```
The environment for running our code can be installed using conda:
```bash
# Install environment
conda env create -f env.yaml
# Activate the environment
conda activate msldm
```
To prepare the data:
1. Download the complete dataset from https://github.com/gladia-research-group/multi-source-diffusion-models/blob/main/data/README.md.
2. Follow data preparation instructions 3 and 4 in https://github.com/gladia-research-group/multi-source-diffusion-models/blob/main/data/README.md, but run them in your msldm/data directory.

Make sure that msldm/data looks like this (a quick sanity check follows the tree):
```
data/
└───bass_22050
│   └───train
│   │   └───Track00001.wav
│   │   ...
│   └───validation
│   └───test
└───drums_22050
└───guitar_22050
└───piano_22050
└───slakh2100/
    └───train/
        └───Track00001/
        │   └───bass.wav
        │   └───drums.wav
        │   └───guitar.wav
        │   └───piano.wav
        ...
    ...
```
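As a sanity check, the snippet below verifies that the expected directories exist. It only assumes the layout shown in the tree above (all four stems share the same train/validation/test structure); adjust it if your layout differs:
```bash
# Verify the expected directory layout under msldm/data
cd data
for stem in bass drums guitar piano; do
  for split in train validation test; do
    [ -d "${stem}_22050/${split}" ] || echo "missing ${stem}_22050/${split}"
  done
done
[ -d slakh2100/train ] || echo "missing slakh2100/train"
```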
To generate the latent dataset for training MSLDM, run:
```bash
cd SourceVAE/data
export PYTHONPATH=../../SourceVAE
python generate_dataset_slakh_latents.py --ckpt_path $ckpt_path --save_dir $save_dir --mode 'train' --device 'cuda:0' --batch_size 4 --n_workers 2
python generate_dataset_slakh_latents.py --ckpt_path $ckpt_path --save_dir $save_dir --mode 'validation' --device 'cuda:0' --batch_size 4 --n_workers 2
```
Specify the checkpoint path and output directory in $ckpt_path and $save_dir. You can download the SourceVAE checkpoint from here: https://uofi.box.com/s/as0yxoua68f5dcathvs8yi34far7k705.
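For example, with placeholder paths (both paths below are hypothetical; substitute your own locations):
```bash
# Hypothetical example paths; replace with your own locations
export ckpt_path=./sourcevae.ckpt   # the downloaded SourceVAE checkpoint
export save_dir=./slakh_latents     # where the generated latents will be written
python generate_dataset_slakh_latents.py --ckpt_path $ckpt_path --save_dir $save_dir \
    --mode 'train' --device 'cuda:0' --batch_size 4 --n_workers 2
```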
If you want to train your own SourceVAE, follow the instructions below. First generate the dataset metadata:
```bash
python generate_slakh_dataset_metadata.py --mode train
python generate_slakh_dataset_metadata.py --mode validation
```
Then start training:
```bash
bash ../start.sh
```
The logs and checkpoints will be saved in SourceVAE/logfiles.
⚠️ NOTE:
Before executing the training script, you have to change the WANDB_* environment variables to match your personal or institutional account.
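For reference, the standard Weights & Biases environment variables are shown below; the exact set used in the training script may differ, so treat this as a sketch:
```bash
# Standard wandb environment variables; adjust to your own account
export WANDB_API_KEY=<your-api-key>         # from https://wandb.ai/authorize
export WANDB_ENTITY=<your-username-or-team>
export WANDB_PROJECT=<your-project-name>
```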
Change into the msldm directory:
```bash
cd msldm
```
To train MSLDM (this should take about 8 GB of GPU memory):
```bash
bash start_msldm.sh # train msldm
```
To train MSLDM-Large (this should take about 33 GB of GPU memory):
```bash
bash start_msldm_large.sh # train msldm_large
```
An inference example is shown in ./inference.ipynb. Download the checkpoints from https://uofi.box.com/s/z2qxbdsxravhdg1n95khz8um3olgeya3 and save them in ./ckpt, so that ./ckpt looks like:
```
ckpt/
└───sourcevae_ckpt
└───msldm.ckpt
└───msldm_large.ckpt
```
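A quick way to confirm the layout before opening the notebook, assuming the tree above:
```bash
# List the expected checkpoints; ls reports an error for any missing path
ls ckpt/sourcevae_ckpt ckpt/msldm.ckpt ckpt/msldm_large.ckpt
```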
If you use our model in your research, please consider citing:
```bibtex
@misc{xu2024multisourcemusicgenerationlatent,
      title={Multi-Source Music Generation with Latent Diffusion},
      author={Zhongweiyang Xu and Debottam Dutta and Yu-Lin Wei and Romit Roy Choudhury},
      year={2024},
      eprint={2409.06190},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2409.06190},
}
```
