This repository contains the scripts necessary to generate the LibriVAD dataset, a large-scale, noise-augmented dataset for Voice Activity Detection (VAD) based on the LibriSpeech corpus.
- Install Python Libraries: Navigate to the LibriVAD directory and install the required dependencies:
pip install -r requirements.txt
- Download Source Data: Run the setup.py script and choose the LibriSpeech mirror closest to your location from [EU, USA, CN]. This will download the necessary splits of the LibriSpeech dataset, as well as the Forced_alignments and Noises data from Hugging Face.
EU Example:
python setup.py EU
After downloading train-clean-100, dev-clean, and test-clean, the local setup should take a few minutes.
- Generate the Dataset: After the setup is complete, run the create_LibriVAD.py script. Choose the size of the generated dataset: "small", "medium", or "large". The approximate sizes of the final dataset are 15GB, 150GB, and 1.5TB respectively.
Small Dataset Example:
python create_LibriVAD.py small
The generation process can take several hours, depending on the chosen size and your machine's performance.
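The three steps above can also be chained from a single Python script. The following is a minimal sketch, not part of the repository: the helper names build_commands and run_pipeline are hypothetical, and it assumes the script is started from the LibriVAD directory so that requirements.txt, setup.py, and create_LibriVAD.py resolve correctly.

```python
import subprocess
import sys

MIRRORS = ("EU", "USA", "CN")          # mirror names listed in the setup step
SIZES = ("small", "medium", "large")   # dataset sizes listed in the generation step


def build_commands(mirror: str, size: str) -> list[list[str]]:
    """Return the install, setup, and generation commands in order (hypothetical helper)."""
    if mirror not in MIRRORS:
        raise ValueError(f"mirror must be one of {MIRRORS}, got {mirror!r}")
    if size not in SIZES:
        raise ValueError(f"size must be one of {SIZES}, got {size!r}")
    py = sys.executable  # reuse the interpreter that runs this script
    return [
        [py, "-m", "pip", "install", "-r", "requirements.txt"],
        [py, "setup.py", mirror],
        [py, "create_LibriVAD.py", size],
    ]


def run_pipeline(mirror: str, size: str) -> None:
    """Run each step in turn and stop at the first failing command."""
    for cmd in build_commands(mirror, size):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling run_pipeline("EU", "small") would then mirror the EU and small examples shown above. Validating the arguments first avoids starting a long download with a mistyped mirror or size.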
Warning: The initial space requirement for the setup process (before generating the final dataset) is approximately 61GB. Please ensure you have sufficient disk space.
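A quick way to check disk space before starting is sketched below. It assumes, conservatively, that the setup data and the generated dataset sit on the same disk at the same time, so the two figures are simply added; the real peak usage may differ. The function names are hypothetical, and sizes are treated as decimal gigabytes.

```python
import shutil

GB = 10**9
SETUP_GB = 61  # setup footprint stated in the warning
DATASET_GB = {"small": 15, "medium": 150, "large": 1500}  # approximate final sizes


def required_gb(size: str) -> int:
    """Setup footprint plus approximate dataset size, in GB (rough estimate)."""
    return SETUP_GB + DATASET_GB[size]


def has_enough_space(size: str, path: str = ".") -> bool:
    """Compare free space on the disk holding `path` with the estimate."""
    free_gb = shutil.disk_usage(path).free / GB
    return free_gb >= required_gb(size)
```

Under this assumption, required_gb("small") evaluates to 76, and has_enough_space("small") reports whether the current disk can hold that much.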
The noise used for the generation of LibriVAD can be downloaded from https://huggingface.co/datasets/LibriVAD/LibriVAD/resolve/main/Files/Noises.zip
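For fetching the noise archive on its own, a standard-library sketch could look like the following. The function name fetch_and_extract and the destination directory are hypothetical, and the layout of the extracted files is not specified here, so inspect the result before relying on particular paths.

```python
import urllib.request
import zipfile
from pathlib import Path

NOISES_URL = "https://huggingface.co/datasets/LibriVAD/LibriVAD/resolve/main/Files/Noises.zip"


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download a zip archive into dest_dir (if not already there) and unpack it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]  # e.g. Noises.zip
    if not archive.exists():                 # skip the download on reruns
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

A call such as fetch_and_extract(NOISES_URL, "Noises") would place the archive and its contents in a local Noises directory. Note that the setup script already downloads the noise data, so this is only needed when the noises are wanted without the rest of the pipeline.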
The trained models are also available on the Hugging Face page: https://huggingface.co/datasets/LibriVAD/LibriVAD/tree/main/Eval-ViT-MFCC
This project is licensed under the MIT License.
I. Stylianou, A.K. Sarkar, N. Dawalatabad, J. Glass, and Z.-H. Tan, "LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection," arXiv preprint arXiv:2512.17281 (2025).