
w1018979952/ACIENet


Requirements

  • python 3.8

  • librosa 0.7.2

  • numpy 1.19.0

  • torch 1.4.0

  • torchvision 0.5.0
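Before running anything, it can help to confirm that the environment matches the versions above. The script below is a minimal sketch, not part of the repository, that compares installed package versions with these pins using `importlib.metadata` (available from Python 3.8). The `PINNED` table and the `find_mismatches` helper are hypothetical names introduced here. Local build suffixes such as `+cu101` on torch wheels are ignored when comparing.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, List

# Package versions listed under "Requirements" (the Python version is checked separately).
PINNED: Dict[str, str] = {
    "librosa": "0.7.2",
    "numpy": "1.19.0",
    "torch": "1.4.0",
    "torchvision": "0.5.0",
}


def find_mismatches(
    pinned: Dict[str, str],
    get_version: Callable[[str], str] = metadata.version,
) -> List[str]:
    """Return one message per package that is missing or differs from its pinned version."""
    problems = []
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        # Drop local build tags such as "+cu101" before comparing.
        if found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print(f"note: running Python {sys.version_info[0]}.{sys.version_info[1]}, README lists 3.8")
    for message in find_mismatches(PINNED):
        print(message)
```

An empty output means every pinned package was found at the listed version. The `get_version` parameter exists only so the check can be exercised without the real packages installed.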

Get the Datasets and Paper

VoxCeleb1

  • WAV audio data from 1,251 speakers in total; 39 GB after decompression.

  • Baidu Cloud link: VoxCeleb1

  • VoxCeleb1 documentation (file categories): VoxCeleb1 Document

  • Decompression commands (first merge the split archive into a single file, then extract it):

  • zip -s 0 split.zip --out unsplit.zip

  • unzip unsplit.zip

  • Vox1 official website: VoxCeleb1
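The two decompression commands above can be wrapped in a small Python helper if you prefer to script this step. This is a hypothetical sketch, not code from the repository: `rejoin_and_extract_commands` and `run` are illustrative names, it assumes Info-ZIP `zip` and `unzip` are on `PATH`, and it assumes the remaining parts of the split archive sit in the same directory as the `.zip` file passed in.

```python
import shutil
import subprocess
from typing import List


def rejoin_and_extract_commands(
    first_part: str, merged: str = "unsplit.zip", dest: str = "."
) -> List[List[str]]:
    """Build the README's two steps as argument lists: merge a split zip, then extract it."""
    return [
        # "-s 0" asks zip to rewrite the split set as a single, unsplit archive.
        ["zip", "-s", "0", first_part, "--out", merged],
        # "-d" selects the extraction directory ("." matches the README's command).
        ["unzip", merged, "-d", dest],
    ]


def run(commands: List[List[str]]) -> None:
    """Run each command in order, stopping on the first failure."""
    for argv in commands:
        if shutil.which(argv[0]) is None:
            raise RuntimeError(f"{argv[0]!r} was not found on PATH")
        subprocess.run(argv, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `run(rejoin_and_extract_commands("split.zip"))` reproduces the VoxCeleb1 steps, and passing `"vox2_mp4_dev.zip"` instead covers VoxCeleb2. Building the argument lists separately from running them keeps the commands easy to inspect before anything touches the disk.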

VoxCeleb2

  • MP4 video data (with audio tracks) from 5,994 speakers in total; 255 GB after decompression.

  • Baidu Cloud link: VoxCeleb2

  • Decompression commands (first merge the split archive into a single file, then extract it):

  • zip -s 0 vox2_mp4_dev.zip --out unsplit.zip

  • unzip unsplit.zip

  • Vox2 official website: VoxCeleb2

  • VoxCeleb2 test set (118 identities): Test 118ID
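After extraction, a quick way to check that the data is complete is to count speaker folders and compare the result with the totals stated above (1,251 for VoxCeleb1, 5,994 for VoxCeleb2). The sketch below assumes the official `<root>/<speaker_id>/<video_id>/<clip>.<ext>` layout; the Baidu copies may be organized differently, and `count_speakers` is a hypothetical helper, not part of ACIENet.

```python
from pathlib import Path


def count_speakers(root: str, ext: str) -> int:
    """Count speaker folders under `root` holding at least one `*.ext` file one level down.

    Assumes the official <root>/<speaker_id>/<video_id>/<clip>.<ext> layout.
    """
    return sum(
        1
        for speaker_dir in Path(root).iterdir()
        # Skip stray files and speaker folders that contain no clips of the requested type.
        if speaker_dir.is_dir() and any(speaker_dir.glob(f"*/*.{ext}"))
    )
```

For instance, `count_speakers("wav", "wav")` on extracted VoxCeleb1 data or `count_speakers("dev/mp4", "mp4")` on VoxCeleb2 should match the totals above if the layout assumption holds; the root paths in these calls are guesses, so adjust them to wherever the archives were unpacked.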

Contact

If you have any questions, please feel free to contact me at Netizenwjx@foxmail.com.

Citation

@article{wang2024attribute,
  title={Attribute-guided cross-modal interaction and enhancement for audio-visual matching},
  author={Wang, Jiaxiang and Zheng, Aihua and Yan, Yan and He, Ran and Tang, Jin},
  journal={IEEE Transactions on Information Forensics and Security},
  volume={19},
  pages={4986--4998},
  year={2024},
  doi={10.1109/TIFS.2024.3388949}
}