Environment: Python 3.8, librosa 0.7.2, numpy 1.19.0, torch 1.4.0, torchvision 0.5.0
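A small Python sketch, not part of the repository, can compare the versions listed above with what is actually installed. It uses `importlib.metadata` from the standard library (available since Python 3.8). It assumes the PyPI distribution names `librosa`, `numpy`, `torch` and `torchvision`, and it ignores local build suffixes such as `+cu101` on torch wheels.

```python
"""Compare installed package versions with the environment line in this README.

A convenience sketch, not part of the original repository. Assumes the PyPI
distribution names below match the packages listed in the README.
"""
import sys
from importlib import metadata  # standard library since Python 3.8

# Versions copied from the environment line of this README.
REQUIRED = {
    "librosa": "0.7.2",
    "numpy": "1.19.0",
    "torch": "1.4.0",
    "torchvision": "0.5.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def find_mismatches(installed, required):
    """Return (name, required, installed) tuples for packages that differ.

    Local version labels such as "+cu101" are ignored when comparing.
    """
    mismatches = []
    for name, wanted in required.items():
        found = installed.get(name)
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            mismatches.append((name, wanted, found))
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 8):
        print("note: README lists Python 3.8, running %d.%d" % sys.version_info[:2])
    for name, wanted, found in find_mismatches(installed_versions(REQUIRED), REQUIRED):
        print(f"{name}: README lists {wanted}, found {found}")
```

Nothing is printed when the environment matches; each mismatch or missing package is reported on its own line.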
- Download the VoxCeleb and VGGFace datasets
- Latest paper list: Audio-visual matching
- Code: samplers.zip (samplers)

VoxCeleb1: WAV audio data, 1,251 identities in total, 39 GB after decompression.
Baidu Cloud link: VoxCeleb1
VoxCeleb document classification: VoxCeleb1 Document
Decompression commands (merge the split archive into a single zip, then extract it):
zip -s 0 split.zip --out unsplit.zip
unzip unsplit.zip
VoxCeleb1 official website: VoxCeleb1
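To sanity-check an extracted copy against the figures above (1,251 identities, about 39 GB), a short Python sketch can count identity folders, WAV files and total size. It assumes the extracted files follow the layout `wav/<identity>/<video_id>/<utterance>.wav` with identity folders named like `id10001`. The default `wav` root is hypothetical; pass your own path as the first argument.

```python
"""Count identity folders and total size of an extracted VoxCeleb1 copy.

Assumption: the archive extracts to wav/<identity>/<video_id>/<utterance>.wav,
with identity folders named like "id10001". Adjust for your copy.
"""
import os
import sys


def summarize_identities(root, prefix="id"):
    """Return (number of identity folders, number of .wav files, total bytes)."""
    identities = [
        entry.name
        for entry in os.scandir(root)
        if entry.is_dir() and entry.name.startswith(prefix)
    ]
    n_wav = 0
    total_bytes = 0
    for identity in identities:
        # Walk every video folder under this identity and tally its files.
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, identity)):
            for filename in filenames:
                total_bytes += os.path.getsize(os.path.join(dirpath, filename))
                if filename.lower().endswith(".wav"):
                    n_wav += 1
    return len(identities), n_wav, total_bytes


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "wav"  # hypothetical default path
    if os.path.isdir(root):
        n_ids, n_wav, size = summarize_identities(root)
        # GB here means 10**9 bytes.
        print(f"{n_ids} identities, {n_wav} wav files, {size / 1e9:.1f} GB")
    else:
        print(f"directory not found: {root}")
```

A count far from 1,251 usually means the merge or extraction step stopped early.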
VoxCeleb2: MP4 video data (the files include the audio track), 5,994 identities in total, 255 GB after decompression.
Baidu Cloud link: VoxCeleb2
Decompression commands (merge the split archive into a single zip, then extract it):
zip -s 0 vox2_mp4_dev.zip --out unsplit.zip
unzip unsplit.zip
VoxCeleb2 official website: VoxCeleb2
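Because VoxCeleb2 is distributed as MP4, the audio track usually has to be extracted before it enters an audio pipeline such as librosa. One common approach, sketched below, calls the `ffmpeg` command-line tool (assumed to be installed and on `PATH`) to write mono WAV files that mirror the MP4 folder layout. The 16 kHz sample rate, the mono downmix and the `dev/mp4` / `dev/wav` paths are illustrative assumptions, not settings taken from this repository.

```python
"""Extract audio tracks from VoxCeleb2 MP4 files with ffmpeg.

Assumptions: ffmpeg is installed and on PATH; the mp4/ and wav/ roots and the
16 kHz mono target format are illustrative, not taken from this repository.
"""
import os
import shutil
import subprocess


def ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg call that drops video (-vn) and writes mono (-ac 1) WAV."""
    return [
        "ffmpeg", "-loglevel", "error", "-y",  # quiet, overwrite existing output
        "-i", src,
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        dst,
    ]


def wav_path_for(mp4_path, mp4_root, wav_root):
    """Map mp4_root/<id>/<video>/<clip>.mp4 to wav_root/<id>/<video>/<clip>.wav."""
    rel = os.path.relpath(mp4_path, mp4_root)
    return os.path.join(wav_root, os.path.splitext(rel)[0] + ".wav")


def extract_all(mp4_root, wav_root, sample_rate=16000):
    """Convert every .mp4 under mp4_root, one ffmpeg process per clip."""
    for dirpath, _dirnames, filenames in os.walk(mp4_root):
        for filename in filenames:
            if not filename.lower().endswith(".mp4"):
                continue
            src = os.path.join(dirpath, filename)
            dst = wav_path_for(src, mp4_root, wav_root)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            subprocess.run(ffmpeg_command(src, dst, sample_rate), check=True)


if __name__ == "__main__":
    if shutil.which("ffmpeg") and os.path.isdir("dev/mp4"):  # hypothetical paths
        extract_all("dev/mp4", "dev/wav")
```

Running one `ffmpeg` process per clip is slow for the full 255 GB set; a process pool is a natural extension, left out here to keep the sketch short.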
VoxCeleb2 test set (118 identities): Test 118ID
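The 118 VoxCeleb2 test identities are meant to be disjoint from the development identities. If both parts are extracted side by side, a short Python sketch can confirm this before evaluation. It assumes each root contains exactly one subfolder per identity; the `dev/mp4` and `test/mp4` paths are hypothetical.

```python
"""Check that VoxCeleb2 dev and test identity folders do not overlap.

Assumption: identities are the immediate subfolders of each root, e.g.
dev/mp4/id00012 and test/mp4/id00017; the paths below are hypothetical.
"""
import os


def identity_folders(root):
    """Return the set of immediate subfolder names under root."""
    return {entry.name for entry in os.scandir(root) if entry.is_dir()}


def check_split(dev_ids, test_ids, expected_test=118):
    """Return a list of human-readable problems (empty if the split looks right)."""
    problems = []
    overlap = dev_ids & test_ids
    if overlap:
        problems.append(f"identities in both dev and test: {len(overlap)}")
    if len(test_ids) != expected_test:
        problems.append(f"expected {expected_test} test identities, found {len(test_ids)}")
    return problems


if __name__ == "__main__":
    dev_root, test_root = "dev/mp4", "test/mp4"  # hypothetical locations
    if os.path.isdir(dev_root) and os.path.isdir(test_root):
        issues = check_split(identity_folders(dev_root), identity_folders(test_root))
        print("\n".join(issues) or "split looks consistent")
```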
If you have any questions, please feel free to contact me at Netizenwjx@foxmail.com.
@Article{wang2024attribute,
title={Attribute-guided cross-modal interaction and enhancement for audio-visual matching},
author={Wang, Jiaxiang and Zheng, Aihua and Yan, Yan and He, Ran and Tang, Jin},
journal={IEEE Transactions on Information Forensics and Security},
volume={19},
pages={4986-4998},
doi={10.1109/TIFS.2024.3388949},
year={2024}
}