Public-Private Attributes-Based Variational Adversarial Network for Audio-Visual Cross-Modal Matching
- python 3.8
- librosa 0.7.2
- numpy 1.19.0
- torch 1.4.0
- torchvision 0.5.0
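To confirm that an environment matches the versions listed above, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the `PINNED` table and the `find_mismatches` helper are hypothetical names introduced here, and the sketch assumes the packages are installed under the same distribution names as listed.

```python
import sys
from importlib import metadata  # standard library since Python 3.8

# Versions listed in this README (distribution name -> version).
# PINNED and find_mismatches are illustrative names, not repository code.
PINNED = {
    "librosa": "0.7.2",
    "numpy": "1.19.0",
    "torch": "1.4.0",
    "torchvision": "0.5.0",
}


def find_mismatches(pinned):
    """Return {name: (expected, installed_or_None)} for each package that differs."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Some builds carry a local suffix such as "1.4.0+cu100"; compare the base part.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    print("python", "%d.%d" % sys.version_info[:2], "(README lists 3.8)")
    for name, (expected, installed) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty report means every listed package is present at the listed version; a `None` entry means the package was not found.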
- Download the VoxCeleb and VGGFace datasets.
- Latest paper list: Audio-visual matching
- VoxCeleb1: WAV audio data, 1,251 people in total, 39 GB after decompression.
- Baidu Cloud link: VoxCeleb1
- Decompression commands:
- zip -s 0 split.zip --out unsplit.zip
- unzip unsplit.zip
- Vox1 official website: VoxCeleb1
- VoxCeleb2: MP4 video data (the files include audio), 5,994 people in total, 255 GB after decompression.
- Baidu Cloud link: VoxCeleb2
- Decompression commands:
- zip -s 0 vox2_mp4_dev.zip --out unsplit.zip
- unzip unsplit.zip
- Vox2 official website: VoxCeleb2
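The two-step decompression above (merge the split archive, then extract it) can also be scripted. The following is a minimal sketch under stated assumptions: `unsplit_commands` and `decompress` are hypothetical helper names introduced here, the `zip` and `unzip` command-line tools are assumed to be on the PATH, and the archive names are the ones given in this README.

```python
import subprocess


def unsplit_commands(split_zip, merged_zip="unsplit.zip"):
    """Build the two commands from the README: merge the split archive, then extract it."""
    return [
        ["zip", "-s", "0", split_zip, "--out", merged_zip],  # join the split parts
        ["unzip", merged_zip],                               # extract the merged archive
    ]


def decompress(split_zip, merged_zip="unsplit.zip", dry_run=True):
    """Print the commands (dry_run=True) or run them in order, stopping on the first failure."""
    commands = unsplit_commands(split_zip, merged_zip)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return commands


if __name__ == "__main__":
    decompress("split.zip")          # VoxCeleb1 archive name from the README
    decompress("vox2_mp4_dev.zip")   # VoxCeleb2 archive name from the README
```

With `dry_run=True` (the default in this sketch) nothing is executed, which makes it easy to check the commands before committing to a 255 GB extraction.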
If you have any questions, please feel free to contact me at Netizenwjx@foxmail.com.
@Article{Zheng2024Public,
author={Zheng, Aihua and Yuan, Fan and Zhang, Haichuan and Wang, Jiaxiang and Tang, Chao and Li, Chenglong},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Public-Private Attributes-Based Variational Adversarial Network for Audio-Visual Cross-Modal Matching},
volume={34},
number={9},
pages={8698-8709},
note = {doi: 10.1109/TCSVT.2024.3390573},
year={2024}}