Looking and Hearing into Details: Dual-enhanced Siamese Adversarial Network for Audio-Visual Matching
python 3.8
librosa 0.7.2
numpy 1.19.0
torch 1.4.0
torchvision 0.5.0
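To confirm that an environment matches the versions listed above, a small check along these lines may help. This is a minimal sketch, not part of the repository: the names `REQUIRED`, `installed_version`, and `find_mismatches` are hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library.

```python
from importlib import metadata  # standard library since Python 3.8

# Versions pinned in the requirements list above.
REQUIRED = {
    "librosa": "0.7.2",
    "numpy": "1.19.0",
    "torch": "1.4.0",
    "torchvision": "0.5.0",
}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(required, lookup=installed_version):
    """Map each package whose installed version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in required.items():
        found = lookup(name)
        # Local build tags such as "1.4.0+cu100" still count as a match.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the file prints one line per package that is missing or pinned to a different version, and nothing if the environment matches.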
- Download the VoxCeleb and VGGFace datasets.
- Latest paper list: Audio-visual matching
- WAV audio data, 1,251 people in total, 39 GB after decompression.
- Baidu Cloud link: VoxCeleb1
- VoxCeleb document classification: VoxCeleb1 Document
- Decompression commands:
  - zip -s 0 split.zip --out unsplit.zip
  - unzip unsplit.zip
- Vox1 official website: VoxCeleb1
- MP4 video data (the files include audio), 5,994 people in total, 255 GB after decompression.
- Baidu Cloud link: VoxCeleb2
- Decompression commands:
  - zip -s 0 vox2_mp4_dev.zip --out unsplit.zip
  - unzip unsplit.zip
- Vox2 official website: VoxCeleb2
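The decompression commands listed above (merge the split archive, then extract it) can also be scripted. The following is a minimal sketch, assuming the `zip` and `unzip` command-line tools are on the PATH and that the script runs in the directory holding the archive parts. The function names `unsplit_commands` and `decompress` are hypothetical and not part of this repository.

```python
import subprocess


def unsplit_commands(split_zip, merged_zip="unsplit.zip"):
    """Build the two commands: merge the split archive, then extract it."""
    return [
        ["zip", "-s", "0", split_zip, "--out", merged_zip],
        ["unzip", merged_zip],
    ]


def decompress(split_zip, merged_zip="unsplit.zip", dry_run=False):
    """Run the commands in order; with dry_run=True only return them."""
    commands = unsplit_commands(split_zip, merged_zip)
    if not dry_run:
        for argv in commands:
            # check=True raises CalledProcessError if a step fails.
            subprocess.run(argv, check=True)
    return commands
```

For example, `decompress("vox2_mp4_dev.zip")` would run both steps for the VoxCeleb2 archive, while `decompress("vox2_mp4_dev.zip", dry_run=True)` only returns the commands so they can be inspected first.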
If you have any questions, please feel free to contact me at Netizenwjx@foxmail.com.
@article{wang2022looking,
title={Looking and Hearing into Details: Dual-enhanced Siamese Adversarial Network for Audio-Visual Matching},
author={Wang, Jiaxiang and Li, Chenglong and Zheng, Aihua and Tang, Jin and Luo, Bin},
journal={IEEE Transactions on Multimedia},
volume={25},
pages={7505--7516},
year={2023}
}