w1018979952/DSANet

DSANet

Looking and Hearing into Details: Dual-enhanced Siamese Adversarial Network for Audio-Visual Matching

Requirements

python 3.8
librosa 0.7.2
numpy 1.19.0
torch 1.4.0
torchvision 0.5.0
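
The pinned versions above can be checked against the current environment before running anything. The following is a minimal sketch and not part of this repository: the `PINNED` table and the `check_pins` helper are hypothetical names, and the code relies only on the standard-library `importlib.metadata` module (available from Python 3.8).

```python
from importlib import metadata

# Versions copied from the requirements list above; the dict itself is illustrative.
PINNED = {
    "librosa": "0.7.2",
    "numpy": "1.19.0",
    "torch": "1.4.0",
    "torchvision": "0.5.0",
}


def check_pins(pinned):
    """Return {package: (expected, found)} for every package whose installed
    version differs from the pin; found is None when the package is missing."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Some wheels carry a local suffix such as "+cu101"; compare the public part only.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package matches its pin. Whether nearby versions also work is not stated here, so a mismatch is a warning rather than a guaranteed failure.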

Get Dataset and Paper

VoxCeleb1

  • WAV audio data from 1,251 speakers; 39 GB after decompression.

  • Baidu Cloud link: VoxCeleb1

  • VoxCeleb document classification: VoxCeleb1 Document

  • Decompression command:

  • zip -s 0 split.zip --out unsplit.zip

  • unzip unsplit.zip

  • Vox1 official website: VoxCeleb1
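
After unzipping, counting the speaker folders is a cheap way to confirm that the archive extracted completely (the list above gives 1,251 people). The sketch below assumes the extracted tree follows a `<root>/<speaker_id>/<video_id>/*.wav` layout; the `summarize_wav_tree` helper name and any root path passed to it are hypothetical and not taken from this repository.

```python
from pathlib import Path


def summarize_wav_tree(root):
    """Count speaker folders and .wav files under root.

    Assumes one sub-directory per speaker directly under root, with .wav
    files nested anywhere below it (an assumed layout; adjust if yours differs).
    """
    root = Path(root)
    speakers = sorted(p for p in root.iterdir() if p.is_dir())
    # Map each speaker folder name to the number of .wav files beneath it.
    wav_counts = {s.name: sum(1 for _ in s.rglob("*.wav")) for s in speakers}
    return len(speakers), sum(wav_counts.values()), wav_counts
```

The first returned value is the number of speaker folders, the second the total number of `.wav` files, and the third a per-speaker breakdown that helps spot empty or partially extracted folders.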

VoxCeleb2

  • MP4 video data (audio included) from 5,994 speakers; 255 GB after decompression.

  • Baidu Cloud link: VoxCeleb2

  • Decompression command:

  • zip -s 0 vox2_mp4_dev.zip --out unsplit.zip

  • unzip unsplit.zip

  • Vox2 official website: VoxCeleb2
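
Because the VoxCeleb2 MP4 files carry both streams, one option for audio-side processing is to write the audio track to a separate WAV file first. The following is a rough sketch of that step using the `ffmpeg` command-line tool. This repository does not say how it extracts audio, so the helper names, the 16 kHz mono target, and the assumption that `ffmpeg` is installed and on the PATH are all illustrative.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(mp4_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that writes the audio track of mp4_path to wav_path.

    -y overwrites existing output, -vn drops the video stream,
    -ac 1 downmixes to mono, -ar sets the sample rate.
    The 16 kHz mono target is an assumption, not a setting from this repository.
    """
    return [
        "ffmpeg", "-y",
        "-i", str(mp4_path),
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        str(wav_path),
    ]


def extract_audio(mp4_path, out_dir):
    """Run ffmpeg (installed separately) and return the path of the written .wav."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    wav_path = out_dir / (Path(mp4_path).stem + ".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_cmd(mp4_path, wav_path), check=True)
    return wav_path
```

Keeping the command construction in its own function makes it possible to inspect or log the exact command before running it over a dataset of this size.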

Contact

If you have any questions, please feel free to contact me at Netizenwjx@foxmail.com.

Citation

@article{wang2022looking,
  title={Looking and Hearing into Details: Dual-enhanced Siamese Adversarial Network for Audio-Visual Matching},
  author={Wang, Jiaxiang and Li, Chenglong and Zheng, Aihua and Tang, Jin and Luo, Bin},
  journal={IEEE Transactions on Multimedia},
  volume={25},
  pages={7505--7516},
  year={2023}
}
