MSc Thesis: "Audio-Visual Self-Supervised Representation Learning in-the-wild"
We provide checkpoints for two models pre-trained on a subset of VGGSound containing 50,000 videos. The first model is trained with Cross-modal Instance Discrimination (xID), while the second is based on the recently proposed VICReg method (a minimal checkpoint-loading sketch follows the table).
| Method | Checkpoint (100 epochs) |
|---|---|
| xID | download link |
| VICReg | download link |
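If you want to inspect a downloaded checkpoint before use, the sketch below may help. It assumes the files are standard PyTorch checkpoints saved as dictionaries; the file name and the `model` key are assumptions, not the repository's confirmed format.

```python
import torch

# Inspect a downloaded checkpoint (hypothetical file name and keys;
# adjust them to the actual checkpoint contents).
ckpt = torch.load("xid_vggsound_100ep.pth", map_location="cpu")
print(list(ckpt.keys()))  # e.g. ['epoch', 'model', 'optimizer']

# Assuming the weights live under a 'model' key:
# model.load_state_dict(ckpt["model"])
```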
To train a model with the xID method, run the following (assuming the DDP strategy is used):
```
python3 main-ssl.py configs/VGGSound-N1024.yaml --multiprocessing-distributed
```

For the VICReg method, run:
```
python3 main-vicreg.py configs/VGGSound-VICReg.yaml --multiprocessing-distributed
```

To avoid data parallelism, drop the --multiprocessing-distributed argument and set the --gpu argument of either script to a specific device id (e.g. 0 for the first GPU).
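For example, to run xID pre-training on the first GPU only:

```
python3 main-ssl.py configs/VGGSound-N1024.yaml --gpu 0
```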
For the linear evaluation experiment, where only a linear classifier is trained on top of frozen pre-trained features, run the following (e.g. for the UCF-101 dataset and the model pre-trained with xID):
```
python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml --distributed
```

Note that this script does not yet support multi-node evaluation.
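Conceptually, linear evaluation freezes the pre-trained backbone and fits only a linear classifier on its features. Below is a minimal PyTorch sketch of this idea; all names are hypothetical and do not reflect the repository's actual API.

```python
import torch
import torch.nn as nn

# Linear-probing sketch (hypothetical names, not the repo's API).
# 'backbone' is a pre-trained encoder producing feat_dim-dimensional features.
def build_linear_probe(backbone: nn.Module, feat_dim: int, num_classes: int):
    for p in backbone.parameters():
        p.requires_grad = False      # freeze the pre-trained weights
    backbone.eval()                  # keep BatchNorm/dropout in eval mode
    classifier = nn.Linear(feat_dim, num_classes)
    # Only the classifier's parameters are optimized.
    optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1, momentum=0.9)
    return classifier, optimizer
```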
Final results on both UCF-101 and HMDB-51 datasets are shown in the following table:
| Method | Top-1 Acc. (UCF-101) | Top-5 Acc. (UCF-101) | Top-1 Acc. (HMDB-51) | Top-5 Acc. (HMDB-51) |
|---|---|---|---|---|
| xID | 51.20% | 80.91% | 28.08% | 61.29% |
| VICReg | 39.75% | 71.30% | 21.85% | 52.69% |
For the full fine-tuning experiment, where the entire pre-trained network is trained end-to-end, run the following (e.g. for the HMDB-51 dataset and the model pre-trained with VICReg):
```
python3 eval-action-recg.py configs/hmdb51/8at16-fold1.yaml configs/VGGSound-VICReg.yaml --distributed
```

Note that this script does not yet support multi-node evaluation.
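In contrast to linear probing, fine-tuning also updates the backbone, commonly with a smaller learning rate for the pre-trained layers than for the freshly initialized classifier head. A hedged sketch of this common setup (names and learning rates are hypothetical, not the repository's settings):

```python
import torch
import torch.nn as nn

# Full fine-tuning sketch (hypothetical names, not the repo's API).
def build_finetune_optimizer(backbone: nn.Module, classifier: nn.Linear):
    for p in backbone.parameters():
        p.requires_grad = True        # the backbone is trained as well
    # Common choice: smaller learning rate for pre-trained weights
    # than for the freshly initialized classifier head.
    return torch.optim.SGD(
        [
            {"params": backbone.parameters(), "lr": 0.001},
            {"params": classifier.parameters(), "lr": 0.01},
        ],
        momentum=0.9,
    )
```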
Final results on both UCF-101 and HMDB-51 datasets are shown in the following table:
| Method | Top-1 Acc. (UCF-101) | Top-5 Acc. (UCF-101) | Top-1 Acc. (HMDB-51) | Top-5 Acc. (HMDB-51) |
|---|---|---|---|---|
| xID | 73.22% | 92.78% | 42.85% | 73.69% |
| VICReg | 59.53% | 85.94% | 34.65% | 68.96% |
In this experiment, we test how well the self-supervised models generalize to data from unseen classes (i.e. classes not present in the pre-training dataset). To split the downstream classes into so-called seen and unseen concepts, use the label_similarities.ipynb notebook. Based on our results, the sets of unseen concepts for UCF-101 and HMDB-51 can be found in the datasets/rest_classes/ directory.
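The split can be viewed as thresholding a semantic similarity between each downstream class name and the pre-training labels. The sketch below illustrates this idea under the assumption that class-name text embeddings are already available; the actual procedure is the one in label_similarities.ipynb, and everything here (function name, threshold) is hypothetical.

```python
import numpy as np

# Hypothetical seen/unseen split by label similarity.
# 'downstream' and 'pretraining' map class names to text embeddings
# (e.g. from any sentence encoder); the real logic lives in
# label_similarities.ipynb.
def split_unseen(downstream: dict, pretraining: dict, threshold: float = 0.5):
    pre = np.stack(list(pretraining.values()))
    pre /= np.linalg.norm(pre, axis=1, keepdims=True)
    unseen = []
    for name, vec in downstream.items():
        v = vec / np.linalg.norm(vec)
        # A class counts as "unseen" if no pre-training label is similar enough.
        if (pre @ v).max() < threshold:
            unseen.append(name)
    return unseen
```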
To perform this experiment, run the following (e.g. for the xID model and the UCF-101 dataset, using 20% of the training data per class to tune the linear classifier):
```
python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml --distributed --few-shot-ratio 0.2 --use-rest-classes
```
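To sweep over several few-shot ratios, a simple shell loop over the same command can be used (the ratio values below are illustrative):

```
for r in 0.1 0.2 0.5 1.0; do
    python3 eval-action-recg-linear.py configs/ucf/8at16-linear.yaml configs/VGGSound-N1024.yaml \
        --distributed --few-shot-ratio $r --use-rest-classes
done
```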
Final results are depicted in the following plots: