Extracting several kinds of visual representations from videos.
The following frame-level (*_features) and video-level (*_globals) visual representations are supported:
- 2D-CNN (cnn_features, cnn_globals, cnn_sem_globals)
- 3D-CNN (c3d_features, c3d_globals, i3d_features, i3d_globals)
- ECO (eco_features, eco_globals, eco_sem_features, eco_sem_globals)
- TSM (tsm_features, tsm_globals, tsm_sem_features, tsm_sem_globals)
Note: *_sem_* representations are taken from the classification output (the probability distribution over classes) of the respective models.
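To illustrate what a probability-distribution representation looks like, here is a minimal sketch in plain Python. It is not this package's code: the logits are made up, the helper names `softmax` and `mean_pool` are hypothetical, and averaging frame-level vectors into a video-level vector is only an assumed pooling strategy.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(frame_vectors):
    """Average per-frame vectors into one video-level vector (assumed pooling)."""
    n = len(frame_vectors)
    return [sum(col) / n for col in zip(*frame_vectors)]

# Hypothetical classifier logits for 3 frames over 4 classes.
frame_logits = [
    [2.0, 1.0, 0.1, -1.0],
    [1.5, 1.5, 0.0, -0.5],
    [0.2, 2.5, 0.3, 0.0],
]

# One probability distribution per frame, analogous to *_sem_features.
sem_features = [softmax(f) for f in frame_logits]

# One distribution for the whole video, analogous to *_sem_globals
# (assuming mean pooling, which the package may or may not use).
sem_globals = mean_pool(sem_features)
```

Each row of `sem_features` sums to 1, and because the mean of probability distributions is itself a distribution, `sem_globals` does too.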
This package has been tested on videos from the following video-captioning datasets:
- MSVD
- M-VAD
- MSR-VTT
- TRECVID-2020
- TRECVID-2020-Test
- TGIF
- VATEX
- ActivityNet
- ActivityNet-Test
- ActivityNet-Fragments