HowTo100M Data Release

We release additional features and data for the HowTo100M dataset to support future research. What is HowTo100M?

WhisperX

It is commonly known that the ASR provided by YouTube is noisy, with synchronization issues (the timestamps are not perfectly aligned with the speech) and translation issues (the language is sometimes recognized incorrectly and the audio is then transcribed in English).

We use the temporally accurate WhisperX package to process all the HowTo100M audio files, which gives word-level timestamps and highly accurate language identification. WhisperX is built on OpenAI's Whisper with an additional phoneme alignment module to ensure accurate ASR timestamps.

We used the whisper-large-v2 checkpoint, the best Whisper version provided by OpenAI. For non-English languages, we provide the ASR output and word-level timestamps in the local language (if supported by WhisperX), as well as an English translation, but with sentence-level timestamps only.
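As a minimal sketch of how the word-level timestamps can be used, the snippet below selects the words spoken within a given time window. The JSON-like layout (segments containing word entries with `start`/`end` fields) is an assumption based on WhisperX-style alignment output; check the released files for the exact schema.

```python
# Sketch: select words spoken within a time window from WhisperX-style output.
# The structure below (segments -> words with start/end in seconds) is an
# assumption; verify against the actual released ASR files.

def words_in_window(asr_result, t_start, t_end):
    """Return the words whose timestamps overlap [t_start, t_end] seconds."""
    selected = []
    for segment in asr_result["segments"]:
        for word in segment.get("words", []):
            # keep a word if its time span overlaps the query window
            if word["end"] > t_start and word["start"] < t_end:
                selected.append(word["word"])
    return selected

# Toy example with hypothetical timestamps
asr = {
    "segments": [
        {"words": [
            {"word": "mix",   "start": 1.00, "end": 1.30},
            {"word": "the",   "start": 1.30, "end": 1.45},
            {"word": "flour", "start": 1.45, "end": 1.90},
        ]},
        {"words": [
            {"word": "slowly", "start": 4.20, "end": 4.80},
        ]},
    ]
}

print(words_in_window(asr, 1.2, 2.0))  # words overlapping 1.2-2.0s
```

This kind of windowed lookup is the main benefit of word-level timestamps over sentence-level ones: captions can be paired with arbitrary video clips without inheriting YouTube's loose sentence boundaries.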

Visual Features

We provide more recent, stronger visual features for HowTo100M. Following Miech et al., features are extracted at one vector per second. For the original S3D features, please refer to Miech et al.
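Since the features come at one vector per second, a clip-level feature for any time span can be obtained by slicing and pooling the per-second vectors. A minimal NumPy sketch, assuming the features for a video are stored as a `[T, D]` array (one row per second; verify the layout of the released files):

```python
import numpy as np

# Sketch: pool per-second visual features into a single clip-level vector.
# Assumes a [T, D] array with one row per second of video; the actual
# storage format of the released features may differ.

def clip_feature(features, t_start, t_end):
    """Mean-pool the per-second vectors covering [t_start, t_end) seconds."""
    s, e = int(np.floor(t_start)), int(np.ceil(t_end))
    segment = features[s:e]       # rows for seconds s .. e-1
    return segment.mean(axis=0)   # average into one D-dimensional vector

# Toy example: 10 seconds of 4-dimensional features
feats = np.arange(40, dtype=np.float32).reshape(10, 4)
v = clip_feature(feats, 2.0, 5.0)  # pools the rows for seconds 2, 3, 4
```

Mean pooling is only one choice; max pooling or taking the center vector are common alternatives depending on the downstream task.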

Currently we provide the following visual features:

Feature quality benchmarked on HTM-Align

Without any (learnable) joint visual-language model, we measure the quality of the backbone visual-language features on HTM-Align -- which is similar to a retrieval setting.

| Model | Setting | Recall |
| --- | --- | --- |
| MILNCE | global | 0.287 |
| MILNCE | overlap-seq | 0.342 |
| CLIP ViT/B-32 | global | 0.175 |
| CLIP ViT/B-32 | overlap-seq | 0.234 |
| CLIP ViT/B-16 | global | 0.221 |
| CLIP ViT/B-16 | overlap-seq | 0.278 |
| CLIP ViT/L-14 | global | 0.256 |
| CLIP ViT/L-14 | overlap-seq | 0.309 |
| InternVideo-MM-L14 | global | 0.406 |
| InternVideo-MM-L14 | overlap-seq | 0.437 |
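A retrieval-style evaluation like the one above can be sketched as follows: embed each sentence and each clip, compute cosine similarities, and count how often a sentence's top-scoring clip is the correct one. This is a simplified recall@1 over hypothetical pre-extracted embeddings, not the exact HTM-Align protocol:

```python
import numpy as np

# Sketch: recall@1 for text-to-video retrieval with pre-extracted embeddings.
# text_emb and video_emb are hypothetical [N, D] arrays where row i of each
# forms a matched text/video pair; the benchmark's real protocol may differ.

def recall_at_1(text_emb, video_emb):
    # L2-normalize rows so the dot product equals cosine similarity
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    sim = t @ v.T                          # [N, N] similarity matrix
    top1 = sim.argmax(axis=1)              # best-matching video per sentence
    return (top1 == np.arange(len(t))).mean()

# Toy sanity check: identical embeddings give perfect recall
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 8))
r = recall_at_1(emb, emb.copy())
```

Because no joint model is trained, a score computed this way reflects the backbone features themselves, which is the point of the comparison in the table.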

Reference

If you find these data helpful, please consider citing us:

@InProceedings{han2022align,
  title={Temporal Alignment Networks for Long-term Video},
  author={Tengda Han and Weidi Xie and Andrew Zisserman},
  booktitle={CVPR},
  year={2022}
}