Thanks for your nice work!
I found that the extracted videos are already split into clips, whereas the jsonl data expects the raw video path. Will you release the raw videos on Hugging Face?
For example, the entry {"type": "video", "video": "video/youtube/7LEb_PFedDk.mp4", "video_start": 59.53, "video_end": 191.14} refers to the full video, but only the clip 7LEb_PFedDk_59.53-191.14_2.0fps.mp4 is available.
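As a possible stopgap, the clip name could be derived from the jsonl entry, as in the rough sketch below. The naming pattern is inferred only from the single example above, so it may not hold for every clip. The helper name `clip_path_for`, the `fps=2.0` default, and the assumption that clips sit in the same relative directory as the raw path are mine, not part of the released code.

```python
import posixpath


def clip_path_for(entry, fps=2.0):
    """Guess the pre-split clip path for one jsonl video entry.

    Assumed pattern (inferred from one example, not confirmed):
        <stem>_<video_start>-<video_end>_<fps>fps.mp4
    Numbers are formatted with Python's default float formatting,
    which may differ from how the clips were actually named.
    """
    directory, name = posixpath.split(entry["video"])
    stem, ext = posixpath.splitext(name)
    clip = f"{stem}_{entry['video_start']}-{entry['video_end']}_{fps}fps{ext}"
    # Assumption: the clip lives in the same relative directory as the raw path.
    return posixpath.join(directory, clip)


entry = {
    "type": "video",
    "video": "video/youtube/7LEb_PFedDk.mp4",
    "video_start": 59.53,
    "video_end": 191.14,
}
print(clip_path_for(entry))
```

This only rewrites the path. The loader would presumably also need to stop applying `video_start` and `video_end`, since the clip is already trimmed.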
Thanks again!