Hello, thank you for releasing the code and paper for CG-STVG. It's very interesting work!
I have a question regarding the video frame sampling setup described in the paper versus the configuration in the code. In the paper, you mention:
"We set the video frame length Nv to 64 and the text sequence length Nt to 30."
However, in the hcstvg.yaml configuration file (used for training and testing on HC-STVG), I noticed that video frames are sampled according to sample_fps rather than to a fixed length of 64 frames. Could you clarify how this aligns with the experimental setup described in the paper?
I would appreciate any insight into the intended frame sampling strategy and how it relates to the reported results.
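To make the question concrete, here is a minimal sketch of the two strategies as I understand them, plus one possible way they could be reconciled. This is not code from the repository: the function names, the `max_len` cap, and the example video (600 frames at 30 fps) are all assumptions for illustration. Only `sample_fps` and the length 64 come from the config and the paper.

```python
import numpy as np


def sample_by_fps(num_frames, video_fps, sample_fps):
    """FPS-based sampling: the number of frames grows with video duration."""
    step = video_fps / sample_fps  # source frames between two sampled frames
    return np.arange(0, num_frames, step).astype(int)


def sample_fixed_length(num_frames, target_len=64):
    """Fixed-length sampling: always target_len frames, spread uniformly."""
    return np.linspace(0, num_frames - 1, target_len).round().astype(int)


def sample_fps_then_cap(num_frames, video_fps, sample_fps, max_len=64):
    """Hypothetical reconciliation: sample by FPS, then subsample to max_len."""
    idx = sample_by_fps(num_frames, video_fps, sample_fps)
    if len(idx) > max_len:
        keep = np.linspace(0, len(idx) - 1, max_len).round().astype(int)
        idx = idx[keep]
    return idx


if __name__ == "__main__":
    # Hypothetical video: 600 frames at 30 fps.
    print(len(sample_by_fps(600, 30, 2)))        # count depends on sample_fps
    print(len(sample_fixed_length(600)))         # count is always target_len
    print(len(sample_fps_then_cap(600, 30, 5)))  # count never exceeds max_len
```

With the first function the sequence length depends on the video duration and on sample_fps, while the second always returns 64 frames. My question is which of these (or which combination, such as the third) matches the setup behind the reported numbers.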
Thank you in advance for your time and support!