Hi, thanks for your work! I have the following questions:
- Do these coordinates correspond to the first 32 seconds at 1 fps?
- Are the bounding boxes in absolute coordinates, and if a dimension is greater than 1000, do the coordinates need to be normalized to 1000?
- Is there a situation where, due to the overall token limit, the model has scaled the video but the bounding boxes have not been scaled? How is this situation generally handled?
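
To make the last two questions concrete, here is a rough sketch of the two conversions I have in mind. This is only my guess at what might be needed, not something taken from the repo: the helper names `rescale_box` and `normalize_box`, the `(x1, y1, x2, y2)` box layout, and the 0-1000 target range are all my own assumptions.

```python
def rescale_box(box, orig_size, new_size):
    """Map an absolute (x1, y1, x2, y2) box from the original frame size
    to the resized frame size. Sizes are (width, height) tuples."""
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    sx, sy = new_w / orig_w, new_h / orig_h  # per-axis scale factors
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


def normalize_box(box, size, scale=1000):
    """Convert an absolute (x1, y1, x2, y2) box to integers in [0, scale],
    relative to a frame of the given (width, height).
    The 0-1000 range is an assumption on my side."""
    w, h = size
    x1, y1, x2, y2 = box
    return (
        round(x1 / w * scale),
        round(y1 / h * scale),
        round(x2 / w * scale),
        round(y2 / h * scale),
    )


# Hypothetical example: a 1920x1080 frame downscaled to 960x540.
box = (100, 200, 300, 400)
print(rescale_box(box, (1920, 1080), (960, 540)))
print(normalize_box((960, 540, 1920, 1080), (1920, 1080)))
```

If the boxes are already normalized, I assume the first step is unnecessary, since relative coordinates would not change when the video is resized. Is that right?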