Hi, my dataset contains videos with mixed frame rates (23, 24, and 30 fps). Since our training pipeline does not explicitly model FPS, I’m concerned that extracting a fixed number of frames per sample (e.g., 25/121/257 frames) will correspond to different real durations across videos, which may introduce inconsistent motion speed/tempo during training.
What is the best way to handle mixed-FPS data in process_videos.py?
Specifically, should we:
1. Normalize all videos to a single target FPS (e.g., 25 fps) before computing latents; or
2. Modify `process_videos.py` to resample frames to a target FPS inside `_preprocess_video()` (time-based sampling); or
3. Include FPS in the bucket-selection logic (FPS-aware buckets), so that videos are grouped by FPS and processed with consistent temporal settings?
If you recommend option (2) or (3), could you point out the minimal code changes needed (e.g., where to incorporate fps when selecting buckets or sampling frames), and whether there are any implications for audio alignment when --with-audio is enabled?
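For context, here is roughly what I have in mind for option (2): nearest-frame, time-based index selection so a fixed output frame count always spans the same real duration regardless of the source FPS. This is just a sketch; the function name and signature are hypothetical, not from `process_videos.py`:

```python
import numpy as np

def time_based_indices(src_fps: float, num_src_frames: int,
                       target_fps: float, num_out_frames: int) -> np.ndarray:
    """Pick source-frame indices so the sampled clip plays at target_fps
    regardless of the source frame rate (nearest-frame resampling).
    Hypothetical helper, not part of process_videos.py."""
    # Timestamps (seconds) of the frames we want at the target rate.
    out_times = np.arange(num_out_frames) / target_fps
    # Map each target timestamp to the nearest source frame index.
    idx = np.round(out_times * src_fps).astype(int)
    # Guard against running past the end of short clips.
    return np.clip(idx, 0, num_src_frames - 1)

# e.g. sampling 25 output frames (1 s at 25 fps) from a 30 fps source:
# indices step by ~1.2, so some source frames are skipped.
print(time_based_indices(30, 90, 25, 25))
```

If something like this is the right direction, my main question is where in `_preprocess_video()` the source FPS is available to drive it.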
Thanks!