training problem! #71

@mingtiannihao

Description

Hi, my dataset contains videos with mixed frame rates (23, 24, and 30 fps). Since our training pipeline does not explicitly model FPS, I’m concerned that extracting a fixed number of frames per sample (e.g., 25/121/257 frames) will correspond to different real durations across videos, which may introduce inconsistent motion speed/tempo during training.

What is the best way to handle mixed-FPS data in process_videos.py?

Specifically, should we:

  1. Normalize all videos to a single target FPS (e.g., 25 fps) before computing latents; or
  2. Modify process_videos.py to resample frames to a target FPS inside _preprocess_video() (time-based sampling); or
  3. Include FPS in the bucket selection logic (FPS-aware buckets), so that videos are grouped by FPS and processed with consistent temporal settings?
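For option (2), I imagine the minimal change is a helper that maps target-FPS timestamps onto source frame indices before the frames are read. A rough sketch of what I have in mind (all names here are hypothetical, not from `process_videos.py`):

```python
import numpy as np

def resample_frame_indices(num_src_frames: int, src_fps: float,
                           target_fps: float, num_target_frames: int) -> np.ndarray:
    """Pick source frame indices so the sampled clip plays at target_fps.

    Hypothetical helper: for each output frame, compute its timestamp at the
    target FPS and take the nearest source frame at that time.
    """
    # Timestamps (in seconds) of the frames we want at the target FPS.
    target_times = np.arange(num_target_frames) / target_fps
    # Nearest source frame for each target timestamp.
    indices = np.round(target_times * src_fps).astype(int)
    # Clamp in case the video is shorter than the requested duration.
    return np.clip(indices, 0, num_src_frames - 1)
```

So e.g. a 30 fps source resampled to 25 fps would skip roughly every sixth frame, and a 24 fps source would duplicate some frames. Would something like this belong inside `_preprocess_video()`, before the latent computation?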

If you recommend option (2) or (3), could you point out the minimal code changes needed (e.g., where to incorporate fps when selecting buckets or sampling frames), and whether there are any implications for audio alignment when --with-audio is enabled?
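For option (3), I was picturing the bucket key simply gaining an FPS component, snapped to a small set of supported rates so that 23.976/24 sources land in the same bucket (again, purely illustrative names, not the actual bucket code):

```python
def bucket_key(height: int, width: int, num_frames: int, fps: float,
               fps_buckets: tuple = (24, 25, 30)) -> tuple:
    """Hypothetical FPS-aware bucket key: snap fps to the nearest supported rate."""
    snapped_fps = min(fps_buckets, key=lambda b: abs(b - fps))
    return (height, width, num_frames, snapped_fps)
```

Then each bucket could carry its own temporal settings, and audio (when `--with-audio` is enabled) would presumably need its sample offsets derived from the same snapped FPS to stay aligned.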

Thanks!
