training problem! #71

@mingtiannihao

Description

Hi, my dataset contains videos with mixed frame rates (23, 24, and 30 fps). Since our training pipeline does not explicitly model FPS, I’m concerned that extracting a fixed number of frames per sample (e.g., 25/121/257 frames) will correspond to different real durations across videos, which may introduce inconsistent motion speed/tempo during training.

What is the best way to handle mixed-FPS data in process_videos.py?

Specifically, should we:

  1. Normalize all videos to a single target FPS (e.g., 25 fps) before computing latents; or
  2. Modify process_videos.py to resample frames to a target FPS inside _preprocess_video() (time-based sampling); or
  3. Include FPS in the bucket selection logic (FPS-aware buckets), so that videos are grouped by FPS and processed with consistent temporal settings?
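For option (2), I imagine the minimal change is a helper that maps target-FPS timestamps onto source frame indices before the frames are read. A rough sketch of what I have in mind (all names here are hypothetical, not from `process_videos.py`):

```python
import numpy as np

def resample_frame_indices(num_src_frames: int, src_fps: float,
                           target_fps: float, num_target_frames: int) -> np.ndarray:
    """Pick source frame indices so the sampled clip plays at target_fps.

    Hypothetical helper: for each output frame, compute its timestamp at the
    target FPS and take the nearest source frame at that time.
    """
    # Timestamps (in seconds) of the frames we want at the target FPS.
    target_times = np.arange(num_target_frames) / target_fps
    # Nearest source frame for each target timestamp.
    indices = np.round(target_times * src_fps).astype(int)
    # Clamp in case the video is shorter than the requested duration.
    return np.clip(indices, 0, num_src_frames - 1)
```

So e.g. a 30 fps source resampled to 25 fps would skip roughly every sixth frame, and a 24 fps source would duplicate some frames. Would something like this belong inside `_preprocess_video()`, before the latent computation?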

If you recommend option (2) or (3), could you point out the minimal code changes needed (e.g., where to incorporate fps when selecting buckets or sampling frames), and whether there are any implications for audio alignment when --with-audio is enabled?
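For option (3), I was picturing the bucket key simply gaining an FPS component, snapped to a small set of supported rates so that 23.976/24 sources land in the same bucket (again, purely illustrative names, not the actual bucket code):

```python
def bucket_key(height: int, width: int, num_frames: int, fps: float,
               fps_buckets: tuple = (24, 25, 30)) -> tuple:
    """Hypothetical FPS-aware bucket key: snap fps to the nearest supported rate."""
    snapped_fps = min(fps_buckets, key=lambda b: abs(b - fps))
    return (height, width, num_frames, snapped_fps)
```

Then each bucket could carry its own temporal settings, and audio (when `--with-audio` is enabled) would presumably need its sample offsets derived from the same snapped FPS to stay aligned.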

Thanks!
