Open
Description
The output shape of forward_sliding varies depending on the number of input frames T.
- When T > 2, it performs normal tracking and returns a flow tensor of shape (B, T, 2, H, W), i.e. one flow map per frame.
- When T == 2, it performs optical flow inference and returns a single flow map of shape (B, 2, H, W).
I assume this design is intended to optimize pair-wise optical flow inference. However, the behavior is confusing and inconsistent from a video-tracking perspective: the time dimension disappears from the output, so callers have to special-case sequences that contain only two frames.
It would be clearer to separate the two purposes into distinct functions, such as forward_sliding and forward_pair. The forward_sliding function should consistently return the same number of frames as the input sequence.
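To illustrate the special case callers currently need, here is a minimal sketch of a workaround. The helper name `with_time_axis` is hypothetical and not part of the repository, and NumPy arrays stand in for the real output tensors (the same indexing should work on PyTorch tensors). It only restores the missing time axis for the two-frame case. Whether the two-frame result should additionally be padded to T entries depends on the tracking convention, which is not stated here, so the sketch does not attempt that.

```python
import numpy as np


def with_time_axis(flow, num_frames):
    """Return `flow` with an explicit time axis: (B, T', 2, H, W).

    Hypothetical helper, not part of the library. It assumes the shapes
    described in this issue:
      - T > 2  -> flow is already (B, T, 2, H, W) and is returned unchanged
      - T == 2 -> flow is (B, 2, H, W) and becomes (B, 1, 2, H, W)
    """
    if flow.ndim == 5:
        # Tracking output already carries a time axis.
        return flow
    if flow.ndim == 4 and num_frames == 2:
        # Pair-wise output: insert a singleton time axis after the batch axis.
        return flow[:, None]
    raise ValueError(
        f"unexpected flow shape {flow.shape} for num_frames={num_frames}"
    )


# Stand-in arrays with the two shapes described above.
B, H, W = 1, 4, 6
pair_flow = np.zeros((B, 2, H, W))       # what T == 2 returns today
track_flow = np.zeros((B, 5, 2, H, W))   # what T == 5 returns today

print(with_time_axis(pair_flow, 2).shape)   # (1, 1, 2, 4, 6)
print(with_time_axis(track_flow, 5).shape)  # (1, 5, 2, 4, 6)
```

With separate forward_sliding and forward_pair functions, this kind of shape check would no longer be needed on the caller side.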