Description
Video processing can smooth the noise out of frame-by-frame predictions, or use interpolation to avoid processing every frame when the upstream service is too slow or expensive.
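As a rough illustration of the smoothing idea, here is a minimal sketch that applies an exponential moving average to per-frame confidence scores. The function name `smooth_scores` and the `alpha` parameter are hypothetical, and an EMA is only one of several filters that could be used here (a sliding median would be another).

```python
def smooth_scores(scores, alpha=0.5):
    """Exponential moving average over per-frame scores (illustrative only).

    alpha close to 1 trusts the current frame more; alpha close to 0
    trusts the running history more.
    """
    smoothed = []
    prev = None
    for score in scores:
        # The first frame has no history, so it passes through unchanged.
        prev = score if prev is None else alpha * score + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

A single dropped detection in the middle of a run (for example `[1.0, 0.0, 1.0]`) is pulled back toward its neighbours instead of flickering to zero.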
A video service could provide a bridge from image-based services to video.
For localizations, shapes, or segmentation, there are different ways this could be approached:
- Take a video and send key frames to an image-based service, then interpolate between them using the motion vectors inherent in the video stream.
- Take a video and send every Nth frame to an image-based service, then use optical flow and template-based matching to interpolate between frames.
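To make the second approach concrete, the sketch below fills in bounding boxes for the frames that were not sent upstream. It uses plain linear interpolation as a stand-in for optical flow or motion vectors, so it is a baseline rather than the method proposed above. The name `interpolate_boxes` and the `(x, y, w, h)` box layout are assumptions for illustration.

```python
def interpolate_boxes(keyframes):
    """Linearly interpolate boxes between sampled frames (illustrative only).

    keyframes: dict mapping frame index -> (x, y, w, h) returned by the
               image-based service for that frame.
    Returns a dict with a box for every frame from the first to the last
    sampled index.
    """
    indices = sorted(keyframes)
    result = {}
    for start, end in zip(indices, indices[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        span = end - start
        for frame in range(start, end):
            t = (frame - start) / span  # 0.0 at start, approaching 1.0 at end
            result[frame] = tuple(
                (1 - t) * a + t * b for a, b in zip(box_a, box_b)
            )
    if indices:
        # The last sampled frame is not covered by the loop above.
        last = indices[-1]
        result[last] = tuple(float(v) for v in keyframes[last])
    return result
```

With every 4th frame sampled, `interpolate_boxes({0: (0, 0, 10, 10), 4: (8, 4, 10, 10)})` yields a box for frames 0 through 4. A flow-based version would replace the linear blend with a per-frame displacement estimate.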
For something like face recognition, we'd want to assign a label to each face descriptor tracked through time. Since the face descriptor will be noisy from frame to frame, we will want to cluster identities and assign each cluster a label that is unique within the video. Basic graph clustering of identity vectors is one approach (dlib provides easy hooks to the Chinese whispers algorithm), but weighting the graph edges by the spatial/temporal distance between face descriptors should improve it.
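The clustering idea could be sketched roughly as follows. This is a small pure-Python Chinese whispers pass written for illustration, not dlib's implementation. The helper names `build_edges` and `chinese_whispers`, the `max_dist` threshold of 0.6, and the exponential temporal decay controlled by `time_scale` are all assumptions, and only the temporal part of the weighting is shown (spatial distance between face boxes could be folded in the same way).

```python
import math
import random


def build_edges(faces, max_dist=0.6, time_scale=30.0):
    """faces: list of (frame_index, descriptor) pairs.

    Two faces are linked only if their descriptors are closer than
    max_dist. The edge weight grows with descriptor similarity and
    decays with the frame gap (hypothetical weighting scheme).
    """
    edges = {}
    for i in range(len(faces)):
        for j in range(i + 1, len(faces)):
            (frame_i, desc_i), (frame_j, desc_j) = faces[i], faces[j]
            dist = math.dist(desc_i, desc_j)
            if dist < max_dist:
                similarity = 1.0 - dist / max_dist
                decay = math.exp(-abs(frame_i - frame_j) / time_scale)
                edges[(i, j)] = similarity * decay
    return edges


def chinese_whispers(n_nodes, edges, iterations=20, seed=0):
    """Return one label per node; nodes sharing a label form a cluster."""
    neighbors = {i: [] for i in range(n_nodes)}
    for (i, j), weight in edges.items():
        neighbors[i].append((j, weight))
        neighbors[j].append((i, weight))
    labels = list(range(n_nodes))  # every node starts in its own cluster
    rng = random.Random(seed)
    order = list(range(n_nodes))
    for _ in range(iterations):
        rng.shuffle(order)
        for node in order:
            # Adopt the label with the largest total edge weight nearby.
            totals = {}
            for other, weight in neighbors[node]:
                label = labels[other]
                totals[label] = totals.get(label, 0.0) + weight
            if totals:
                labels[node] = max(totals, key=totals.get)
    return labels
```

Usage would look like `labels = chinese_whispers(len(faces), build_edges(faces))`, after which each distinct label can be mapped to a per-video identity. Real face descriptors are typically 128-dimensional; the same code works for any descriptor length.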