Implement video to image proxy service #6

@ferrouswheel

Video processing can smooth the noise out of frame-by-frame predictions, or use interpolation to avoid processing every frame when the upstream service is too slow or expensive.
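As a rough illustration of the smoothing idea, here is a minimal sketch that applies an exponential moving average to a per-frame confidence score. The function name `smooth_scores` and the `alpha` parameter are hypothetical, and an exponential moving average is only one possible filter; nothing here is a decided design.

```python
def smooth_scores(scores, alpha=0.5):
    """Exponentially smooth a sequence of per-frame scores.

    alpha close to 1.0 trusts the current frame more;
    alpha close to 0.0 trusts the history more.
    (Hypothetical helper, not part of any existing service.)
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for score in scores:
        # The first frame has no history, so it passes through unchanged.
        if previous is None:
            previous = score
        else:
            previous = alpha * score + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed
```

For example, a single dropped detection in `[1.0, 0.0, 1.0]` is softened rather than passed through as a hard zero.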

A video service could provide a bridge from image-based services to video.

For localizations, shapes, or segmentation, there are different ways this could be approached:

  • Take a video and send key frames to an image-based service, then interpolate between them using the motion vectors inherent in the video stream.
  • Take a video and send every Nth frame to an image-based service, then use optical flow and template-based matching to interpolate between frames.
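To make the "every Nth frame" approach concrete, the sketch below fills in bounding boxes for the skipped frames. It uses plain linear interpolation as a stand-in for optical flow or motion vectors, which a real implementation would presumably use instead. The function name, the `(x, y, w, h)` box layout, and the `step` parameter are assumptions made for illustration.

```python
def interpolate_boxes(keyframe_boxes, step):
    """Return one box per frame given boxes for frames 0, step, 2*step, ...

    keyframe_boxes: list of (x, y, w, h) tuples returned by the
    image-based service for every `step`-th frame (assumed layout).
    Linear interpolation is a simplification; it ignores actual motion
    in the video.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    boxes = []
    for start, end in zip(keyframe_boxes, keyframe_boxes[1:]):
        for offset in range(step):
            t = offset / step  # 0.0 at the earlier key frame
            boxes.append(tuple(a + t * (b - a) for a, b in zip(start, end)))
    if keyframe_boxes:
        # The last key frame is not covered by the loop above.
        boxes.append(tuple(float(v) for v in keyframe_boxes[-1]))
    return boxes
```

With `step=4`, two service calls cover five frames, and the box for the middle frame sits halfway between the two returned boxes.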

For something like face recognition, we'd want to assign a label to each face descriptor tracked through time. Since the face descriptor will be noisy from frame to frame, we will want to cluster identities and assign each one a label that is unique within the video. Basic graph clustering of identity vectors is one approach (dlib provides easy hooks to the Chinese whispers algorithm), but weighting the graph edges by the spatial/temporal distance of face descriptors should improve it.
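The clustering idea could look roughly like the following pure-Python sketch of Chinese whispers with edge weights that decay with the frame gap between two detections. It does not use dlib. The weighting formula, the `max_distance` and `time_scale` values, and the `(frame_index, descriptor)` input layout are all assumptions, and spatial distance is left out for brevity.

```python
import math
import random


def build_edges(detections, max_distance=0.6, time_scale=50.0):
    """Build a weighted graph over face detections.

    detections: list of (frame_index, descriptor) pairs (assumed layout).
    Two detections are linked when their descriptors are closer than
    max_distance; the weight shrinks as the frame gap grows.
    """
    edges = {i: [] for i in range(len(detections))}
    for i, (frame_i, desc_i) in enumerate(detections):
        for j in range(i + 1, len(detections)):
            frame_j, desc_j = detections[j]
            distance = math.dist(desc_i, desc_j)
            if distance >= max_distance:
                continue
            similarity = 1.0 - distance / max_distance
            temporal = math.exp(-abs(frame_i - frame_j) / time_scale)
            weight = similarity * temporal  # hypothetical weighting
            edges[i].append((j, weight))
            edges[j].append((i, weight))
    return edges


def chinese_whispers(edges, iterations=20, seed=0):
    """Label each node with the heaviest label among its neighbours."""
    rng = random.Random(seed)
    labels = {node: node for node in edges}  # every node starts alone
    nodes = list(edges)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for node in nodes:
            totals = {}
            for neighbour, weight in edges[node]:
                label = labels[neighbour]
                totals[label] = totals.get(label, 0.0) + weight
            if totals:
                labels[node] = max(totals, key=totals.get)
    return labels
```

Detections that end up with the same label would be treated as the same identity within the video; isolated detections keep their own label.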
