Open
Description
If we fine-tune Ditto on a single avatar image and then generate a real-time talking head for that avatar from audio, will the resulting model be lighter and respond faster? If not, what are the best ways to achieve low latency with Ditto in the known, single-avatar scenario?
Does Ditto skip its identity-extraction (identity resemblance) step if it is trained directly on the single avatar image that will also be used at inference time?
Can we precompute the identity features (and keep the appearance features in memory) and reuse them to speed up the process?
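To make the last question concrete, here is a minimal sketch of the kind of caching I have in mind. It is not based on Ditto's actual code: `AvatarFeatureCache` and `fake_extract` are hypothetical names, and `fake_extract` only stands in for whatever source-image stage (identity and appearance feature extraction) the real pipeline runs. The idea is to run that stage once per avatar image and reuse the result for every later audio request.

```python
import hashlib
from typing import Callable, Dict, Generic, TypeVar

F = TypeVar("F")


class AvatarFeatureCache(Generic[F]):
    """Keeps per-avatar features in memory, keyed by a hash of the image bytes."""

    def __init__(self, extract_fn: Callable[[bytes], F]) -> None:
        # extract_fn is an assumption: any callable that maps image bytes
        # to the features the generator needs.
        self._extract_fn = extract_fn
        self._store: Dict[str, F] = {}
        self.misses = 0  # number of times the extractor actually ran

    def get(self, image_bytes: bytes) -> F:
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._store:
            # First request for this avatar: pay the extraction cost once.
            self.misses += 1
            self._store[key] = self._extract_fn(image_bytes)
        return self._store[key]


def fake_extract(image_bytes: bytes) -> dict:
    # Hypothetical stand-in for the expensive source-image stage.
    return {"appearance": len(image_bytes), "identity": image_bytes[:4]}


cache = AvatarFeatureCache(fake_extract)
first = cache.get(b"avatar-png-bytes")   # runs the extractor
second = cache.get(b"avatar-png-bytes")  # served from memory
```

With a single known avatar, the cache could be warmed at startup so that no request pays the extraction cost. Whether this helps in practice depends on how much of the total latency comes from the source-image stage compared with the audio-driven generation stage.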