Open
Description
The cosine similarity between embeddings of different modalities (text, video, audio, etc.) is not high, usually around 0.1 to 0.3. Given such low values, how do we know how relevant the embeddings are to each other? Can this encoder be trusted for downstream tasks such as semantic search in video? If so, what is the appropriate way to use these embeddings?
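
To make the question concrete, here is a minimal sketch of one common way to use such embeddings for search: rank the candidates by cosine similarity and look at the ordering rather than the absolute score. Everything in it is an assumption for illustration. The vectors are synthetic stand-ins built to have cosines in the 0.05 to 0.25 range, and `cosine_similarity`, `rank_by_similarity`, and `make_candidate` are hypothetical helpers, not part of this repository's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates, top_k=3):
    """Return the top_k (index, score) pairs, highest similarity first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def make_candidate(target_cosine, axis, dim=8):
    """Build a synthetic unit vector whose cosine with the query is target_cosine.

    Hypothetical helper: real embeddings would come from the encoder instead.
    """
    vec = [0.0] * dim
    vec[0] = target_cosine
    vec[axis] = math.sqrt(1.0 - target_cosine ** 2)
    return vec


# Synthetic "text" query embedding along the first axis.
query = [1.0] + [0.0] * 7

# Synthetic "video" embeddings with low absolute similarities to the query.
target_cosines = [0.05, 0.25, 0.12, 0.08]
candidates = [make_candidate(c, axis=i + 1) for i, c in enumerate(target_cosines)]

ranking = rank_by_similarity(query, candidates)
for index, score in ranking:
    print(f"candidate {index}: cosine = {score:.2f}")
```

In this toy setup the best match scores only 0.25, yet it is still ranked first by a clear margin. Whether the real encoder separates relevant from irrelevant items this cleanly is exactly what the question above asks, and it would need to be checked on labeled pairs.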