🚀 The feature, motivation and pitch
🤗 Hello! Thank you for your work!
I see model configurations in this repo that work with certain modalities, and that is great.
I have a question, though: what if I have a pretrained encoder for another modality (e.g., audio) and data for training (audio-text pairs and audio-image pairs)?
- How can I train a model that will be able to solve tasks with my new modality?
- In other words, which components should I use to fuse the new modality with the existing ones? Should I implement a new model, or can I use existing components as fusers?
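To make the question concrete, here is a minimal sketch of what I have in mind: embeddings from N frozen pretrained encoders are concatenated and projected into a shared space. All names here (`ConcatFuser`, `fuse`) are illustrative only, not part of this repo's API; the sketch uses NumPy to stay self-contained.

```python
import numpy as np

class ConcatFuser:
    """Hypothetical fuser: concatenate per-modality embeddings,
    then apply one learnable linear projection into a shared space
    (the projection would be trained on the paired data)."""

    def __init__(self, input_dims, fused_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Projection from the concatenated embedding to the fused space.
        self.W = rng.normal(size=(sum(input_dims), fused_dim)) * 0.02
        self.b = np.zeros(fused_dim)

    def fuse(self, embeddings):
        # embeddings: list of (batch, dim_i) arrays, one per encoder.
        concat = np.concatenate(embeddings, axis=1)
        return concat @ self.W + self.b

# Example: pretend the audio and text encoders emit 128-d and 256-d embeddings.
audio_emb = np.ones((4, 128))
text_emb = np.ones((4, 256))
fuser = ConcatFuser(input_dims=[128, 256], fused_dim=64)
fused = fuser.fuse([audio_emb, text_emb])
print(fused.shape)  # (4, 64)
```

Is there an existing component in the repo that plays this fuser role, so I only need to plug in my own encoders?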
Alternatives
No response
Additional context
It would be great if a user who has N pretrained encoders for arbitrary modalities could pass them to some fuser model and train it to solve cross-modal tasks, or add the new modality to an existing model.