Incremental addition of the new modality #390

@averkij

🚀 The feature, motivation and pitch

🤗 Hello! Thank you for your work!

I see that this repo has model configurations that work with certain modalities, which is great.

I have a question, though: what if I have a pretrained encoder for another modality (e.g. audio) and training data for it (audio-text pairs and audio-image pairs)?

  • How can I train a model that can solve tasks involving my new modality?
  • In other words, which components should I use to fuse the new modality with the others? Should I implement a new model, or can I use existing components as fusers?

Alternatives

No response

Additional context

It would be great if a user who has N pretrained encoders for arbitrary modalities could pass them to some fuser model and train it to solve cross-modal tasks, or add the new modality to an existing model.
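
To make the request more concrete, here is a rough sketch of the kind of interface I have in mind. Everything in it is hypothetical and not an existing API of this repo: the names `ModalityFuser`, `Projection`, and `add_modality` are made up, the encoders are toy lambdas standing in for frozen pretrained models, and mean-pooling the projected embeddings is only a placeholder for whatever fusion strategy would actually be used.

```python
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[object], Vector]  # stand-in for a frozen pretrained encoder


class Projection:
    """Linear map from one encoder's output size to the shared size.

    In a real setup this would be the trainable part for each modality.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def __call__(self, x: Sequence[float]) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class ModalityFuser:
    """Hypothetical fuser: one projection per modality, mean-pooled together."""

    def __init__(self, shared_dim: int) -> None:
        self.shared_dim = shared_dim
        self.encoders: Dict[str, Encoder] = {}
        self.projections: Dict[str, Projection] = {}

    def add_modality(self, name: str, encoder: Encoder, encoder_dim: int) -> None:
        # Registering a modality leaves the existing ones untouched;
        # only a new projection is created for the new encoder.
        self.encoders[name] = encoder
        self.projections[name] = Projection(encoder_dim, self.shared_dim)

    def forward(self, inputs: Dict[str, object]) -> Vector:
        if not inputs:
            raise ValueError("at least one modality input is required")
        projected = [
            self.projections[name](self.encoders[name](value))
            for name, value in inputs.items()
        ]
        # Placeholder fusion: element-wise mean over the modalities present.
        return [sum(column) / len(projected) for column in zip(*projected)]


# Toy usage: two fake "encoders" with different output sizes.
fuser = ModalityFuser(shared_dim=4)
fuser.add_modality("text", lambda s: [float(len(s)), 1.0], encoder_dim=2)
fuser.add_modality("audio", lambda wav: [sum(wav), max(wav), min(wav)], encoder_dim=3)
fused = fuser.forward({"text": "a dog barking", "audio": [0.1, -0.2, 0.3]})
```

The point of the sketch is only the shape of the API: adding audio is a single `add_modality` call, and the fuser accepts whichever subset of modalities a given training pair provides (audio-text or audio-image).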
