
[feature] WebAI - Add support for SmolVLM 256M (& 500M) #82

@addyosmani

Description


SmolVLM 256M (& 500M): the world's smallest multimodal model, designed for efficiency and well suited to on-device applications.

Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM-256M-Instruct-WebGPU
Twitter: https://x.com/xenovacom/status/1882435994160447587
Examples: https://github.com/huggingface/transformers.js-examples/tree/main/smolvlm-webgpu

It would be great to extend our models with SmolVLM.
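For reference, a minimal sketch of how SmolVLM-256M can be loaded with Transformers.js on WebGPU, loosely following the linked smolvlm-webgpu example. The model ID comes from the demo above; the `dtype` choice, helper names, and default prompt here are assumptions to verify against that repo, not our final integration.

```javascript
// Build the chat messages for a single image + question turn.
// Pure helper: no dependency on the model runtime.
export function buildMessages(question) {
  return [
    {
      role: "user",
      content: [
        { type: "image" },
        { type: "text", text: question },
      ],
    },
  ];
}

// Sketch of an image-description call (assumed helper name).
// Uses a dynamic import so the heavy dependency stays lazy.
export async function describeImage(imageUrl, question = "Can you describe this image?") {
  const { AutoProcessor, AutoModelForVision2Seq, RawImage } =
    await import("@huggingface/transformers");

  const modelId = "HuggingFaceTB/SmolVLM-256M-Instruct";
  const processor = await AutoProcessor.from_pretrained(modelId);
  const model = await AutoModelForVision2Seq.from_pretrained(modelId, {
    dtype: "fp16",    // assumption: the example repo tunes per-module dtypes
    device: "webgpu", // requires a WebGPU-capable browser
  });

  const image = await RawImage.fromURL(imageUrl);
  const text = processor.apply_chat_template(buildMessages(question), {
    add_generation_prompt: true,
  });
  const inputs = await processor(text, [image]);
  const outputIds = await model.generate({ ...inputs, max_new_tokens: 128 });
  const decoded = processor.batch_decode(outputIds, { skip_special_tokens: true });
  return decoded[0];
}
```

Keeping the message-building step as a separate pure function makes the prompt shape easy to unit-test without downloading model weights.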


There are also some UX improvements we could make at the same time when a prompt includes an image:

Fixing the [Object] text that follows a selected image.


Double-checking that there are no specific issues with multimodal support. I was running into issues with our other multimodal model (I couldn't get it working) and am unsure whether this was model-specific.

