SmolVLM 256M (& 500M): The world’s smallest multimodal model. Designed for efficiency and perfect for on-device applications
Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM-256M-Instruct-WebGPU
Twitter: https://x.com/xenovacom/status/1882435994160447587
Examples: https://github.com/huggingface/transformers.js-examples/tree/main/smolvlm-webgpu
It would be great to extend our models with SmolVLM.
There are also some UX improvements we could make at the same time for prompts that include an image:

- Fixing the [Object] text that follows a selected image.
- Double-checking that there are no issues specific to multimodal support. I ran into issues with our other multimodal model (I couldn't get it working) and am unsure whether this was model-specific.
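The "[Object]" text is likely JavaScript's default object-to-string coercion leaking into the UI: interpolating a structured message part (such as an image entry) directly into a string produces "[object Object]". A minimal sketch of the symptom and one possible fix — the `imagePart` shape and `renderPart` helper are hypothetical, not the actual app code:

```javascript
// Hypothetical shape of a structured message part containing an image.
const imagePart = { type: "image", image: "blob:http://localhost/..." };

// Interpolating the object directly triggers default coercion.
const naive = `${imagePart}`; // "[object Object]"

// One possible fix: render a placeholder based on the part's type
// instead of relying on string coercion.
function renderPart(part) {
  return part.type === "image" ? "<image>" : part.text;
}

console.log(naive);                 // "[object Object]"
console.log(renderPart(imagePart)); // "<image>"
```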
