SmolVLM 256M (& 500M): The world’s smallest multimodal model. Designed for efficiency and perfect for on-device applications
Demo: https://huggingface.co/spaces/HuggingFaceTB/SmolVLM-256M-Instruct-WebGPU
Twitter: https://x.com/xenovacom/status/1882435994160447587
Examples: https://github.com/huggingface/transformers.js-examples/tree/main/smolvlm-webgpu
It would be great to extend our models with SmolVLM.
There are also some UX improvements we could make at the same time for prompts that include an image:

- Fixing the [Object] text that follows a selected image.
- Double-checking that there are no issues specific to multimodal support. I ran into issues with our other multimodal model (I couldn't get it working) and am unsure whether this was model-specific.
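The "[Object]" text is likely JavaScript's default object-to-string coercion leaking into the UI: interpolating a structured message part (such as an image entry) directly into a string produces "[object Object]". A minimal sketch of the symptom and one possible fix — the `imagePart` shape and `renderPart` helper are hypothetical, not the actual app code:

```javascript
// Hypothetical shape of a structured message part containing an image.
const imagePart = { type: "image", image: "blob:http://localhost/..." };

// Interpolating the object directly triggers default coercion.
const naive = `${imagePart}`; // "[object Object]"

// One possible fix: render a placeholder based on the part's type
// instead of relying on string coercion.
function renderPart(part) {
  return part.type === "image" ? "<image>" : part.text;
}

console.log(naive);                 // "[object Object]"
console.log(renderPart(imagePart)); // "<image>"
```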
