Fast vision-language model architecture research. Part of the Zen LM ecosystem.
FastVLM explores efficient architectures for vision-language models, focusing on reducing computational overhead while maintaining strong multimodal understanding.
Features:
- Efficient vision-language model architecture
- Lower computational overhead than standard VLMs
- Strong multimodal understanding
- Research reference implementation
Related projects:
- zen-vl — Zen vision-language models
- jin — Multimodal understanding framework
- Zen LM — Full model family
License: see the LICENSE file.