VJEPA2

Self-supervised visual representation learning from video. Part of the Zen LM ecosystem.

Overview

VJEPA2 implements the Video Joint-Embedding Predictive Architecture, which learns visual representations from unlabeled video without relying on hand-crafted augmentations.
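To make the idea of a joint-embedding predictive objective concrete, here is a toy sketch in NumPy. It is not this repository's code: the function `jepa_loss`, the weight matrices, and the single-layer `tanh` "encoders" are all hypothetical stand-ins. It only illustrates the general pattern of predicting the embeddings of hidden video patches from the visible ones and measuring the error in embedding space rather than pixel space.

```python
import numpy as np

def jepa_loss(tokens, mask, w_context, w_target, w_predictor):
    """Toy latent-space prediction loss (illustrative only, not the VJEPA2 API).

    tokens: (num_tokens, dim) patch features.
    mask:   (num_tokens,) bool array, True = hidden from the context branch.
    """
    if not mask.any() or mask.all():
        raise ValueError("mask must hide some tokens and leave some visible")
    # Context branch: embed only the visible tokens.
    context = np.tanh(tokens[~mask] @ w_context)
    # Summarize the visible context and predict one embedding from it.
    predicted = np.tanh(context.mean(axis=0) @ w_predictor)
    # Target branch: embed every token, then keep the hidden positions.
    targets = np.tanh(tokens @ w_target)[mask]
    # L1 distance in embedding space, averaged over hidden tokens.
    return float(np.abs(predicted[None, :] - targets).mean())

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))
mask = np.array([True, False, True, False, False, True, False, False])
w_context = rng.normal(size=(4, 6))
w_target = rng.normal(size=(4, 6))
w_predictor = rng.normal(size=(6, 6))
loss = jepa_loss(tokens, mask, w_context, w_target, w_predictor)
```

A real model would use deep video encoders and a position-aware predictor; the sketch keeps only the shape of the objective.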

Features

Self-supervised learning from video
No hand-crafted augmentations required
Pre-trained visual encoder for downstream tasks
Efficient training with masking strategies
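As an illustration of how masking can make training cheaper, the sketch below builds a "tube" mask: the same random set of spatial patches is hidden in every frame, so an encoder that only processes visible patches sees a fraction of the tokens. The README does not say which masking strategy VJEPA2 uses, so the function `tube_mask`, its parameters, and the 75% ratio are assumptions chosen for the example.

```python
import numpy as np

def tube_mask(num_frames, grid_h, grid_w, mask_ratio, rng):
    """Return a bool mask of shape (num_frames, grid_h, grid_w); True = hidden.

    Hypothetical helper for illustration, not part of the VJEPA2 codebase.
    """
    num_patches = grid_h * grid_w
    num_hidden = int(round(mask_ratio * num_patches))
    # Pick which spatial patches to hide, without repetition.
    spatial = np.zeros(num_patches, dtype=bool)
    hidden_idx = rng.choice(num_patches, size=num_hidden, replace=False)
    spatial[hidden_idx] = True
    spatial = spatial.reshape(grid_h, grid_w)
    # Repeat the same spatial pattern in every frame to form tubes.
    return np.broadcast_to(spatial, (num_frames, grid_h, grid_w)).copy()

mask = tube_mask(16, 14, 14, 0.75, np.random.default_rng(0))
visible_fraction = 1.0 - mask.mean()
```

With these settings only a quarter of the patch tokens remain visible, which is where the compute saving would come from in an encoder that skips hidden tokens.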

Related

jin: Multimodal understanding framework
Zen LM: Full model family

License

See the LICENSE file.

Part of the Zen LM ecosystem by Hanzo AI
