VDOT: Efficient Unified Video Creation via Optimal Transport Distillation

Yutong Wang¹, Haiyu Zhang^3,2, Tianfan Xue^4,2, Yu Qiao², Yaohui Wang², Chang Xu^1*, Xinyuan Chen^2*

¹USYD, ²Shanghai AI Laboratory, ³BUAA, ⁴CUHK

Introduction

VDOT is an efficient, unified video creation model that produces high-quality results in only 4 denoising steps. By applying computational Optimal Transport (OT) within the distillation process, VDOT stabilizes training and improves both training and inference efficiency. VDOT unifies a wide range of capabilities, including Reference-to-Video (R2V), Video-to-Video (V2V), Masked Video Editing (MV2V), and arbitrary composite tasks, matching the versatility of VACE at a significantly lower inference cost.
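This README does not describe how OT enters the distillation objective or which OT solver VDOT uses. Purely as a generic illustration of computational OT, the sketch below computes an entropy-regularized transport plan between two small discrete distributions using Sinkhorn iterations. The choice of Sinkhorn, the function names `sinkhorn` and `transport_cost`, and the toy cost matrix are assumptions for illustration, not VDOT's implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinkhorn(a: List[float], b: List[float], cost: Matrix,
             epsilon: float = 0.5, n_iters: int = 500) -> Matrix:
    """Entropy-regularized OT plan between histograms a and b (Sinkhorn-Knopp scaling).

    Generic illustration only; not taken from the VDOT codebase.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K_ij = exp(-C_ij / epsilon); smaller epsilon -> closer to exact OT.
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that row sums of diag(u) K diag(v) equal a ...
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # ... then rescale columns so that column sums equal b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan: Matrix, cost: Matrix) -> float:
    """Total cost <P, C> of moving mass according to the plan."""
    return sum(p * c for prow, crow in zip(plan, cost) for p, c in zip(prow, crow))


if __name__ == "__main__":
    # Two source points at x = 0, 1 and two targets at y = 1, 2 with squared-distance cost.
    xs, ys = [0.0, 1.0], [1.0, 2.0]
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    plan = sinkhorn([0.5, 0.5], [0.5, 0.5], cost, epsilon=1.0)
    print(plan, transport_cost(plan, cost))
```

With `epsilon=1.0` each off-diagonal entry of the plan carries about 0.5/(1+e) ≈ 0.134 of the mass; shrinking `epsilon` pushes the plan toward the exact matching (0→1, 1→2, total cost 1.0), usually at the price of slower convergence.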


⚙️ Installation

The codebase was tested with Python 3.10.13, CUDA version 12.4, and PyTorch >= 2.5.1.
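As a convenience, a small helper script can compare the local environment with the tested configuration above. This is a hypothetical sketch, not part of the repository: the function names `parse_version` and `check_environment`, and the decision to print warnings rather than fail, are assumptions. It only reads the standard `torch.__version__` and `torch.version.cuda` attributes when PyTorch is importable.

```python
import sys
from typing import List, Optional, Tuple

# Configuration reported as tested in this README; treated as guidance, not hard requirements.
TESTED_PYTHON = (3, 10, 13)
TESTED_CUDA = (12, 4)
MIN_TORCH = (2, 5, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '2.5.1+cu124' -> (2, 5, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at pre-release tags such as 'dev20241112'
        parts.append(int(digits))
    return tuple(parts)


def check_environment(python_version: str,
                      torch_version: Optional[str],
                      cuda_version: Optional[str]) -> List[str]:
    """Return human-readable warnings; an empty list means no mismatch was found."""
    warnings = []
    # Only major.minor is compared for Python, since patch releases rarely matter.
    if parse_version(python_version)[:2] != TESTED_PYTHON[:2]:
        warnings.append(f"Python {python_version}: tested with 3.10.13")
    if torch_version is None:
        warnings.append("PyTorch not installed: >= 2.5.1 expected")
    elif parse_version(torch_version) < MIN_TORCH:
        warnings.append(f"PyTorch {torch_version}: >= 2.5.1 expected")
    if cuda_version is None:
        warnings.append("No CUDA build of PyTorch detected: tested with CUDA 12.4")
    elif parse_version(cuda_version)[:2] != TESTED_CUDA:
        warnings.append(f"CUDA {cuda_version}: tested with 12.4")
    return warnings


if __name__ == "__main__":
    try:
        import torch
        torch_version = torch.__version__
        cuda_version = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        torch_version, cuda_version = None, None
    python_version = ".".join(str(x) for x in sys.version_info[:3])
    messages = check_environment(python_version, torch_version, cuda_version)
    for message in messages or ["OK: matches the tested configuration"]:
        print(message)
```

Newer PyTorch releases (for example 2.10) still pass the `>= 2.5.1` check because versions are compared as integer tuples rather than strings.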

🚀 Usage

Acknowledgement

We are grateful to the following awesome projects: VACE, Wan, and Self-Forcing.

Repository: hhhh1138/VDOT