LLIA - Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models

Haojie Yu* · Zhaonian Wang* · Yihan Pan* · Meng Cheng · Hao Yang · Chao Wang · Tao Xie · Xiaoming Xu · Xiaoming Wei · Xunliang Cai

*Equal Contribution · Corresponding Authors

TL;DR: LLIA is a real-time audio-driven portrait video generation framework based on diffusion models, enabling low-latency interactive avatars.

Video Demos

001.mp4
002.mp4
003.mp4

🔆 Introduction

We propose LLIA, a novel audio-driven portrait video generation framework based on diffusion models. Our approach achieves low-latency, fluid, and authentic two-way communication. On an NVIDIA RTX 4090D, our model reaches up to 78 FPS at a resolution of 384 × 384 and 45 FPS at a resolution of 512 × 512, with initial video generation latencies of 140 ms and 215 ms, respectively.
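As a back-of-the-envelope illustration of what these throughput numbers mean for the real-time constraint, the sketch below (a hypothetical helper, not part of the LLIA codebase) converts the reported FPS figures into the per-frame compute budget the model must stay within:

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to generate one frame (in ms) while sustaining `fps`."""
    return 1000.0 / fps

# Reported throughputs: 78 FPS at 384x384 and 45 FPS at 512x512.
print(round(frame_budget_ms(78), 1))  # -> 12.8 ms per frame at 384x384
print(round(frame_budget_ms(45), 1))  # -> 22.2 ms per frame at 512x512
```

In other words, each diffusion step pipeline must complete in roughly 13 ms (or 22 ms at the higher resolution) to keep the avatar stream fluid.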

🔥 Latest News

📑 Todo List

  • Release the technical report
  • Inference
  • Checkpoints