Skip to content

Latest commit

 

History

History
33 lines (24 loc) · 2.61 KB

File metadata and controls

33 lines (24 loc) · 2.61 KB

LongCat-Video-Avatar

LongCat-Video-Avatar

🔥 Latest News

  • Dec 16, 2025: 🚀 We are excited to announce the release of LongCat-Video-Avatar, a unified model that delivers expressive and highly dynamic audio-driven character animation, supporting native tasks including Audio-Text-to-Video, Audio-Text-Image-to-Video, and Video Continuation with seamless compatibility for both single-stream and multi-stream audio inputs. The release includes our Technical Report, code, model weights, and project page.

✨ Key Features:

  • Support Multiple Generation Modes: one unified model can be used for audio-text-to-video (AT2V) generation, audio-text-image-to-video (ATI2V) generation, and Video Continuation.
  • Natural Human Dynamics: The disentangled unconditional guidance is designed to effectively decouple speech signals from motion dynamics for natural behavior.
  • Avoid Repetitive Content: The reference skip attention is adopted to​ strategically incorporates reference cues to preserve identity while preventing excessive conditional image leakage
  • Alleviate Error Accumulation from VAE: Cross-Chunk Latent Stitching is designed to eliminates redundant VAE decode-encode cycles to reduce pixel degradation in long sequences