Skip to content

Anil-matcha/wan-2.7-api

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 

Repository files navigation

Wan 2.7: The Next Generation Video Foundation Model

Wan 2.7 represents the cutting-edge evolution in Alibaba Cloud's video generation series, succeeding the professional-grade Wan 2.6.

Building upon the "Reference-to-Video" (R2V) and native audio-visual synchronization of its predecessor, Wan 2.7 pushes the boundaries of cinematic AI, long-form storytelling, and visual consistency.


Evolution: From Wan 2.6 to Wan 2.7

Previous Model: Wan 2.6 (Late 2025)

Wan 2.6 was a major milestone that moved beyond simple clips to professional-grade production:

  • Reference-to-Video (R2V): Enabled "clone-level" character consistency by uploading a reference video, maintaining appearance and motion patterns.
  • Smart Multi-Shot Storytelling: Automated the generation of multi-angle scenes (panoramic, close-ups, tracking shots) within a single 15-second 1080p clip.
  • Native AV Sync: First in the series to offer native lip-syncing and synchronized environmental sound effects (SFX) that match on-screen motion.
  • Dual-Subject Interaction: Supported high-fidelity interaction between up to two distinct characters.

Next Generation: Wan 2.7 (Early 2026 Roadmap)

Wan 2.7 refines the core architecture of 2.6, focusing on scaling, endurance, and granular control:

1. Extended Narrative Windows (30s - 60s)

While Wan 2.6 capped at 15 seconds, Wan 2.7 is engineered for extended durations, targeting 30 to 60 seconds of continuous, high-fidelity video in a single generation pass without losing temporal coherence.

2. Global Character Embedding & Persistence

Beyond 2.6's R2V, Wan 2.7 introduces Persistent Character VAEs, allowing the model to "memorize" character features across an entire project. This enables full-length AI filmmaking with a consistent cast across separate scenes.

3. Instruction-Based Video Editing

Wan 2.7 moves from simple generation to dynamic editing. It supports natural language instructions to modify existing footage (e.g., "Change the lighting to sunset" or "Add a futuristic helmet to the character"), preserving the original motion and structure while altering specific elements.

4. Advanced Physical Simulation (v4)

The updated Wan-VAE v4 significantly improves the simulation of complex physics, including realistic fluid dynamics (water, fire), cloth physics, and high-speed motion blur without the "warping" artifacts seen in earlier diffusion models.

5. Full Binaural Audio Synthesis

Succeeding the basic SFX of 2.6, Wan 2.7 generates Spatial Binaural Audio that dynamically shifts based on the camera's position and the movement of objects in the scene.


Comparison Summary

Feature Wan 2.6 Wan 2.7 (Target)
Max Duration 15 Seconds 30 - 60 Seconds
Resolution 1080p @ 24fps 1080p / 4K Upscaling
Consistency R2V (Reference-based) Global Persistent Embeddings
Audio Native Lip-Sync & SFX Full Binaural Spatial Audio
Editing Static Inpainting Dynamic Instruction-based Editing
License Apache 2.0 (Open Weights) Apache 2.0 (Open Weights)

Use Cases

  • AI-Driven Cinema: Creating consistent, long-form narrative content with a persistent cast.
  • Professional Marketing: Generating 4K-ready advertisements with perfect product and character consistency.
  • Social Media Storytelling: Rapid production of multi-shot, audio-synced content for TikTok and YouTube.
  • Immersive Education: Visualizing historical or scientific events with spatial audio and realistic physics.

Disclaimer: Wan 2.7 is the latest iteration of the Wan series by Alibaba Cloud. This repository serves as a knowledge hub and documentation for the model's capabilities and implementation.

Releases

No releases published

Packages

 
 
 

Contributors