AttentionFlow: Text-to-Video Editing Using Motion Map Injection Module

Abstract

Text-to-image diffusion models, trained on large text-image pair datasets, show remarkable performance in generating high-quality images. Recent research has extended diffusion models to text-guided video editing by using text-guided image diffusion models as a baseline. Existing video editing studies implicitly estimate frame-to-frame relationships by adding cross-frame attention to the attention maps, which yields temporally consistent editing. However, because these methods use generative models trained on text-image pair data, they do not take into account one of the most important characteristics of video: motion. When a video is edited with a prompt, the attention map of a word that implies motion, such as 'running' or 'moving', is not clearly estimated, so accurate editing cannot be performed. In this paper, we propose the 'Motion Map Injection' (MMI) module, which performs accurate video editing by considering motion information explicitly. The MMI module provides a simple but effective way to convey video motion information to T2V models in three steps: 1) extracting a motion map, 2) calculating the similarity between the motion map and the attention map of each prompt, and 3) injecting the motion map into the attention maps. Experimental results show that the input video can be edited accurately and effectively with the MMI module. To the best of our knowledge, our study is the first to utilize the motion in video for text-to-video editing.
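
The three MMI steps above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: it assumes the motion map is the normalized optical-flow magnitude, uses cosine similarity for step 2, and adds the motion map to the best-matching prompt word's attention map for step 3. The function names (`motion_map`, `cosine_similarity`, `inject_motion`) and the `strength` parameter are hypothetical, and the maps are flattened to plain lists for brevity.

```python
import math


def motion_map(flow):
    """Step 1: turn per-pixel flow vectors (dx, dy) into magnitudes in [0, 1]."""
    magnitudes = [math.hypot(dx, dy) for dx, dy in flow]
    peak = max(magnitudes)
    return [m / peak if peak > 0 else 0.0 for m in magnitudes]


def cosine_similarity(a, b):
    """Step 2: cosine similarity between two flattened maps (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def inject_motion(attention_maps, motion, strength=0.5):
    """Step 3: add the motion map to the attention map most similar to it.

    The additive rule and `strength` are assumptions made for illustration.
    """
    scores = {word: cosine_similarity(a, motion) for word, a in attention_maps.items()}
    target = max(scores, key=scores.get)
    edited = dict(attention_maps)
    edited[target] = [a + strength * m for a, m in zip(attention_maps[target], motion)]
    return edited, target


# Toy 2x2 frame, flattened: pixels 1 and 3 move, pixels 0 and 2 are static.
flow = [(0, 0), (3, 4), (0, 0), (6, 8)]
attention_maps = {
    "flowing": [0.1, 0.4, 0.1, 0.8],
    "skyscraper": [0.9, 0.1, 0.8, 0.1],
}

motion = motion_map(flow)
edited, target = inject_motion(attention_maps, motion)
print(target)
print(edited[target])
```

In this toy example the motion word "flowing" attends to the same pixels that move, so it receives the injected motion map, while the map for "skyscraper" is left unchanged.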

Setup

pip install -r requirements.txt

The environment is very similar to Video-P2P.

Weights

We use the pre-trained Stable Diffusion model. You can download it here.

Quickstart

Our code is built on the Video-P2P code base, so you can refer to their GitHub repository if you need more detail.

Please replace pretrained_model_path with the path to your Stable Diffusion model.

To download the pre-trained model, please refer to diffusers.
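
Setting the path can also be scripted. The sketch below assumes the config stores the value on a single top-level line of the form `pretrained_model_path: "..."`; the helper name `set_model_path`, the example path, and the second key in the sample text are hypothetical. Only the standard library is used.

```python
import re


def set_model_path(config_text, model_path):
    """Return config_text with the pretrained_model_path value replaced."""
    pattern = re.compile(r'^(pretrained_model_path:[ \t]*).*$', flags=re.MULTILINE)
    # A function replacement avoids backslash escaping issues in the new path.
    return pattern.sub(lambda m: f'{m.group(1)}"{model_path}"', config_text)


# Sample config text; `max_train_steps` is only a placeholder second key.
example = 'pretrained_model_path: "old/path"\nmax_train_steps: 500\n'
print(set_model_path(example, "/path/to/stable-diffusion"))
```

To apply it, read the YAML file as text, pass it through the helper, and write the result back before running Stage 1.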

# Stage 1: Tuning for model initialization.

# You can reduce the number of tuning epochs to speed this up.
python run_tuning.py --config="configs/cloud-1-tune.yaml"

# Stage 2: Attention Control

python run_attention_flow.py --config="configs/cloud-1-p2p.yaml"

Find your results in Video-P2P/outputs/xxx/results.

Examples

Input Video	Video-P2P	Ours
"clouds flowing under a skyscraper"	"waves flowing under a skyscraper"	"waves flowing under a skyscraper"

"clouds flowing on the mountain"	"lava flowing on the mountain"	"lava flowing on the mountain"

"spinning wings of windmill are beside the river"	"yellow spinning wings of windmill are beside the river"	"yellow spinning wings of windmill are beside the river"

