Could you explain why it performs so well?

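We ran experiments with vid2vid-zero, vid2vid-p2p, and controlvideo.
In our experiments, controlvideo performed far better than the other two.
For a long time I assumed controlvideo was also a video extension of prompt2prompt, which would explain why its edits to a specified object are so precise. But looking at it today, I found nothing like the attention replacement used in vid2vid-p2p; it seems to rely entirely on Keyframe Attention and Temporal Attention. (Admittedly, p2p only has 8 frames.)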

I really can't work out why it performs so well when it relies only on Keyframe Attention and Temporal Attention.
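To make the comparison concrete, here is a minimal sketch of how I understand the two mechanisms. It is not taken from either codebase: the function names are hypothetical, and it assumes a single attention head, no learned projections, and the first frame as the key frame. In the keyframe variant every frame's queries attend to the key frame's keys and values; in the prompt2prompt-style variant the source prompt's cross-attention maps are reused for the early denoising steps.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def keyframe_attention(q, k, v, key_idx=0):
    """Hypothetical sketch. q, k, v have shape (frames, tokens, dim).

    Every frame uses its own queries but the keys/values of one key frame,
    so all frames draw appearance from the same reference frame.
    """
    d = q.shape[-1]
    k_key = k[key_idx]                      # (tokens, dim)
    v_key = v[key_idx]                      # (tokens, dim)
    scores = q @ k_key.T / np.sqrt(d)       # (frames, tokens, tokens)
    return softmax(scores) @ v_key          # (frames, tokens, dim)

def p2p_replace_cross_attention(attn_source, attn_edit, step, replace_until):
    """Hypothetical sketch of prompt2prompt-style replacement.

    For the first `replace_until` denoising steps (counting up from 0),
    the cross-attention map of the source prompt is injected in place of
    the edited prompt's map; afterwards the edited map is used.
    """
    return attn_source if step < replace_until else attn_edit
```

In this sketch the keyframe variant constrains where the appearance comes from (a shared frame), while the replacement variant constrains the layout through the text-to-pixel attention maps, so the two are not doing the same job.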

Do I have reason to suspect that the method described in prompt2prompt (figure below) is actually a suboptimal choice? And if so, why would it be suboptimal?
![image](https://github.com/user-attachments/assets/47b37bec-bd59-4a42-994d-d3c869c5880f)
