Ruibin Li1,2, Tao Yang1, Yangming Shi1, Weiguo Feng1, Shilei Wen1, Bingyue Peng1, Lei Zhang2
1ByteDance, 2The Hong Kong Polytechnic University
- Inference code and model weights have been released, have fun with MfM ⭐⭐.
- ✅ Inference Code
- ✅ Model Weights
- ⬜️ Optimization for Parallel Inference
```shell
pip install -r requirements.txt
```
```python
from huggingface_hub import snapshot_download

snapshot_download(repo_id="LetsThink/MfM-Pipeline-8B", local_dir="xxx")
# snapshot_download(repo_id="LetsThink/MfM-Pipeline-2B", local_dir="xxx")
```
You can refer to the inference script in `scripts/inference.sh`:
```shell
PIPELINE_PATH=LetsThink/MfM-Pipeline-8B
OUTPUT_DIR=outputs
TASK=t2v

python infer_mfm_pipeline.py \
    --pipeline_path $PIPELINE_PATH \
    --output_dir $OUTPUT_DIR \
    --task $TASK \
    --crop_type keep_res \
    --num_inference_steps 30 \
    --guidance_scale 9 \
    --motion_score 5 \
    --num_samples 1 \
    --upscale 4 \
    --noise_aug_strength 0.0 \
    --t2v_inputs your_prompt.txt
```
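Before running the script above, prepare the text-prompt file passed via `--t2v_inputs`. The exact file format is not documented here; a common convention, assumed for this sketch, is one prompt per line:

```shell
# Hypothetical example: assumes your_prompt.txt holds one text prompt per line.
cat > your_prompt.txt <<'EOF'
A golden retriever running across a sunlit meadow, cinematic lighting.
A time-lapse of clouds drifting over a mountain lake at dawn.
EOF
```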
In this work, we introduce a unified framework, namely Many-for-Many, which leverages the training data available from many different visual generation and manipulation tasks to train a single model for all of them. Specifically, we design a lightweight adapter to unify the different conditions of the different tasks, and then employ a joint image-video learning strategy to progressively train the model from scratch. This joint learning yields a unified visual generation and manipulation model with improved video generation performance. In addition, we introduce depth maps as a condition to help our model better perceive 3D space in visual generation. We train two versions of our model at different sizes (8B and 2B), each of which can perform more than 10 different tasks. In particular, our 8B model demonstrates highly competitive performance on video generation tasks compared with open-source and even commercial engines. 🚀✨
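The idea of a lightweight adapter unifying heterogeneous task conditions can be sketched as follows. This is an illustrative NumPy sketch, not the authors' implementation: the per-task projection design, channel sizes, and additive injection into the latents are all assumptions made for clarity.

```python
import numpy as np

class ConditionAdapter:
    """Illustrative sketch: project a task-specific condition (e.g. a depth
    map or a reference image) into the backbone's shared latent space."""

    def __init__(self, cond_channels: int, latent_channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random linear projection stands in for a learned layer.
        self.proj = rng.standard_normal((cond_channels, latent_channels)) * 0.02

    def __call__(self, cond: np.ndarray) -> np.ndarray:
        # cond: (frames, H, W, cond_channels) -> (frames, H, W, latent_channels)
        return cond @ self.proj

# One adapter per condition type; a single backbone consumes the result.
depth_adapter = ConditionAdapter(cond_channels=1, latent_channels=16)

latents = np.zeros((8, 32, 32, 16))           # (frames, H, W, C) video latents
depth = np.ones((8, 32, 32, 1))               # depth-map condition
conditioned = latents + depth_adapter(depth)  # inject the unified condition
print(conditioned.shape)                      # (8, 32, 32, 16)
```

The point of the sketch is that each task only needs a small, cheap mapping into one shared space, so adding a new condition type does not require retraining the backbone from scratch.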
MfM_demo.mp4
If you find our code or model useful in your research, please cite:
@article{yang2025MfM,
  title={Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks},
  author={Li, Ruibin and Yang, Tao and Shi, Yangming and Feng, Weiguo and Wen, Shilei and Peng, Bingyue and Zhang, Lei},
  journal={arXiv preprint arXiv:2506.01758},
  year={2025}
}

