(CVPR Workshop on Human Motion Generation 2025)
MoCLIP enhances the CLIP encoder for human motion generation by integrating motion-aware features and contrastive alignment strategies. It offers immediate plug-and-play improvement to CLIP-based motion generation pipelines while preserving semantic richness.
```bash
conda env create -f environment.yml
conda activate momask
pip install git+https://github.com/openai/CLIP.git
```
Tested with Python 3.7.13 and PyTorch 1.7.1.
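As an optional sanity check (assuming the environment above is active), you can confirm that PyTorch and the CLIP package import correctly:

```python
# Optional sanity check for the installation above.
import torch
import clip

print("PyTorch:", torch.__version__)            # expected: 1.7.1
print("CUDA available:", torch.cuda.is_available())
print("CLIP models:", clip.available_models())  # e.g. ['RN50', ..., 'ViT-B/32', ...]
```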
Download Pre-trained Models
[Todo] - To add link to pre-trained model
- HumanML3D
  Follow the instructions from the original HumanML3D repo: https://github.com/EricGuo5513/HumanML3D
  Then copy the result into this repo: `cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D`
- KIT-ML
  Also available from the HumanML3D setup. Place it in: `./dataset/KIT-ML`
Note: These datasets are maintained by their original authors. Please refer to their repositories for licensing and setup requirements.
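A small optional check (a sketch assuming the folder names above; adjust if your layout differs) to confirm the datasets ended up in the expected locations:

```python
# Optional: verify that the expected dataset folders exist.
from pathlib import Path

for name in ["HumanML3D", "KIT-ML"]:
    path = Path("dataset") / name
    status = "found" if path.is_dir() else "MISSING"
    print(f"{path}: {status}")
```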
To train MoCLIP on your machine:
```bash
python train_moclip.py --lda 0.4 --dataset_name t2m --gpu_id 0
```
The `--lda` argument controls the weight of the tethering loss, which helps preserve CLIP’s original semantics while adapting to motion tasks (a sketch after the list below illustrates how this weighting enters the training objective).
- Best performing value on HumanML3D: 0.4
- You may experiment with values like 0.2, 0.4, and 0.6 for different models or datasets.
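The snippet below is only an illustrative sketch of how a tethering term weighted by `lda` could be combined with a motion-text contrastive loss; the function name, the frozen-CLIP features, and the exact formulation are assumptions for illustration, not the implementation in `train_moclip.py`:

```python
import torch
import torch.nn.functional as F

def moclip_style_loss(motion_feat, text_feat, frozen_text_feat, lda=0.4, temperature=0.07):
    """Sketch: contrastive motion-text alignment plus a tethering term weighted by `lda`."""
    motion_feat = F.normalize(motion_feat, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)

    # Symmetric InfoNCE-style contrastive loss between motion and text features.
    logits = motion_feat @ text_feat.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    contrastive = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

    # Tethering term (assumption): keep fine-tuned text features close to frozen CLIP
    # features, so the encoder does not drift away from CLIP's original semantics.
    frozen = F.normalize(frozen_text_feat, dim=-1)
    tether = 1.0 - F.cosine_similarity(text_feat, frozen, dim=-1).mean()

    return contrastive + lda * tether
```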
| Argument | Description | Default |
|---|---|---|
| --lda | Distillation loss weight (lambda) | 0.6 |
| --dataset_name | Name of the dataset ('t2m', 'kit', etc.) | t2m |
| --gpu_id | ID of GPU to use | 0 |
MoCLIP integrates easily into CLIP-based text-to-motion models like MoMask, BAD, and BAMM.
See the examples/ folder for working demos.
- Create the motion encoder:
  ```python
  res_transformer.clip_model.motion = MotionTransformerv1(opt.dataset_name).to(opt.device)
  ```
- Add the motion encoder function (a hedged sketch of such a function follows after this list):
  ```python
  res_transformer.clip_model.encode_motion = encode_motion.__get__(res_transformer.clip_model, type(res_transformer.clip_model))
  ```
- Load the MoCLIP weights:
  ```python
  kpt = torch.load(opt.motion_clip, map_location=opt.device)
  ```
- Replace the CLIP weights with MoCLIP:
  ```python
  res_transformer.clip_model.load_state_dict(kpt, strict=False)
  ```
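The actual `encode_motion` function is provided in the `examples/` folder; purely to illustrate the binding pattern used above, a minimal sketch (the feature extraction and normalization details are assumptions) could look like:

```python
import torch

def encode_motion(self, motion):
    """Hypothetical sketch of a motion-encoding method bound onto the CLIP model."""
    # `self.motion` is the MotionTransformerv1 attached in the first step above.
    features = self.motion(motion)
    # Normalize so motion features live on the same unit sphere as CLIP's
    # text features (an assumption, mirroring standard CLIP practice).
    return features / features.norm(dim=-1, keepdim=True)
```

Assigning `encode_motion.__get__(model, type(model))` to the instance produces a bound method, so the call site can simply use `res_transformer.clip_model.encode_motion(motion)` as if it were a native CLIP method.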
- `train_moclip.py` – Script for training MoCLIP
- `examples/` – Integration code for BAD, BAMM, and MoMask
- `dataset/` – Expected folder for HumanML3D and KIT-ML datasets
```bibtex
@article{maldonado2025moclip,
  title={MoCLIP: Motion-Aware Fine-Tuning and Distillation of CLIP for Human Motion Generation},
  author={Maldonado, Gabriel and Danesh Pazho, Armin and Alinezhad Noghre, Ghazal and Katariya, Vinit and Tabkhi, Hamed},
  journal={arXiv preprint arXiv:2505.10810},
  year={2025}
}
```