Large-scale language-image pre-trained models (e.g., CLIP) have shown superior performance on many cross-modal retrieval tasks. However, the problem of transferring the knowledge learned from such models to video-based person re-identification (ReID) has barely been explored. In addition, current ReID benchmarks lack decent text descriptions. To address these issues, in this work, we propose a novel one-stage text-free CLIP-based learning framework named TF-CLIP for video-based person ReID.
- [2024/01/01] We make TF-CLIP public. Happy New Year!!!
- We propose a novel one-stage text-free CLIP-based learning framework named TF-CLIP for video-based person ReID. To the best of our knowledge, we are the first to extract identity-specific sequence features to replace the text features of CLIP. Meanwhile, we further design a Sequence-Specific Prompt (SSP) module to update the CLIP-Memory online.
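The core idea of replacing CLIP's text features with identity-specific sequence features can be sketched as follows. This is an illustrative NumPy sketch under assumed shapes, not the released implementation; the function name `build_clip_memory` and its interface are hypothetical:

```python
import numpy as np

def build_clip_memory(features, labels):
    """Average visual features per identity to form a text-free
    "CLIP-Memory" prototype (illustrative sketch, assumed interface).

    features: (N, D) array of CLIP image features
    labels:   (N,) integer identity labels
    returns:  (num_ids, D) memory, one prototype per identity
    """
    ids = np.unique(labels)
    # One prototype per identity: the mean of that identity's features
    memory = np.stack([features[labels == i].mean(axis=0) for i in ids])
    # L2-normalize so prototypes live on the same unit sphere as
    # CLIP text embeddings they stand in for
    memory /= np.linalg.norm(memory, axis=1, keepdims=True)
    return memory
```

In the actual framework, this memory is not static: the SSP module updates it online as training proceeds.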
- We propose a Temporal Memory Diffusion (TMD) module to capture temporal information. The frame-level memories in a sequence first communicate with each other to extract temporal information. The temporal information is then further diffused to each token, and finally aggregated to obtain more robust temporal features.
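The three TMD steps (communicate, diffuse, aggregate) can be sketched in NumPy. The shapes, the plain dot-product attention, and the mean aggregation are assumptions for illustration only; the paper's module is a learned PyTorch component:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_memory_diffusion(memories, tokens):
    """Illustrative sketch of the TMD idea (assumed shapes and ops).

    memories: (T, D) one memory vector per frame
    tokens:   (T, L, D) patch tokens per frame
    returns:  (D,) aggregated temporal feature
    """
    # 1) Frame-level memories communicate via self-attention over time
    attn = softmax(memories @ memories.T / np.sqrt(memories.shape[1]))
    temporal = attn @ memories                  # (T, D)
    # 2) Diffuse the temporal information to every token of each frame
    diffused = tokens + temporal[:, None, :]    # (T, L, D)
    # 3) Aggregate into a single, more robust temporal feature
    return diffused.mean(axis=(0, 1))           # (D,)
```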
- Performance
- MARS: Model&Code (password: 1234)
- LS-VID: Model&Code (password: 1234)
- iLIDS-VID: Model&Code (password: 1234)
- t-SNE Visualization
- Install the conda environment
conda create -n tfclip python=3.8
conda activate tfclip
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
- Install the required packages:
pip install yacs
pip install timm
pip install scikit-image
pip install tqdm
pip install ftfy
pip install regex
- Prepare Datasets
Download the datasets (MARS, LS-VID and iLIDS-VID), and then unzip them to your_dataset_dir.
For example, if you want to run the method on MARS, you need to modify the bottom of configs/vit_base.yml to:
DATASETS:
  NAMES: ('MARS')
  ROOT_DIR: ('your_dataset_dir')
OUTPUT_DIR: 'your_output_dir'
Then, run:
CUDA_VISIBLE_DEVICES=0 python train-main.py
For example, if you want to test the method on MARS, run:
CUDA_VISIBLE_DEVICES=0 python eval-main.py
This project is based on CLIP-ReID and XCLIP. Thanks for their excellent works.
If you have any questions, please feel free to send an email to yuchenyang@mail.dlut.edu.cn or asuradayuci@gmail.com. ^_^
If you find TF-CLIP useful for your research, please consider citing 📣
@inproceedings{tfclip,
  title={TF-CLIP: Learning Text-Free CLIP for Video-Based Person Re-identification},
  author={Yu, Chenyang and Liu, Xuehu and Wang, Yingquan and Zhang, Pingping and Lu, Huchuan},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={7},
  pages={6764--6772},
  year={2024}
}

TF-CLIP is released under the MIT License.




