This repository provides code for applying Spatiotemporal Information Mining Token Merging (STIM-TM) to the Surgformer baseline.
It extends the original Surgformer code.
conda create -n STIM-TM python==3.8.13
conda activate STIM-TM
pip install torch==1.13.0+cu117 torchvision==0.14.0+cu117 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
Please follow the Surgformer code to prepare the dataset and download the pre-trained parameters for TimeSformer.
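Before training, it can help to confirm that the environment matches the install commands above. The sketch below is a hypothetical helper (`check_environment` is not part of this repository). It assumes that Python 3.8 and torch 1.13.0 built for CUDA 11.7 are the versions the scripts expect, as the `conda create` and `pip install` lines suggest, and it only reports mismatches rather than fixing them.

```python
import importlib.util
import sys

# Assumed targets, taken from the conda/pip commands above (not an official requirement list).
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH_PREFIX = "1.13.0"


def check_environment():
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    # Compare only major.minor, since the env pins python==3.8.13.
    if sys.version_info[:2] != EXPECTED_PYTHON:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} found, expected 3.8"
        )
    # Stop early if torch is missing; the remaining checks need it.
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
        return problems
    import torch

    if not torch.__version__.startswith(EXPECTED_TORCH_PREFIX):
        problems.append(
            f"torch {torch.__version__} found, expected {EXPECTED_TORCH_PREFIX}+cu117"
        )
    # Training and testing scripts are assumed to need a visible GPU.
    if not torch.cuda.is_available():
        problems.append("CUDA is not available to torch")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks consistent with the install commands.")
```

Running it inside the activated `STIM-TM` environment (for example as `python check_env.py`, a hypothetical file name) should print no warnings on a correctly installed GPU machine.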
- Run the following command for training:
sh scripts/train.sh
- Run the following command for testing, which produces 0.txt, 1.txt, ... (testing with N GPUs produces N files):
sh scripts/test.sh
- Merge the files and generate a separate txt file for each video (one conversion script is provided for Cholec80 and one for AutoLaparo):
python datasets/convert_results/convert_cholec80.py
python datasets/convert_results/convert_autolaparo.py
- Use the Matlab Evaluation Code to compute the metrics.
- Run the following command to test with STIM-TM:
sh scripts/test_STIM_TM.sh

Thanks to the authors of the following open-source projects: