This repo is an unofficial implementation of Test-Time Visual In-Context Tuning (VICT). Since the original paper leaves some details a bit vague (e.g., how the noisy data and visual prompts are created), this implementation follows Painter's default settings.
> **Warning**
> I've tried my best to match Painter's results, but a few small gaps remain.
> VICT also requires a lot of time for test-time adaptation (TTA) on each sample, and the authors evaluated on a custom dataset (created by adding noise to the original data); that part is not reproduced here.
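If you want to approximate the paper's corrupted test set yourself, the recipe is conceptually simple. The sketch below is only a guess at the protocol: the noise type (Gaussian), its strength, and the directory layout are all assumptions, not the authors' settings.

```python
# Hypothetical sketch: build a noisy copy of a dataset, roughly in the spirit
# of the paper's corrupted evaluation. Noise type, sigma, and paths are assumptions.
from pathlib import Path

import numpy as np
from PIL import Image

def add_gaussian_noise(img: np.ndarray, sigma: float = 25.0) -> np.ndarray:
    """Additive Gaussian pixel noise; sigma is in 0-255 intensity units."""
    noise = np.random.normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

src, dst = Path("datasets/clean"), Path("datasets/noisy")  # hypothetical paths
for path in src.rglob("*.png"):
    out = dst / path.relative_to(src)
    out.parent.mkdir(parents=True, exist_ok=True)
    noisy = add_gaussian_noise(np.array(Image.open(path).convert("RGB")))
    Image.fromarray(noisy).save(out)
```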
```bash
uv sync
# This is important to load the env
source .venv/bin/activate
mim install "mmcv-full==1.7.2"
mim install "mmdet==2.28.2"
mim install "mmpose==0.29.0"
python misc/fix_mm_file.py
```

Dataset setup is exactly the same as Painter. 👉 See Painter's DATA.md for instructions.
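Before downloading the datasets, it's worth verifying that the pinned OpenMMLab packages resolved correctly; a quick sanity check:

```python
# Verify that the pinned OpenMMLab packages import and match the pins above.
import mmcv, mmdet, mmpose

print(mmcv.__version__)    # expected: 1.7.2
print(mmdet.__version__)   # expected: 2.28.2
print(mmpose.__version__)  # expected: 0.29.0
```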
All inference scripts live in the `inference` folder. Run them with:
```bash
# ensure the environment is loaded
bash script/inference.sh
```

```text
Usage: script/inference.sh [--num-gpu N] [--save-dir DIR] [--data-path PATH]
                           [--use-ema true|false] [--resume true|false]
                           [--optimize-steps N] [--mixed-precision MODE]

Arguments:
  --num-gpu N               (Optional) Number of GPUs to use. Default: 1
  --save-dir DIR            (Optional) Directory to save outputs. Default: results
  --data-path PATH          (Optional) Path to the prepared datasets. Default: datasets
  --use-ema true|false      (Optional) Update weights with an EMA during TTA. Default: false
  --resume true|false       (Optional) Resume from a checkpoint. Default: false
  --optimize-steps N        (Optional) Number of test-time optimization steps. Default: 0
  --mixed-precision MODE    (Optional) Mixed-precision mode ('no', 'fp16'). Default: no
  -h, --help                Show this help message.
```
Run with 8 GPUs and EMA updates enabled:
```bash
bash script/inference.sh --num-gpu 8 --use-ema true
```
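The flags compose freely; for example, to run the paper's 60 TTA steps under fp16 autocast (the values here are illustrative, not a recommended configuration):

```bash
bash script/inference.sh --num-gpu 4 --optimize-steps 60 --mixed-precision fp16
```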
> **Note**
> If you leave `--optimize-steps` at its default (0), the script falls back to Painter's behavior, i.e., plain inference without test-time tuning. To align with the VICT paper, set it to 60.
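Conceptually, here is roughly what `--optimize-steps` and `--use-ema` control for each test sample. This is a schematic sketch, not this repo's actual code: `in_context_loss`, `predict`, the optimizer choice, and the learning rate are all placeholders for illustration.

```python
import copy

import torch

def predict_with_tta(model, prompt_pair, query,
                     optimize_steps=60, use_ema=False, ema_decay=0.999):
    """Schematic VICT-style test-time tuning for a single sample.

    prompt_pair: an (input, target) in-context example; query: the test image.
    All method names and hyperparameters here are illustrative.
    """
    tuned = copy.deepcopy(model)                     # never touch the source weights
    ema = copy.deepcopy(tuned) if use_ema else None  # shadow copy for --use-ema
    opt = torch.optim.AdamW(tuned.parameters(), lr=1e-5)  # lr is an assumption

    for _ in range(optimize_steps):                  # 0 steps == plain Painter inference
        loss = tuned.in_context_loss(prompt_pair, query)  # self-supervised TTA objective
        opt.zero_grad()
        loss.backward()
        opt.step()
        if use_ema:                                  # exponential moving average of weights
            with torch.no_grad():
                for p_ema, p in zip(ema.parameters(), tuned.parameters()):
                    p_ema.mul_(ema_decay).add_(p, alpha=1 - ema_decay)

    final = ema if use_ema else tuned
    with torch.no_grad():
        return final.predict(prompt_pair, query)
```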
Please cite the original paper:
```bibtex
@inproceedings{xie2025test,
  title={Test-Time Visual In-Context Tuning},
  author={Xie, Jiahao and Tonioni, Alessio and Rauschmayr, Nathalie and Tombari, Federico and Schiele, Bernt},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={19996--20005},
  year={2025}
}
```