This is the official code for our NeurIPS 2025 paper Instant Video Models: Universal Adapters for Stabilizing Image-Based Networks.
The goal of this work is to modify existing single-frame (image-based) models to improve their stability and robustness on video. Our approach requires only minimal modifications to the original model: the original parameters can be frozen, so that only a lightweight set of stabilization adapters is trained.
I spent some time cleaning up this codebase after the paper deadline. I re-ran a few experiments to test core functionality. However, I did not test all of the config files, due to the significant compute resources required to re-run experiments. If you encounter a problem, please submit a GitHub issue and I will do my best to help. -Matt
The two main scripts are ./scripts/train.py and ./scripts/evaluate.py. They should be invoked from the top level of the repository.
The runs folder contains a .yml configuration file for each experiment in the paper. The filepath of the config file is passed to the training or evaluation script. For example, to fine-tune the Deeplab segmentation model on the VIPER dataset:
```
./scripts/train.py runs/deeplab_viper/finetune_base_model.yml
```

Then, to train controlled spatial-fusion stabilizers on top of the fine-tuned model:
```
./scripts/train.py runs/deeplab_viper/train_controlled_spatial.yml
```

Finally, the stabilized model can be evaluated with:
```
./scripts/evaluate.py runs/deeplab_viper/evaluate_controlled_spatial.yml
```

Scripts will print progress and results to the terminal. Results, weights, and log files are saved in a unique directory created in outputs.
Config files starting with underscores are not meant to be run directly - they are intended to be imported by other configs.
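As a hypothetical illustration of this convention (the actual import key and structure may differ; see the configs in runs/ for the real syntax), an experiment config might pull in a shared underscore-prefixed base like this:

```yaml
# Hypothetical sketch only -- key names are illustrative, not taken from
# this codebase. An underscore-prefixed config provides shared defaults:
import: runs/deeplab_viper/_base.yml
epochs: 50
lambda: 0.2
```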
Some experiments load weights generated by a previous run. In this case, WEIGHTS_FILEPATH is used as a placeholder in config files. Replace WEIGHTS_FILEPATH with the location of the previously generated weights (typically in outputs).
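For instance, a config might contain the placeholder like this (the key name and output directory shown here are illustrative, not taken from the actual configs):

```yaml
# Before: placeholder as shipped in the config
# weights: WEIGHTS_FILEPATH
# After: pointing at weights produced by an earlier run under outputs/
weights: outputs/finetune_base_model_2025-01-01/weights.pth
```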
Config values can be overridden on the command line. For example, to increase the number of stabilizer training epochs to 100:
```
./scripts/train.py runs/deeplab_viper/train_controlled_spatial.yml epochs=100
```

An important use case for overrides is setting lambda, the strength of the temporal smoothness penalty (see our paper for details). If a script complains that lambda is not defined, it probably expects a command-line override, for example:
```
./scripts/evaluate.py runs/deeplab_viper/evaluate_controlled_spatial.yml lambda=0.4
```

Some of the paper experiments vary lambda over a range. In this case, something like the following can be used:
```
for lambda in 0.1 0.2 0.4 0.8
do
    ./scripts/train.py runs/deeplab_viper/train_controlled_spatial.yml lambda=$lambda
done
```

For some baselines, lambda is replaced with another parameter controlling the degree of smoothing: the gaussian stabilizer uses sigma, and the simple_fixed stabilizer uses alpha. These parameters should be overridden on the command line, analogously to lambda.
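The analogous sweep for a baseline parameter follows the same pattern. In this sketch, the `echo` makes it a dry run that only prints the commands (remove `echo` to actually launch training), and the config filename `train_gaussian.yml` is a guess; check runs/ for the real name:

```shell
# Dry-run sweep over sigma for the gaussian stabilizer, analogous to the
# lambda loop above. "train_gaussian.yml" is an assumed filename.
for sigma in 0.5 1.0 2.0
do
    echo ./scripts/train.py runs/deeplab_viper/train_gaussian.yml sigma=$sigma
done
```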
Some models and datasets require manual downloads. See the "Dataset Setup" and "Model Setup" sections below for instructions.
After cloning this repository, initialize submodules (containing code for external models) by running:
```
git submodule init
git submodule update
```

We manage dependencies using Conda. To create the Conda environment, run:
```
conda env create --file environment.yml
```

Then activate the environment with:
```
conda activate instant-video-models
```

The file environment_precise.yml contains more exact package versions and can be used to reproduce the original development environment. To create an environment based on environment_precise.yml, run:
```
conda env create --file environment_precise.yml
```

Scripts assume that the working directory is on the Python path. To set this up, run the following in Bash (or add it to your .bashrc):
```
if [[ :$PYTHONPATH: != *:.:* ]]
then
    export PYTHONPATH="$PYTHONPATH:."
fi
```

Run the script ./scripts/datasets/download_davis.sh to download and unpack the DAVIS dataset.
Run the script ./scripts/datasets/crop_davis.py to crop the DAVIS dataset to only the annotated sections. This step is only required if running the adversarial robustness experiments.
Run the script ./scripts/datasets/download_nfs.sh to download and unpack the NFS dataset.
After downloading NFS, run the following to generate the local Laplacian variant:
```
matlab -batch 'run("scripts/datasets/generate_nfs_local_laplacian.m")'
```

This requires a MATLAB installation with the Image Processing Toolbox.
Comment or uncomment the marked blocks at the top of scripts/datasets/generate_nfs_local_laplacian.m to adjust the strength of the local Laplacian effect.
Download the following files from the Spring website:
- test_frame_left.zip
- test_frame_right.zip
- rain.zip
- snow.zip
Place the downloaded files in data/robust_spring. Then run ./scripts/datasets/unpack_robust_spring.sh to unpack the zip files. Note this script deletes the original zip files after unpacking to save space.
Download the following ZIP files from Google Drive:
- Training image sequences (dense frames, compressed):
- https://drive.google.com/file/d/1-O7vWiMa3mDNFXUoYxE3vkKZQpiDXUCf/view
- https://drive.google.com/file/d/1alD_fZja9qD7PUnk4AkD6l-jBhlCnzKr/view
- https://drive.google.com/file/d/19Da-Ac_9KMjexvYGkfjAFowWEGGxMR3I/view
- https://drive.google.com/file/d/1KZh-z7SeKJDjOG08MWPKX2UBaZ4d-xjF/view
- https://drive.google.com/file/d/1CeNQ0h1Kr00J45izEYXXhLNghQANRNDG/view
- https://drive.google.com/file/d/1Vf0MwcgKaz6zgjvqEYlnRbud_lAT4-JK/view
- Training class labels (dense frames):
- Validation image sequences (dense frames, compressed):
- https://drive.google.com/file/d/1951O6Eu-VuMHaL1vJ9V35njcj30GjPiN/view
- https://drive.google.com/file/d/1OqEjlrx97ThCMlQePEZPSBjqhRqPwOEd/view
- https://drive.google.com/file/d/1zo5ZKE90N0iE7E_KJU8_FBT0N4ne6knK/view
- https://drive.google.com/file/d/1QZuPkd_3dRLqZXgwf29gHXtaI8MEbbAS/view
- https://drive.google.com/file/d/1LgMGWfp_R6hwmMd-zTjrAw5OfflXS01W/view
- Validation class labels (dense frames):
Place the downloaded files in data/viper. Then run ./scripts/datasets/unpack_viper.sh to unpack the zip files. Note this script deletes the original zip files after unpacking to save space.
Please contact us for a copy of this data (the dataset is ~200 GB and we have not found a good way to share it publicly).
Download the files decoder.pth and vgg_normalised.pth from this GitHub releases page. Place these files in ./weights/adain. Then run ./scripts/models/prepare_adain_weights.py to generate a single merged weight file that can be used more easily with our codebase.
Download the best_deeplabv3plus_mobilenet_cityscapes_os16.pth weights using this link and place them in weights/deeplab. Then run ./scripts/models/prepare_deeplab_weights.py to convert these weights to a format usable with our codebase.
Download weights using this link. Extract the contents of the zip file to weights/hdrnet. Then run ./scripts/models/prepare_hdrnet_weights.py to convert these weights to a format usable with our codebase.
Download the NAFNet-SIDD-width32.pth weights using this Google Drive link. Place this file in ./weights/nafnet. Then run ./scripts/models/prepare_nafnet_weights.py to convert these weights to a format usable with our codebase.