This is the implementation of STDS, a spatiotemporal diffusion model for satellite imagery.
This repository is a fork of Latte, a transformer-based video diffusion model.
First, download and set up the repo:

```
git clone https://github.com/dfki-av/STDS
cd STDS
```

We provide an `environment.yml` file that can be used to create a Conda environment. If you only want to run pre-trained models locally on CPU, you can remove the `cudatoolkit` and `pytorch-cuda` requirements from the file.

```
conda env create -f environment.yml
conda activate stds
```

You can sample from our pre-trained Latte models with `sample.py`.
To get the best results from the STDS model, we recommend training it on your own dataset.
This model was trained on a dataset collected from Google Earth Engine. Weights for our pre-trained Latte model can be found here. The script has various arguments to adjust the number of sampling steps, change the classifier-free guidance scale, etc.
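For readers unfamiliar with the classifier-free guidance scale mentioned above: it blends the model's conditional and unconditional noise predictions at each sampling step. The following is a minimal illustrative sketch of that formula; the function name and toy inputs are ours, not the repo's actual API.

```python
def classifier_free_guidance(eps_cond, eps_uncond, scale):
    """Blend conditional and unconditional noise predictions.

    scale = 1.0 reproduces the purely conditional prediction; larger
    values push samples more strongly toward the conditioning signal.
    (Illustrative sketch, not the repository's sampling code.)
    """
    return [u + scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

# Toy example with dummy per-element noise predictions.
eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 0.0]
print(classifier_free_guidance(eps_cond, eps_uncond, 1.0))  # -> [1.0, 2.0]
print(classifier_free_guidance(eps_cond, eps_uncond, 4.0))  # -> [4.0, 8.0]
```

Raising the scale trades sample diversity for stronger adherence to the conditioning input.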
If you would like to measure the quantitative metrics of your generated results, please refer to here.
We provide a training script for Latte in `train.py`. The structure of the datasets can be found here. This script can be used to train class-conditional and unconditional Latte models. To launch Latte (256x256) training with N GPUs on the FaceForensics dataset:
```
torchrun --nnodes=1 --nproc_per_node=N train.py --config ./configs/ffs/ffs_train.yaml
```

Alternatively, if you have a cluster that uses Slurm, you can train Latte's model with the following script:

```
sbatch slurm_scripts/ffs.slurm
```

We also provide a video-image joint training script, `train_with_img.py`. Like `train.py`, it can be used to train class-conditional and unconditional Latte models.
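As background on the `torchrun` launch command above: `torchrun` starts one worker process per GPU and sets environment variables such as `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`, which the training script reads to set up distributed training. The sketch below shows how such variables are typically read; it is illustrative only, not the repo's actual initialization code.

```python
import os

def get_dist_info():
    """Read the environment variables torchrun sets for each worker.

    The defaults fall back to single-process values so the same code
    also runs without torchrun (e.g. a plain `python train.py`).
    (Illustrative sketch, not the repository's initialization code.)
    """
    rank = int(os.environ.get("RANK", 0))              # global rank across all nodes
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # GPU index on this node
    world_size = int(os.environ.get("WORLD_SIZE", 1))  # total number of worker processes
    return rank, local_rank, world_size

rank, local_rank, world_size = get_dist_info()
print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

With `--nproc_per_node=N`, each of the N workers sees a distinct `LOCAL_RANK`, which is typically used to pin the process to one GPU.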
If you are familiar with PyTorch Lightning, you can also use the training scripts `train_pl.py` and `train_with_img_pl.py` provided by @zhang.haojie:

```
python train_pl.py --config ./configs/ffs/ffs_train.yaml
```

or

```
python train_with_img_pl.py --config ./configs/ffs/ffs_img_train.yaml
```

These scripts automatically detect available GPUs and use distributed training.
Contact: Prathap Kashyap (prathapnkashyap@gmail.com)
If you find this work useful for your research, please consider citing it.
@inproceedings{kashyap2025spatiotemporal,
title={Spatiotemporal diffusion model for satellite imagery},
author={Kashyap, Prathap Nagaraj and Javanmardi, Alireza and Jaiswal, Pragati and Reis, Gerd and Pagani, Alain and Stricker, Didier},
booktitle={Eleventh International Conference on Remote Sensing and Geoinformation of the Environment (RSCy2025)},
volume={13816},
pages={376--384},
year={2025},
organization={SPIE}
}

STDS has been greatly inspired by the following amazing works and teams: Latte, DiT, and PixArt-α. We thank all the contributors for open-sourcing their work.
The code and model weights are licensed under LICENSE.