SpatialDINO: A Self-Supervised 3D Vision Transformer that enables Segmentation and Tracking in Crowded Cellular Environments
SpatialDINO is a self-supervised foundation model for analyzing 3D fluorescence microscopy images, built by adapting DINOv2-style joint-embedding training to learn dense volumetric features directly from unlabeled 3D datasets. By exploiting true 3D context rather than slice-wise "2.5D" aggregation, it enables automated detection and segmentation in crowded, anisotropic, low-contrast volumes, as well as tracking in 4D time-lapse data. SpatialDINO generalizes across targets and imaging conditions without voxel-level annotation or retraining.
Authors: Alex Lavaee*, Arkash Jain*, Gustavo Scanavachi*, Jose Inacio Costa-Filho*, Adam Ingemansson, Tom Kirchhausen
* Equal contribution
All datasets and pre-trained models are publicly available through AWS S3. The datasets can also be accessed with Mirante4D.
Download training datasets:
```bash
aws s3 cp s3://spatialdino/dataset_part1/ ./datasets/ --recursive --no-sign-request
```

Download inference datasets:

```bash
aws s3 cp s3://spatialdino/inference_data/ ./inference_data/ --recursive --no-sign-request
```

Download models:

```bash
aws s3 cp s3://spatialdino/models/ ./models/ --recursive --no-sign-request
```

List available data:

```bash
aws s3 ls s3://spatialdino/ --no-sign-request
```

uv is a faster drop-in replacement for conda that we use for environment management. Download and install it via either

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

or

```bash
wget -qO- https://astral.sh/uv/install.sh | sh
```

Clone the repository:

```bash
git clone --recursive git@github.com:kirchhausenlab/spatialdino.git
```

In the repository directory, run
```bash
uv venv --python 3.12
uv sync --all-packages
```

This creates a single root `.venv` shared by the core `spatialdino` package and the GUI server in `apps/server`.
In the repository directory, run
```bash
cd apps/web
npm install
```

In the repository directory, run

```bash
cd apps/web
npm run dev
```

Then follow the GUI instructions shown in the terminal and browser.
This project requires CUDA version 12 or higher. Verify the installed CUDA version by running:

```bash
nvcc --version
```
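As a quick sanity check, the release number in the `nvcc --version` output can also be parsed programmatically. This is an illustrative sketch, not code from the repository; the sample output string below is hypothetical and your actual output will differ:

```python
import re

# Sample `nvcc --version` output (hypothetical; paste your own output here).
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 12.4, V12.4.131"""

# Extract the "release X.Y" version and check the major version.
match = re.search(r"release (\d+)\.(\d+)", sample)
major, minor = int(match.group(1)), int(match.group(2))
assert major >= 12, f"CUDA {major}.{minor} found, but CUDA 12+ is required"
print(major, minor)  # -> 12 4
```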
⚠️ **Important:** Before running inference, ensure you have the pretrained model. Use the model path `../models/backbone.pth`, which contains the pretrained weights for the DINO vision transformer.
```bash
#!/bin/bash
folder_path="/nfs/data1expansion/datasync3/Gustavo/20210422_0p5_0p55_sCMOS_Gu_AP2/CS1_Ap2_live_3colorsDic/Ex07_488_60mW_z0p5/ch488nmCamA/DS"
file_start=0
file_end=1  # exclusive; leave unset to process through the end
save_path="/raid1/cme_tests/results/ablations/ap2_test"

export OMP_NUM_THREADS=32
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export NUM_PROC_PER_NODE=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | wc -l)

uv run torchrun --nnodes 1 --node_rank 0 --nproc_per_node $NUM_PROC_PER_NODE \
    --rdzv_endpoint=localhost:9999 ./scripts/inference/inference.py \
    file_path="$folder_path" \
    save_path="$save_path" \
    file_start=$file_start \
    file_end=$file_end \
    global_hist_min=null \
    global_hist_max=null \
    crop_params="[0,0,0,0,0,0]"
```

- `folder_path`: Path to the folder containing images
- `file_start`/`file_end`: File slice passed to `fnames[file_start:file_end]` (`file_end` is exclusive)
- `save_path`: Path to save results
- `global_hist_min`/`global_hist_max`: Optional global histogram bounds. If both are provided, inference uses those shared values for all volumes instead of the default per-volume normalization. These correspond to the values written by `scripts/inference/norm_per_vol.py`.
- `OMP_NUM_THREADS`: Number of threads to use
- `CUDA_VISIBLE_DEVICES`: List of GPUs to use
- `NUM_PROC_PER_NODE`: Number of processes/GPUs per node
- `crop_params`: Parameters for cropping images
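The `file_start`/`file_end` pair behaves like a standard Python slice, which is worth spelling out since `file_end` is exclusive. A small sketch with made-up filenames (not from any actual dataset):

```python
# Hypothetical sorted list of volume files found in folder_path.
fnames = ["vol_000.tif", "vol_001.tif", "vol_002.tif", "vol_003.tif"]

file_start, file_end = 0, 1
print(fnames[file_start:file_end])  # ['vol_000.tif'] -- file_end is exclusive

# Leaving file_end unset corresponds to slicing through the end of the list.
file_end = None
print(fnames[file_start:file_end])  # all four files
```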
If you do not have a `.bashrc` file, create one:

```bash
touch ~/.bashrc
```

Add the following to your `.bashrc` file:
NCCL Configuration (NVIDIA's communication library):
```bash
export NCCL_SOCKET_NTHREADS=4    # number of threads per socket
export NCCL_NSOCKS_PERTHREAD=4   # number of sockets per thread
export NCCL_IB_DISABLE=0         # enable InfiniBand
export NCCL_IB_HCA="mlx5"        # use Mellanox InfiniBand
export CUDA_HOME="/usr/local/cuda-12"  # choose the correct CUDA version
export PATH=$CUDA_HOME/bin:$PATH
export CPATH="$CUDA_HOME/include:$CPATH"
```

C++ Library:

```bash
export CXX=g++
```

Distributed Training:

```bash
export NCCL_SOCKET_IFNAME=ib   # use all InfiniBand interfaces
export RDZV_BACKEND="c10d"
export OMP_NUM_THREADS=16
export NUM_ALLOWED_FAILURES=3
export RDZV_ID="2001"          # set the rdzv id to be the same for all nodes
export MASTER_PORT="29500"     # set the master port to be the same for all nodes
```

To get the master address, get the IP address of your InfiniBand interface:

```bash
ifconfig ib0
```

Example output:
```
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2044
        inet 10.1.0.11  netmask 255.255.0.0  broadcast 10.1.255.255
        ...
```

Set the master address (in this case `10.1.0.11`):

```bash
export MASTER_ADDR="10.1.0.11"
export RDZV_ENDPOINT="$MASTER_ADDR:$MASTER_PORT"
```

For multi-node training with 3 nodes and 8 GPUs per node, run the following on every node, setting `NODE_RANK` to 0, 1, or 2 on each node before launching:
```bash
# Arguments explanation:
# --nnodes: number of nodes (e.g. 3)
# --node_rank: rank of this node (0, 1, ..., n-1 for n nodes)
# --nproc_per_node: number of processes/GPUs per node (e.g. 8)
# --rdzv-backend: rendezvous backend (e.g. c10d)
# --rdzv-endpoint: address:port of the master node (e.g. 10.10.10.10:29500)
torchrun --nnodes 3 --nproc_per_node 8 --node_rank $NODE_RANK \
    --rdzv-id $RDZV_ID --rdzv-backend $RDZV_BACKEND \
    --rdzv-endpoint $RDZV_ENDPOINT scripts/train/pretrain.py
```

- Issues: SpatialDINO Issues
- Contact: Jose Inacio Costa-Filho (joseinacio@tklab.hms.harvard.edu), Tom Kirchhausen (kirchhausen@crystal.harvard.edu)
```bibtex
@article{spatialdino2025,
  author = {Lavaee, Alex and Jain, Arkash and Scanavachi Moreira Campos, Gustavo and Costa-Filho, Jose Inacio and Ingemansson, Adam and Kirchhausen, Tom},
  title = {SpatialDINO: A Self-Supervised 3D Vision Transformer that enables Segmentation and Tracking in Crowded Cellular Environments},
  year = {2026},
  doi = {10.64898/2025.12.31.697247},
  publisher = {Cold Spring Harbor Laboratory},
  url = {https://www.biorxiv.org/content/early/2026/01/02/2025.12.31.697247},
  journal = {bioRxiv}
}
```