- Oct. 29th, 2025: We updated the code for the ICCV 2025 camera-ready version.
- June 26th, 2025: This paper was accepted by ICCV 2025.
- Aug. 5th, 2024: We released the log and checkpoint for VSSD with MESA.
- July 29th, 2024: When MESA is introduced during training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
- July 25th, 2024: We released the code, logs, and checkpoints for VSSD.
Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which adopts a non-causal format of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.
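As a rough intuition for the causal vs. non-causal distinction mentioned above, the toy sketch below contrasts a causal, prefix-sum-style aggregation with a non-causal, global one. It is only an illustration under simplified assumptions (no decay terms, arbitrary tensor names) and is not the exact NC-SSD formulation; see the paper for the actual derivation.

```python
import torch

L, d, n = 16, 32, 8                       # sequence length, channels, state size
x = torch.randn(L, d)                     # token features
B = torch.randn(L, n)                     # per-token input projections
C = torch.randn(L, n)                     # per-token output projections

# Causal (SSD/SSM-style): token t only sees a running state over tokens s <= t.
H_prefix = torch.cumsum(B.unsqueeze(-1) * x.unsqueeze(1), dim=0)    # (L, n, d)
y_causal = torch.einsum("ln,lnd->ld", C, H_prefix)

# Non-causal: every token reads one global state built from the whole sequence.
H_global = torch.einsum("ln,ld->nd", B, x)                          # (n, d)
y_noncausal = C @ H_global
print(y_causal.shape, y_noncausal.shape)
```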
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.8 | 28M | 5.0G | - | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.6 | 50M | 8.1G | - | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | - | ckpt |
We add several tricks, including ASYNC_STATE, 2D RoPE embedding, and normalization in the NC-SSD block, to further improve performance. Check the configs with the suffix _iccv2025 and the source code for details.
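For readers unfamiliar with 2D RoPE, the sketch below shows one common way to apply rotary position embedding to a 2D feature map: rotate half the channels by row index and the other half by column index. It is a generic illustration only; the exact variant used in the _iccv2025 configs may differ, so please consult the source code.

```python
import torch

def rope_1d(x, pos, base=10000.0):
    """Rotate channel pairs of x (..., c) by angles pos * theta (standard 1D RoPE)."""
    c = x.shape[-1]
    theta = base ** (-torch.arange(0, c, 2, dtype=torch.float32) / c)   # (c/2,)
    ang = pos[..., None] * theta                                        # (..., c/2)
    cos, sin = torch.cos(ang), torch.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(feat):
    """feat: (H, W, d). First half of channels encodes rows, second half columns."""
    H, W, d = feat.shape
    rows = torch.arange(H, dtype=torch.float32)[:, None].expand(H, W)
    cols = torch.arange(W, dtype=torch.float32)[None, :].expand(H, W)
    return torch.cat([rope_1d(feat[..., : d // 2], rows),
                      rope_1d(feat[..., d // 2:], cols)], dim=-1)

print(rope_2d(torch.randn(14, 14, 64)).shape)   # torch.Size([14, 14, 64])
```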
For weights of downstream tasks, please contact me if needed.
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |
Models enhanced with MESA:
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |
| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |
| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | logs | ckpts |
|---|---|---|---|---|---|---|---|---|
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |
Step 1: Clone the VSSD repository:
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
Step 2: Environment Setup:
Create and activate a new conda environment
conda create -n VSSD
conda activate VSSD
Install Dependencies
pip install -r requirements.txt
Dependencies for Detection and Segmentation (optional)
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
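To verify the environment before launching any jobs, a quick check like the one below can help (this is only an optional suggestion, not part of the official setup; the mm* imports matter only if you installed the optional detection/segmentation dependencies):

```python
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Only needed if the optional detection/segmentation dependencies were installed.
try:
    import mmcv, mmdet, mmseg
    print("mmcv", mmcv.__version__, "| mmdet", mmdet.__version__, "| mmseg", mmseg.__version__)
except ImportError as err:
    print("mm* packages not found (fine if you only run classification):", err)
```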
Classification
To train VSSD models for classification on ImageNet, use the following commands for different configurations:
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
If you only want to test the performance (together with params and FLOPs):
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
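Before passing a downloaded checkpoint to --resume, you can optionally inspect it in Python. The sketch below assumes the Swin-Transformer-style checkpoint layout (weights stored under a "model" key) that this training code builds on; the file name is only a placeholder.

```python
import torch

ckpt = torch.load("vssd_tiny.pth", map_location="cpu")   # placeholder file name
state_dict = ckpt.get("model", ckpt)                      # fall back if the layout differs
print(len(state_dict), "parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```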
Detection and Segmentation
To evaluate with mmdetection or mmsegmentation:
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
Add --tta to obtain the multi-scale mIoU (MS) for segmentation.
To train with mmdetection or mmsegmentation:
bash ./tools/dist_train.sh </path/to/config> 8
If VSSD is helpful for your research, please cite the following paper:
@article{shi2024vssd,
title={VSSD: Vision Mamba with Non-Causal State Space Duality},
author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
journal={arXiv preprint arXiv:2407.18559},
year={2024}
}
This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab; thanks for their excellent work.
