- Oct. 29th, 2025: We updated the code for the ICCV 2025 camera-ready version.
- June 26th, 2025: This paper was accepted by ICCV 2025.
- Aug. 5th, 2024: We released the log and checkpoint for VSSD with MESA.
- July 29th, 2024: When MESA is introduced during training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
- July 25th, 2024: We released the code, logs, and checkpoints for VSSD.
Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which adopts a non-causal format of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.
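As a rough intuition for the causal vs. non-causal distinction mentioned above, the toy sketch below contrasts a causal, prefix-sum-style aggregation with a non-causal, global one. It is only an illustration under simplified assumptions (no decay terms, arbitrary tensor names) and is not the exact NC-SSD formulation; see the paper for the actual derivation.

```python
import torch

L, d, n = 16, 32, 8                       # sequence length, channels, state size
x = torch.randn(L, d)                     # token features
B = torch.randn(L, n)                     # per-token input projections
C = torch.randn(L, n)                     # per-token output projections

# Causal (SSD/SSM-style): token t only sees a running state over tokens s <= t.
H_prefix = torch.cumsum(B.unsqueeze(-1) * x.unsqueeze(1), dim=0)    # (L, n, d)
y_causal = torch.einsum("ln,lnd->ld", C, H_prefix)

# Non-causal: every token reads one global state built from the whole sequence.
H_global = torch.einsum("ln,ld->nd", B, x)                          # (n, d)
y_noncausal = C @ H_global
print(y_causal.shape, y_noncausal.shape)
```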
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.8 | 28M | 5.0G | - | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.6 | 50M | 8.1G | - | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | - | ckpt |
We add several tricks, including ASYNC_STATE, 2D RoPE embedding, and normalization in the NC-SSD block, to further improve performance. Check the configs with the suffix _iccv2025 and the source code for details.
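For readers unfamiliar with 2D RoPE, the sketch below shows one common way to apply rotary position embedding to a 2D feature map: rotate half the channels by row index and the other half by column index. It is a generic illustration only; the exact variant used in the _iccv2025 configs may differ, so please consult the source code.

```python
import torch

def rope_1d(x, pos, base=10000.0):
    """Rotate channel pairs of x (..., c) by angles pos * theta (standard 1D RoPE)."""
    c = x.shape[-1]
    theta = base ** (-torch.arange(0, c, 2, dtype=torch.float32) / c)   # (c/2,)
    ang = pos[..., None] * theta                                        # (..., c/2)
    cos, sin = torch.cos(ang), torch.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def rope_2d(feat):
    """feat: (H, W, d). First half of channels encodes rows, second half columns."""
    H, W, d = feat.shape
    rows = torch.arange(H, dtype=torch.float32)[:, None].expand(H, W)
    cols = torch.arange(W, dtype=torch.float32)[None, :].expand(H, W)
    return torch.cat([rope_1d(feat[..., : d // 2], rows),
                      rope_1d(feat[..., d // 2:], cols)], dim=-1)

print(rope_2d(torch.randn(14, 14, 64)).shape)   # torch.Size([14, 14, 64])
```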
For weights of downstream tasks, please contact me if needed.
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |
Models enhanced with MESA:
| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |
| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
|---|---|---|---|---|---|---|---|
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |
| Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | logs | ckpts |
|---|---|---|---|---|---|---|---|---|
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |
Step 1: Clone the VSSD repository:
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
Step 2: Environment Setup:
Create and activate a new conda environment
conda create -n VSSD
conda activate VSSD
Install Dependencies
pip install -r requirements.txt
Dependencies for Detection and Segmentation (optional)
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
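To verify the environment before launching any jobs, a quick check like the one below can help (this is only an optional suggestion, not part of the official setup; the mm* imports matter only if you installed the optional detection/segmentation dependencies):

```python
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Only needed if the optional detection/segmentation dependencies were installed.
try:
    import mmcv, mmdet, mmseg
    print("mmcv", mmcv.__version__, "| mmdet", mmdet.__version__, "| mmseg", mmseg.__version__)
except ImportError as err:
    print("mm* packages not found (fine if you only run classification):", err)
```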
Classification
To train VSSD models for classification on ImageNet, use the following commands for different configurations:
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
If you only want to test the performance (together with params and FLOPs):
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
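Before passing a downloaded checkpoint to --resume, you can optionally inspect it in Python. The sketch below assumes the Swin-Transformer-style checkpoint layout (weights stored under a "model" key) that this training code builds on; the file name is only a placeholder.

```python
import torch

ckpt = torch.load("vssd_tiny.pth", map_location="cpu")   # placeholder file name
state_dict = ckpt.get("model", ckpt)                      # fall back if the layout differs
print(len(state_dict), "parameter tensors")
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```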
Detection and Segmentation
To evaluate with mmdetection or mmsegmentation:
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
Add --tta to obtain the multi-scale mIoU (MS) for segmentation.
To train with mmdetection or mmsegmentation:
bash ./tools/dist_train.sh </path/to/config> 8
If VSSD is helpful for your research, please cite the following paper:
@article{shi2024vssd,
title={VSSD: Vision Mamba with Non-Causal State Space Duality},
author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
journal={arXiv preprint arXiv:2407.18559},
year={2024}
}
This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab; thanks for their excellent work.
