Interpretable and Steerable Concept Bottleneck Sparse Autoencoders (CVPR 2026)

Our code will be made available soon.

  • We systematically analyze sparse autoencoders (SAEs) in large vision-language models (LVLMs) and uncover two major limitations:
    • a majority of SAE neurons exhibit low interpretability, low steerability, or both, rendering them ineffective for downstream use; and
    • user-desired concepts are often absent from the SAE's learned dictionary, limiting its practical utility.
  • We address these limitations with our proposed Concept Bottleneck Sparse Autoencoders (CB-SAE):
    • using a novel post-hoc framework that prunes low-utility neurons; and
    • augmenting the SAE latent space with a concept bottleneck aligned to a user-defined concept set.
  • Our CB-SAE improves interpretability by ~32% and steerability by ~14% across LVLMs and image generation tasks.
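Until the official code is released, the two ideas above can be illustrated with a toy sketch: a standard SAE forward pass, a post-hoc mask that zeroes out low-utility neurons, and an extra concept-bottleneck projection onto a user-defined concept set. All weights, scores, and dimensions below are hypothetical stand-ins for illustration only, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, d_concepts = 16, 64, 8  # hypothetical sizes

# Stand-ins for pretrained SAE weights (random here for illustration).
W_enc = rng.standard_normal((d_model, d_sae)) * 0.1
W_dec = rng.standard_normal((d_sae, d_model)) * 0.1
b_enc = np.zeros(d_sae)

# Post-hoc pruning: keep only neurons whose utility score (a stand-in for
# the paper's interpretability/steerability measures) clears a threshold.
utility = rng.uniform(size=d_sae)
keep_mask = (utility > 0.5).astype(float)

# Concept bottleneck: a projection aligning pruned latents to a
# user-defined concept set (random here; learned in the actual method).
W_concept = rng.standard_normal((d_sae, d_concepts)) * 0.1
W_concept_dec = rng.standard_normal((d_concepts, d_model)) * 0.1

def cb_sae_forward(x):
    z = np.maximum(W_enc.T @ x + b_enc, 0.0)  # ReLU sparse code
    z_pruned = z * keep_mask                  # drop low-utility neurons
    c = W_concept.T @ z_pruned                # concept-bottleneck activations
    # Reconstruct from both the pruned dictionary and the concept branch.
    x_hat = W_dec.T @ z_pruned + W_concept_dec.T @ c
    return z_pruned, c, x_hat

x = rng.standard_normal(d_model)
z, c, x_hat = cb_sae_forward(x)
print(z.shape, c.shape, x_hat.shape)
```

Steering a concept would then amount to scaling the corresponding entry of `c` before reconstruction; how the mask and concept weights are actually obtained is what the paper's framework defines.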

Overview

Cite this work

A. Kulkarni, T.-W. Weng, V. Narayanaswamy, S. Liu, W. A. Sakla, K. Thopalli, Interpretable and Steerable Concept Bottleneck Sparse Autoencoders, CVPR 2026

@inproceedings{kulkarni2026interpretable,
    title={Interpretable and Steerable Concept Bottleneck Sparse Autoencoders},
    author={Kulkarni, Akshay and Weng, Tsui-Wei and Narayanaswamy, Vivek and Liu, Shusen and Sakla, Wesam and Thopalli, Kowshik},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2026},
}