Krishna Kanth Nakka and Alexandre Alahi
The generation of transferable adversarial perturbations typically involves training a generator to maximize embedding separation between clean and adversarial images at a single mid-layer of a source model. In this work, we build on this approach and introduce Neuron Attack for Transferability (NAT), a method designed to target specific neurons within the embedding. Our approach is motivated by the observation that previous layer-level optimizations often disproportionately focus on a few neurons representing similar concepts, leaving other neurons within the attacked layer minimally affected. NAT shifts the focus from embedding-level separation to a more fundamental, neuron-specific approach. We find that targeting individual neurons effectively disrupts the core units of the neural network, providing a common basis for transferability across different models. Through extensive experiments on 41 diverse ImageNet models and 9 fine-grained models, NAT achieves fooling rates that surpass existing baselines by over 14% in cross-model and 4% in cross-domain settings.
For more details, refer to the main paper and supplementary material on the CVF website.
The code has been tested with the packages listed in `environment.yaml`. Create the environment with:

```
conda env create -f environment.yaml
```

- For evaluation, we use a subset of 5,000 images available in the `data` subfolder. This subset is taken from the LTP paper (NeurIPS 2021).
- You can easily load and test our generator model attacking Neuron 250 in layer 18 of VGG16 via Torch Hub with just a few lines of code:
```python
import torch

# Load the NAT generator trained to attack Neuron 250 in layer 18 of VGG16
generator = torch.hub.load("krishnakanthnakka/NAT", "generator",
                           neuron=250, layer=18, source_model="vgg16")
generator.eval()
generator.cuda()
```
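Once loaded, the generator can be applied to an image to craft an adversarial example. Below is a minimal usage sketch; it assumes the generator follows the CDA/LTP-style convention of mapping a clean image in [0, 1] to an adversarial image that is then projected onto an L-infinity ball. The preprocessing and the 16/255 budget here are illustrative assumptions, not necessarily the paper's exact settings:

```python
import torch
from PIL import Image
from torchvision import transforms

# Standard ImageNet-style preprocessing (assumed, may differ from the paper)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # maps to [0, 1]
])

# `generator` is the model loaded via torch.hub in the snippet above
x = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0).cuda()

with torch.no_grad():
    adv = generator(x)
    # Project onto an L-infinity ball of radius 16/255 around the clean input,
    # then clip back to the valid image range
    adv = torch.min(torch.max(adv, x - 16 / 255), x + 16 / 255)
    adv = torch.clamp(adv, 0.0, 1.0)
```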
- We have provided the checkpoint for Neuron 250 with this repository in the releases section. This checkpoint should reproduce the results presented in Tables 2, 3, and 4 of the main paper using the query k=1.
- To run the attack on ResNet152, use the following command:

```
python eval.py --nat_attacked_neuron 250
```
- Please refer to Table 1 in the supplementary material for the exact versions of the target models.
- For training, we use the publicly available LTP repository and change the loss function to select a single channel (neuron) instead of all channels. The modified loss function is available in `loss.py`; a sketch of the idea is shown after this list.
- For the generator, we used a slightly modified architecture that removes `ReflectionPad`, as we found it makes results non-deterministic even with the same seed.
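For reference, here is a hypothetical sketch of the single-neuron training objective. The function name and signature are illustrative, not the exact code in `loss.py`: LTP maximizes the feature separation between clean and adversarial images across all channels of a mid-layer, and NAT restricts this objective to a single channel.

```python
# Illustrative sketch only; names and shapes are assumptions, not the repo's API.
import torch
import torch.nn.functional as F

def nat_single_neuron_loss(feat_adv, feat_clean, neuron=250):
    """feat_adv, feat_clean: mid-layer activations of shape (B, C, H, W)."""
    # Select only the attacked channel instead of all C channels (the LTP loss
    # would use the full feature maps here).
    a = feat_adv[:, neuron, :, :]
    c = feat_clean[:, neuron, :, :]
    # Maximize separation between clean and adversarial activations of that
    # neuron; negated so that minimizing the loss increases the distance.
    return -F.mse_loss(a, c)
```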
If you find our work useful, please consider citing:

```bibtex
@InProceedings{Nakka_2025_WACV,
    author    = {Nakka, Krishna Kanth and Alahi, Alexandre},
    title     = {NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {7582-7593}
}
```
- We would like to thank the authors of CDA, who inspired this line of work, for releasing their codebase open source.
