
DaST: Data-free Substitute Training for Adversarial Attacks

A data-free substitute training method for adversarial black-box attacks.

Abstract: Machine learning models are vulnerable to adversarial examples. For the black-box setting, current substitute attacks need pre-trained models to generate adversarial examples. However, pre-trained models are hard to obtain in real-world tasks. In this paper, we propose a data-free substitute training method (DaST) to obtain substitute models for adversarial black-box attacks without the requirement of any real data. To achieve this, DaST utilizes specially designed generative adversarial networks (GANs) to train the substitute models. In particular, we design a multi-branch architecture and label-control loss for the generative model to deal with the uneven distribution of synthetic samples. The substitute model is then trained by the synthetic samples generated by the generative model, which are labeled by the attacked model subsequently. The experiments demonstrate the substitute models produced by DaST can achieve competitive performance compared with the baseline models which are trained by the same train set with attacked models. Additionally, to evaluate the practicability of the proposed method on the real-world task, we attack an online machine learning model on the Microsoft Azure platform. The remote model misclassifies 98.35% of the adversarial examples crafted by our method. To the best of our knowledge, we are the first to train a substitute model for adversarial attacks without any real data.

Link: https://openaccess.thecvf.com/content_CVPR_2020/html/Zhou_DaST_Data-Free_Substitute_Training_for_Adversarial_Attacks_CVPR_2020_paper.html

DaST: Data-Free Substitute Training (Improved)

This repository implements the DaST framework for data-free substitute training for adversarial attacks, along with our proposed improvements (a total-variation regularizer and a mode-seeking loss). We provide scripts to train both the DaST-L (label-only) and DaST-P (probability-only) variants on CIFAR-10, a baseline ResNet-50 substitute, and an evaluation script for end-to-end attack performance.

Threat Model

The threat model is:

  • Synthetic data X is generated by a GAN: X = G(z, n)
  • X is labeled by query access to the target model T: T(X) → y
  • A substitute model D is trained on the labeled pairs (X, y)
  • A white-box attack (FGSM, PGD, BIM, C&W) on the substitute D produces adversarial examples X_adv, which are used to fool the target model T
  • Every attack is bounded by a maximum perturbation budget ε (ε = 0.031 on CIFAR-10; see the Evaluation section)

Note: the GAN is used only to generate the synthetic images that train the substitute model, so that the substitute mimics the attacked model closely enough to produce adversarial examples that transfer to the target. A minimal sketch of this loop appears below.
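The training scripts implement the paper's full objective (label-control loss, multi-branch generator); the toy sketch below only illustrates the query-and-imitate loop under simplified assumptions: hypothetical linear stand-in models and plain cross-entropy in place of the paper's losses.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the repo's netG / netD and the black-box VGG-16 target
# (hypothetical linear models; only the query interface matters here).
num_classes, nz, batch_size = 10, 100, 50   # batch divisible by num_classes
netG = nn.Sequential(nn.Linear(nz + num_classes, 3 * 32 * 32), nn.Tanh())
netD = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
target = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
optG = torch.optim.Adam(netG.parameters(), lr=2e-4)
optD = torch.optim.Adam(netD.parameters(), lr=2e-4)

for step in range(100):
    z = torch.randn(batch_size, nz)
    n = torch.randint(0, num_classes, (batch_size,))      # label / branch code
    x = netG(torch.cat([z, F.one_hot(n, num_classes).float()], dim=1))
    x = x.view(batch_size, 3, 32, 32)                     # synthetic images X

    with torch.no_grad():                                 # black-box query only
        y = target(x).argmax(dim=1)                       # DaST-L: hard labels

    # Substitute step: imitate the target on the synthetic batch.
    optD.zero_grad()
    F.cross_entropy(netD(x.detach()), y).backward()
    optD.step()

    # Generator step: seek samples where the substitute still disagrees
    # with the target, steering synthesis toward informative regions.
    optG.zero_grad()
    (-F.cross_entropy(netD(x), y)).backward()
    optG.step()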

A. Setup:

Clone this repo:

git clone  https://github.com/rittaneg86/CSC592_DaST_Project.git

Install dependencies. All required packages are listed in requirements.txt:

pip install -r requirements.txt

Experimental setup

Python 3.12.9
PyTorch 2.6.0 with CUDA 12.8
torchvision==0.21.0
foolbox==3.3.4
advertorch==0.2.3

  • The full list of packages and their versions is in the requirements.txt file.
  • The original DaST framework was implemented for PyTorch 1.0+. This version has been adapted to work with PyTorch 2.6.0.

Hardware Setup

GPU: NVIDIA H100 80GB HBM3 (Hopper architecture)
CUDA 12.8
Driver version 570.86.15

This method can steal the attacked model without requiring any real data. To evaluate the performance of DaST in terms of adversarial attacks, use evaluate.py.

Train the target VGG-16 on CIFAR-10. The DaST scripts query a pre-trained VGG-16 target:

python train_vgg_cifar10.py

Saves both best & final VGG-16 checkpoints.

The final model is loaded by dast_*_cifar10_imp.py during substitute training.
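As a rough sketch of what that loading step looks like (assuming a torchvision-style VGG-16 with a 10-class head; the repo's own model class may differ):

import torch
from torchvision.models import vgg16

# Rebuild the architecture, then restore the trained CIFAR-10 weights.
target = vgg16(num_classes=10)
target.load_state_dict(torch.load("vgg_cifar10_final.pth", map_location="cpu"))
target.eval()                       # frozen black box: queried, never trained
for p in target.parameters():
    p.requires_grad_(False)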

B. Experiments

Experiments from the original paper:

  1. Train the substitute model.

a. To train a substitute model on CIFAR-10 and replicate the results of the original paper, run:

  • DaST-P
python dast_cifar10.py --beta=1.0 
  • DaST-L
python dast_cifar10.py --beta=0.0

b. To train a substitute model on MNIST:

python dast.py --dataset=mnist

c. To train a substitute model against the Azure model:

python dast.py --dataset=azure

d. Train the pretrained ResNet-50 baseline on CIFAR-10:

python resnet50_cifar10.py

Trains (or loads) a ResNet-50 substitute on CIFAR-10. Saves checkpoint to pretrained/resnet50_cifar10.pth.

C. Explore the improvements

a. To train a substitute model on CIFAR-10 with the improvements, run:

  • DaST-L (Label-only)
python dast_L_cifar10_imp.py --tv_weight --mode-seeking --beta=0.0

Loads vgg_cifar10_final.pth as the black-box target. Saves best netD & netG checkpoints to saved_model6/.

  • DaST-P (Probability-only)
python dast_P_cifar10_imp.py --tv_weight  --mode-seeking --beta=1.0

The codebase is identical; only --beta differs. Saves to saved_model5/. Tip: run both variants in parallel (e.g., on separate GPUs or as separate batch jobs) to reduce total runtime; our runs took ~24 hours on one GPU (80 GB VRAM).

D. Evaluation

a. Once the substitute model is obtained, generate adversarial examples and evaluate their performance under non-targeted and targeted attacks:

python improved_evaluation.py --mode=dast --adv=FGSM --cuda   # --mode selects baseline or dast; --adv selects the attack type

b. The improved attacks (FGSM, BIM, PGD) use an ℓ∞ budget of ε = 0.031 on CIFAR-10. Both untargeted and targeted evaluations are supported. To run the evaluation for the improvement:

python improved_evaluation.py \
  --mode=dast \
  --adv=PGD \
  --epsilon=0.031 \
  [--targeted]   # omit for untargeted; --mode may also be baseline or white

  • --mode=dast loads your best DaST-P (saved_model5/) or DaST-L (saved_model6/) substitute.
  • --mode=baseline evaluates the ResNet-50 baseline.
  • --mode=white runs a "white-box" reference using the VGG-16 target itself.
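Under the hood, evaluating a transfer attack amounts to: attack the substitute white-box, then measure the target's error on the resulting examples. A minimal foolbox sketch under assumed placeholders (substitute, target, images, labels are all hypothetical stand-ins, not the repo's models):

import torch
import torch.nn as nn
import foolbox as fb

# Hypothetical stand-ins for the trained substitute, the black-box target,
# and a CIFAR-10-shaped batch in [0, 1].
substitute = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
target = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
images, labels = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))

# White-box PGD on the substitute under the L-inf budget used above.
fmodel = fb.PyTorchModel(substitute, bounds=(0.0, 1.0))
_, x_adv, _ = fb.attacks.LinfPGD()(fmodel, images, labels, epsilons=0.031)

# Transfer: how often examples crafted on the substitute fool the target.
with torch.no_grad():
    fooled = target(x_adv).argmax(dim=1) != labels
print(f"transfer attack success rate: {fooled.float().mean():.2%}")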

Visualization

  • Synthetic images generated by the GAN after our improvement can be viewed in the output folder.
  • Adversarial images produced via the substitute model on CIFAR-10 after our improvement can also be found in the output folder.

Notes

(1) Our improvement files can be found in the Improvement Files folder.

(2) We downloaded the remote model, so you do not need to deploy the Azure model as a service to evaluate the method.

(3) The attack success rate reported in the training code is only a rough (but fast) estimate of attack performance, so run evaluate.py to evaluate the trained model properly.

(4) To train a substitute model on another dataset (like CIFAR-10), add that dataset's model as the original_net and load the dataset. Note that in this code the output size of the generator is [28, 28], so for CIFAR-10 the generator architecture needs to be modified and the output size changed to [32, 32]; see the sketch below.
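For instance, a DCGAN-style upsampling path can be retargeted from 28×28 to 32×32 with power-of-two stride-2 stages (a hypothetical sketch, not the repo's exact generator):

import torch
import torch.nn as nn

# Hypothetical: 1x1 -> 4x4 -> 8x8 -> 16x16 -> 32x32 via stride-2 transposed
# convolutions, replacing the 28x28 path used for MNIST.
netG = nn.Sequential(
    nn.ConvTranspose2d(100, 128, kernel_size=4, stride=1, padding=0),  # 4x4
    nn.BatchNorm2d(128), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8x8
    nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),    # 16x16
    nn.BatchNorm2d(32), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),     # 32x32
    nn.Tanh(),
)
print(netG(torch.randn(1, 100, 1, 1)).shape)  # torch.Size([1, 3, 32, 32])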

(5) Key arguments to look out for in this code:

  • alpha controls the weight of the label-control loss in Eq. (9) of the original paper.
  • beta determines the attack scenario: if beta is 0, the scenario is DaST-L; if beta is greater than 0, it is DaST-P.
  • Two types of generator architecture are defined; you can switch between them with G_type. It is hard to say which type is better, so try both on your own dataset. Because of the generator's multi-branch architecture, batchsize should be divisible by the number of categories.
  • tv_weight sets the total-variation penalty added to the generator's loss. It encourages spatially smooth (i.e., less noisy) images, so the substitute model sees higher-quality synthetic images, which in turn leads to better imitation of the target and stronger adversarial transfer.
  • mode_seeking (https://arxiv.org/pdf/1903.05628) sets the weight of the mode-seeking loss, also added to the generator's loss. It pushes the generator to produce distinct images under the same condition (e.g., for the "dog" class, dogs of different colors), which helps the substitute model learn from more diverse data. A sketch of both regularizers appears after this list.
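A minimal sketch of the two regularizers, assuming they are simply weighted into the generator loss (hypothetical helper names; the mode-seeking term follows Mao et al., arXiv:1903.05628):

import torch

def total_variation(x: torch.Tensor) -> torch.Tensor:
    # Mean absolute difference between neighboring pixels: lower is smoother.
    dh = (x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean()
    dw = (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean()
    return dh + dw

def mode_seeking(x1, x2, z1, z2, eps=1e-5):
    # Maximize image distance per unit latent distance, so two nearby latent
    # codes under the same condition do not collapse to the same image.
    ratio = (x1 - x2).abs().mean() / (z1 - z2).abs().mean()
    return 1.0 / (ratio + eps)       # minimized when outputs are diverse

# Hypothetical wiring into the generator objective:
#   lossG = adversarial_loss + tv_weight * total_variation(x) \
#           + ms_weight * mode_seeking(x_a, x_b, z_a, z_b)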

(6) Note that training on CIFAR-10 is not stable, even with all random seeds fixed; we do not know why. You can observe this in the CIFAR-10 log file: in the first example the training collapsed, while in the second the attack success rate rises to 80% after 50 epochs in DaST-P.

(7) With our improvement, we trained on CIFAR-10 for 24 hours and reached an attack success rate of 99.72% with a best substitute-model accuracy of 43.15% at epoch 540 in DaST-P (see dast_p_cifar10.log).

(8) We used foolbox to run the attacks, but targeted FGSM kept failing with an error that the TargetedMisclassification criterion is not supported by FGSM; a workaround is sketched below.
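In foolbox 3, the FGSM attack only accepts the Misclassification criterion. Since FGSM is one step of ℓ∞ PGD, a possible workaround is to run LinfPGD with steps=1 and pass TargetedMisclassification (placeholder model and batch below, as in the evaluation sketch):

import torch
import torch.nn as nn
import foolbox as fb

# Hypothetical substitute and CIFAR-10-shaped batch.
substitute = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
images = torch.rand(8, 3, 32, 32)
target_classes = torch.randint(0, 10, (8,))    # classes to steer toward

fmodel = fb.PyTorchModel(substitute, bounds=(0.0, 1.0))

# One-step LinfPGD without a random start is the FGSM update rule,
# and (unlike fb.attacks.FGSM) it accepts a targeted criterion.
attack = fb.attacks.LinfPGD(steps=1, rel_stepsize=1.0, random_start=False)
criterion = fb.criteria.TargetedMisclassification(target_classes)
_, x_adv, success = attack(fmodel, images, criterion, epsilons=0.031)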

Citation:

If you feel this work is helpful, please cite us:

@inproceedings{zhou2020dast,
  title={DaST: Data-free Substitute Training for Adversarial Attacks},
  author={Zhou, Mingyi and Wu, Jing and Liu, Yipeng and Liu, Shuaicheng and Zhu, Ce},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={234--243},
  year={2020}
}

Contact:

For questions related to this updated implementation, contact Ritta Neg Mfa (ritta.negmfa@uri.edu).

For questions about the original DaST, please contact Mingyi Zhou.
