In this tutorial, we will demonstrate how to use a proxy model in the attack stage. A proxy model may be useful when the gradients generated by the model under evaluation are either unavailable or unreliable.
Starting with the baseline CIFAR10 scenario here, we will first derive a modified model which behaves as follows: during inference, the model will quantize the input image into 256 distinct values. This removes any adversarial perturbation components that fall between quantization levels, i.e. fractional pixel values. The quantization is also problematic from an attack perspective, since the operation is non-differentiable. The modified model is available here.
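To make the effect concrete, here is a minimal sketch of how such 256-level quantization discards sub-level perturbations. The quantize_pixel helper is illustrative only and is not part of the tutorial's files:

```python
# Illustrative sketch of the inference-time quantization described above:
# each pixel value in [0, 1] is snapped to the nearest of 256 levels.
def quantize_pixel(x: float) -> float:
    return round(x * 255) / 255

clean = 100 / 255            # a pixel already on the 256-level grid
perturbed = clean + 0.0015   # perturbation smaller than one level (1/255 ~= 0.0039)

assert quantize_pixel(perturbed) == clean  # the perturbation vanishes at inference
```

Note that the attack configuration later in this tutorial uses an eps_step of 0.002, which is below the 1/255 level spacing, so individual attack steps are susceptible to exactly this rounding.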
To get around this difficulty, we will implement a proxy model during the attack. The proxy model will be a differentiable approximation to the original model (i.e. it will remove the quantization operation) and will allow for reasonable gradients to be used for the white box attack.
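Why the proxy is necessary can be seen with a quick finite-difference check (again a standalone sketch, with a hypothetical quantize_pixel helper mirroring the quantization described above):

```python
# Finite-difference check: the quantization step is piecewise constant, so its
# derivative is zero almost everywhere and a white-box attack that
# differentiates through it receives no usable signal.
def quantize_pixel(x: float) -> float:
    return round(x * 255) / 255

x, eps = 0.41, 1e-6  # a point away from a quantization boundary
grad = (quantize_pixel(x + eps) - quantize_pixel(x)) / eps
print(grad)  # 0.0
```

The proxy model sidesteps this by dropping the quantization, so gradients taken with respect to its input are those of the underlying differentiable network.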
As a first step, we will modify the scenario configuration to evaluate the modified model by updating the module and name fields. The updated model configuration is shown below.
"model": {
"fit": true,
"fit_kwargs": {
"nb_epochs": 20
},
"model_kwargs": {},
"module": "proxy_model_eval_model",
"name": "get_art_model",
"weights_file": null,
"wrapper_kwargs": {}
},Next, we will again create a custom attack class by modifying the module and name fields of the attack configuration. Additionally, we will reduce eps_step to 0.002, to introduce fractional pixel values into the perturbed images. The updated attack configuration is shown below.
"attack": {
"knowledge": "white",
"kwargs": {
"batch_size": 1,
"eps": 0.031,
"eps_step": 0.002,
"max_iter": 20,
"num_random_init": 1,
"random_eps": false,
"targeted": false,
"verbose": false
},
"module": "proxy_model_attack_model",
"name": "CustomAttack",
"use_label": true
},Lastly, we create the proxy_model_attack_model.py file, including a CustomAttack class and a proxy model to use during attack. The code for this change can be seen below. Here, we have created a proxy model given by ModifiedNet, which is simply the model under evaluation without the quantization operation. We have also created make_modified_model and get_art_model convenience methods to return the proxy model. Finally, we have created the CustomAttack class which inherits ProjectedGradientDescent. The key difference here instead of a standard PGD attack is the use of the proxy model during the attack generation. This is performed by creating the proxy model via a call to get_art_model, loading the weights from the original model into the proxy model, and initializing an attack with the proxy model.
```python
import torch
import torch.nn as nn
from typing import Optional

from art.attacks.evasion import ProjectedGradientDescent
from art.classifiers import PyTorchClassifier

from armory.baseline_models.pytorch.cifar import Net

DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class ModifiedNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = Net()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net.forward(x)


def make_modified_model(**kwargs) -> ModifiedNet:
    return ModifiedNet()


def get_art_model(
    model_kwargs: dict, wrapper_kwargs: dict, weights_path: Optional[str] = None
) -> PyTorchClassifier:
    model = make_modified_model(**model_kwargs)
    model.to(DEVICE)
    if weights_path:
        checkpoint = torch.load(weights_path, map_location=DEVICE)
        model.load_state_dict(checkpoint)

    wrapped_model = PyTorchClassifier(
        model,
        loss=nn.CrossEntropyLoss(),
        optimizer=torch.optim.Adam(model.parameters(), lr=0.003),
        input_shape=(32, 32, 3),
        nb_classes=10,
        clip_values=(0.0, 1.0),
        **wrapper_kwargs,
    )
    return wrapped_model


class CustomAttack(ProjectedGradientDescent):
    def __init__(self, estimator, **kwargs):
        # Create a copy of the model (to avoid overwriting loss_gradient_framework
        # of the original model)
        new_estimator = get_art_model(model_kwargs={}, wrapper_kwargs={})
        new_estimator.model.load_state_dict(estimator.model.state_dict())

        # Point the attack at the copy of the model
        super().__init__(new_estimator, **kwargs)
```

The complete example is demonstrated via the following files:
This example may be run with the following command:
```
armory run proxy_model.json
```