Inject backdoors into a model so that adversarial examples can be detected by looking for the fingerprints of a backdoor.
Our best attack results, evaluated on the first 100 images of the CIFAR-10 test set using the provided pretrained model, are as follows:
- l_infinity distortion of 4/255: TODO attack success rate
- l_2 distortion of TODO: TODO attack success rate
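
For reference, here is a minimal sketch of how an attack success rate under a distortion bound could be computed. It is not the evaluation harness shipped with this defense. The names `attack_success_rate`, `predict`, and `detect` are hypothetical, and the sketch assumes images are NumPy arrays scaled to [0, 1], so an l_infinity bound of 4/255 is passed as `eps=4/255`.

```python
import numpy as np


def attack_success_rate(x_clean, x_adv, labels, predict, eps,
                        norm="linf", detect=None):
    """Fraction of examples on which the attack succeeds.

    An adversarial example counts as a success only if it
      (1) stays within the distortion bound `eps`,
      (2) is misclassified by `predict`, and
      (3) is not flagged by `detect` (when a detector is supplied).

    `predict` maps a batch of images to integer labels and `detect`
    maps a batch to booleans (True = flagged as adversarial). Both are
    hypothetical stand-ins for the defended model.
    """
    n = len(x_clean)
    # Flatten each perturbation so the norm is taken per example.
    delta = (np.asarray(x_adv, dtype=np.float64)
             - np.asarray(x_clean, dtype=np.float64)).reshape(n, -1)

    if norm == "linf":
        dist = np.abs(delta).max(axis=1)
    elif norm == "l2":
        dist = np.sqrt((delta ** 2).sum(axis=1))
    else:
        raise ValueError("norm must be 'linf' or 'l2'")

    # Small tolerance so floating-point rounding does not reject
    # perturbations that sit exactly on the bound.
    within_budget = dist <= eps + 1e-8
    fooled = np.asarray(predict(x_adv)) != np.asarray(labels)

    success = within_budget & fooled
    if detect is not None:
        success &= ~np.asarray(detect(x_adv), dtype=bool)

    return float(success.mean())
```

With the first 100 test images as `x_clean`, the returned value is the fraction to report in the list above. Treating a flagged input as a failed attack is an assumption that suits a detection-based defense.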
This bit-depth quantization idea has been published many times in the past. The earliest paper that proposed (something like) it is TODO.
TODO