Detection false positives? #4

@dxoigmn

Description
When evaluating whether inputs are adversarial, the framework first checks whether the classification of the input matches the ground-truth label. Only if it does not match does it go on to check the distortion bound and the detection mechanism, and only when all three conditions are satisfied does the framework consider the input adversarial.

```python
def evaluate(self, defense, example_idx, true_label,
             src_example, adv_example,
             src_pred, adv_pred,
             src_detector, adv_detector):
    # Verify that the label is now incorrect
    if np.argmax(adv_pred) == true_label:
        return False, "Label {} matches true label {}".format(np.argmax(adv_pred), true_label)
    # Verify that example is within the allowed Lp norm
    distortion = np.linalg.norm((src_example - adv_example).flatten(), ord=self.norm)
    if distortion > self.threshold + 1e-3:
        return False, "Distortion {} exceeds bound {}".format(distortion, self.threshold)
    # Verify that it's not detected as adversarial
    if adv_detector > defense.threshold:
        return False, "Adversarial example rejected by detector with score {}.".format(adv_detector)
    return True, None
```

My expectation was that correctly classified inputs also ought to be rejected if they trip the detector, but because the label check at L223 returns early, the detector is never consulted in that case. This is particularly pronounced in the transform defense, where a non-trivial majority of the benign inputs would be rejected by the "stable prediction" detector. Is this intentional? It seems odd to force the attacker to defeat an objective that the defender can almost never achieve.
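To make the ordering issue concrete, here is a minimal standalone sketch of the behavior being described. The function name and the toy inputs are hypothetical, not from the framework; it simply reorders the checks so the detector is consulted before the label comparison, which is the behavior the paragraph above says one might expect:

```python
import numpy as np

def evaluate_detector_first(adv_pred, true_label, adv_detector, detector_threshold):
    """Hypothetical reordering of the checks: consult the detector first,
    so an input that trips the detector is rejected even when its
    classification happens to match the ground-truth label."""
    if adv_detector > detector_threshold:
        return False, "rejected by detector"
    if np.argmax(adv_pred) == true_label:
        return False, "label matches true label"
    return True, None

# A correctly classified input whose detector score (0.8) exceeds the
# threshold (0.5): with the original ordering, the label check would
# return early and the detector would never fire; here it is rejected.
ok, reason = evaluate_detector_first(np.array([0.1, 0.9]), 1, 0.8, 0.5)
print(ok, reason)  # False rejected by detector
```

This is only an illustration of the early-return semantics in question, not a proposed patch.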
