Detecting Adversarial Perturbations Through Spatial Behavior in Activation Spaces

Cited by: 9
Authors
Katzir, Ziv [1 ]
Elovici, Yuval [1 ]
Affiliations
[1] Ben Gurion Univ Negev, Dept Software & Informat Syst Engn, Deutsch Telekom Labs, Beer Sheva, Israel
Source
2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2019
Keywords
Adversarial Perturbations; Detector; Activation Spaces;
DOI
10.1109/ijcnn.2019.8852285
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although neural network-based classifiers outperform humans in a range of tasks, they remain prone to manipulation through adversarial perturbations. Prior research has identified effective defense mechanisms for many reported attack methods; however, a defense against the C&W attack, as well as a holistic defense mechanism capable of countering multiple attack methods, is still missing. All attack methods reported so far share a common goal: they aim to avoid detection by limiting the allowed perturbation magnitude while still triggering incorrect classification. As a result, small perturbations cause the classification to shift from one class to another. We coin the term activation spaces to refer to the hyperspaces formed by the activation values of the different network layers. We then use activation spaces to capture the differences in spatial dynamics between normal and adversarial examples and to form a novel adversarial example detector. We induce a set of k-nearest neighbor (k-NN) classifiers, one per activation space, and use those classifiers to assign a sequence of class labels to each input of the neural network. We then calculate the likelihood of each observed label sequence and show that sequences associated with adversarial examples are far less likely than those of normal examples. We demonstrate the effectiveness of our detector against the C&W attack on two image classification datasets (MNIST, CIFAR-10), achieving an AUC of 0.97 on CIFAR-10. We further show how our detector can easily be augmented with previously suggested defense methods to form a holistic, multi-purpose defense mechanism.
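To make the pipeline the abstract describes concrete, the following is a minimal sketch in Python with scikit-learn. It assumes per-layer activations are extracted ahead of time, class labels are integers, and the sequence likelihood is modeled with Laplace-smoothed per-layer label-transition frequencies estimated on benign data; the helper names and the transition model are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of the activation-space detector outlined in the abstract.
# Assumptions (illustrative, not from the paper): activations are extracted
# per layer ahead of time, class labels are integers 0..n_classes-1, and the
# sequence likelihood is modeled with Laplace-smoothed per-layer label
# transition frequencies estimated on benign data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fit_knns(layer_activations, y, k=5):
    """Induce one k-NN classifier per activation space (one per layer).

    layer_activations: list of arrays, each of shape (n_samples, layer_dim).
    """
    return [KNeighborsClassifier(n_neighbors=k).fit(A, y)
            for A in layer_activations]

def fit_transitions(knns, layer_activations, n_classes):
    """Count label transitions between consecutive layers on benign data."""
    preds = [knn.predict(A) for knn, A in zip(knns, layer_activations)]
    counts = np.zeros((len(knns) - 1, n_classes, n_classes))
    for layer in range(len(knns) - 1):
        for a, b in zip(preds[layer], preds[layer + 1]):
            counts[layer, a, b] += 1
    return counts

def sequence_log_likelihood(knns, counts, acts_x, n_classes, alpha=1.0):
    """Label one input in every activation space and score the sequence.

    acts_x: list of 1-D activation vectors for a single input, one per layer.
    Adversarial examples tend to produce label sequences with far lower
    scores than normal examples.
    """
    seq = [int(knn.predict(a.reshape(1, -1))[0])
           for knn, a in zip(knns, acts_x)]
    ll = 0.0
    for layer, (a, b) in enumerate(zip(seq[:-1], seq[1:])):
        row_total = counts[layer, a].sum()
        p = (counts[layer, a, b] + alpha) / (row_total + alpha * n_classes)
        ll += np.log(p)
    return ll
```

In this sketch, an input would be flagged as adversarial when its sequence_log_likelihood falls below a threshold calibrated on held-out benign data; sweeping that threshold is what would produce an ROC curve like the one summarized by the reported AUC.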
Pages: 9