AdvEdge: Optimizing Adversarial Perturbations Against Interpretable Deep Learning

Cited by: 8
Authors
Abdukhamidov, Eldor [1 ]
Abuhamad, Mohammed [2 ]
Juraev, Firuz [1 ]
Chan-Tin, Eric [2 ]
AbuHmed, Tamer [1 ]
Affiliations
[1] Sungkyunkwan Univ, Seoul, South Korea
[2] Loyola Univ Chicago, Chicago, IL 60660 USA
Source
COMPUTATIONAL DATA AND SOCIAL NETWORKS, CSONET 2021 | 2021, Vol. 13116
Funding
National Research Foundation of Singapore
Keywords
Adversarial image; Deep learning; Interpretability
DOI
10.1007/978-3-030-91434-9_9
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Deep Neural Networks (DNNs) have achieved state-of-the-art performance in various applications. It is crucial to verify that a model's high accuracy on a given task stems from a correct problem representation rather than from the misuse of artifacts in the data. Interpretation models have therefore become a key ingredient in developing deep learning models: they enable a better understanding of how DNN models work and offer a sense of security. However, interpretations are also vulnerable to malicious manipulation. We present AdvEdge and AdvEdge+, two attacks that mislead the target DNNs while deceiving their coupled interpretation models. We evaluate the proposed attacks against two DNN model architectures coupled with four representatives of different categories of interpretation models. The experimental results demonstrate the effectiveness of our attacks in deceiving the DNN models and their interpreters.
Pages: 93-105
Number of pages: 13
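The abstract describes attacks that perturb an input so that a classifier misclassifies it while its interpreter still produces a benign-looking attribution map. As a rough illustration of one ingredient such an attack might use, the sketch below runs a targeted PGD attack whose per-pixel step is weighted by a Sobel edge map, concentrating the perturbation along object edges. This is a minimal sketch under stated assumptions: the edge weighting, loss, and hyperparameters here are illustrative choices, not the authors' actual AdvEdge/AdvEdge+ optimization.

```python
# Illustrative sketch only: edge-weighted targeted PGD. The edge weighting
# and loss below are assumptions for demonstration, not the paper's method.
import torch
import torch.nn.functional as F


def edge_map(x: torch.Tensor) -> torch.Tensor:
    # Per-image Sobel edge magnitude, normalized to [0, 1].
    # x is (B, C, H, W) with pixel values in [0, 1].
    gray = x.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1.0, 0.0, 1.0],
                       [-2.0, 0.0, 2.0],
                       [-1.0, 0.0, 1.0]], device=x.device).view(1, 1, 3, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, kx.transpose(2, 3), padding=1)
    mag = (gx.pow(2) + gy.pow(2)).sqrt()
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-8)


def edge_weighted_pgd(model, x, target, eps=8 / 255, alpha=1 / 255, steps=40):
    # Targeted L_inf PGD: each step is scaled per-pixel by the edge map of
    # the clean image, steering the perturbation toward edge regions.
    w = edge_map(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), target)  # minimized toward target
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv - alpha * w * grad.sign()   # edge-weighted step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project into eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)             # keep valid pixel range
    return x_adv.detach()
```

A full reproduction of the paper's attacks would additionally need a loss term that keeps the interpreter's attribution map close to the benign one; that component is omitted here because the record's abstract does not specify it.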