Towards Interpretable Defense Against Adversarial Attacks via Causal Inference

Cited by: 0
Authors
Min Ren [1 ,2 ]
Yun-Long Wang [2 ]
Zhao-Feng He [3 ]
Affiliations
[1] University of Chinese Academy of Sciences
[2] Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
[3] Laboratory of Visual Computing and Intelligent System, Beijing University of Posts and Telecommunications
Keywords
DOI
(not available)
CLC Classification
TP18 [Theory of Artificial Intelligence]; TP309 [Security and Secrecy];
Subject Classification Codes
081104; 0812; 0835; 1405; 081201; 0839; 1402
Abstract
Deep learning-based models are vulnerable to adversarial attacks, and defending against such attacks is essential in sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms against adversarial attacks; most existing methods are merely stopgaps for specific adversarial samples. The main obstacle is that it remains unclear how adversarial samples fool deep learning models. This poorly understood working mechanism is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples, adopting the self-attention/transformer as a powerful tool within this causal model. Compared with existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed and instructive analysis is provided. We then propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments demonstrate the effectiveness of the proposed methods, which outperform state-of-the-art defense methods under various adversarial attacks.
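The adversarial samples the abstract refers to can be illustrated with a one-step FGSM perturbation (Goodfellow et al.). The following NumPy sketch on a toy logistic classifier is illustrative only; it does not reproduce the paper's causal/attention-based defense, and all weights and inputs are made-up values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y, eps):
    """One-step FGSM: move x along the sign of the loss gradient.

    For a logistic model with cross-entropy loss, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = sigmoid(w @ x + b)      # predicted probability of class 1
    grad_x = (p - y) * w        # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=4)          # toy classifier weights
b = 0.0
x = rng.normal(size=4)          # clean input
y = 1.0                         # true label

x_adv = fgsm_perturb(x, w, b, y, eps=0.5)
# The perturbed input raises the loss, i.e. lowers p(correct class),
# while each coordinate of x moves by at most eps:
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```

The perturbation is bounded in the L-infinity norm by `eps`, which is why such samples can be visually indistinguishable from clean inputs while still flipping the model's prediction.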
Pages: 209 - 226
Number of pages: 18
Related Papers
50 in total
  • [1] Towards Interpretable Defense Against Adversarial Attacks via Causal Inference
    Ren, Min
    Wang, Yun-Long
    He, Zhao-Feng
    Machine Intelligence Research, 2022, 19 (03) : 209 - 226
  • [2] AttriGuard: A Practical Defense Against Attribute Inference Attacks via Adversarial Machine Learning
    Jia, Jinyuan
    Gong, Neil Zhenqiang
    Proceedings of the 27th USENIX Security Symposium, 2018, : 513 - 529
  • [3] Deblurring as a Defense against Adversarial Attacks
    Duckworth, William, III
    Liao, Weixian
    Yu, Wei
    2023 IEEE 12th International Conference on Cloud Networking (CloudNet), 2023, : 61 - 67
  • [4] Towards Defense Against Adversarial Attacks on Graph Neural Networks via Calibrated Co-Training
    Wu, Xu-Gang
    Wu, Hui-Jun
    Zhou, Xu
    Zhao, Xiang
    Lu, Kai
    Journal of Computer Science and Technology, 2022, 37 (05) : 1161 - 1175
  • [5] Text Adversarial Purification as Defense against Adversarial Attacks
    Li, Linyang
    Song, Demin
    Qiu, Xipeng
    Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Vol. 1, 2023, : 338 - 350
  • [6] Defense against Adversarial Attacks with an Induced Class
    Xu, Zhi
    Wang, Jun
    Pu, Jian
    2021 International Joint Conference on Neural Networks (IJCNN), 2021
  • [7] On the Defense of Spoofing Countermeasures Against Adversarial Attacks
    Nguyen-Vu, Long
    Doan, Thien-Phuc
    Bui, Mai
    Hong, Kihun
    Jung, Souhwan
    IEEE Access, 2023, 11 : 94563 - 94574
  • [8] A Defense Method Against Facial Adversarial Attacks
    Sadu, Chiranjeevi
    Das, Pradip K.
    2021 IEEE Region 10 Conference (TENCON 2021), 2021, : 459 - 463