Cassandra: Detecting Trojaned Networks From Adversarial Perturbations

Cited by: 6
Authors
Zhang, Xiaoyu [1]
Gupta, Rohit [2]
Mian, Ajmal [3]
Rahnavard, Nazanin [4]
Shah, Mubarak [2]
Affiliations
[1] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32816 USA
[2] Univ Cent Florida, Ctr Res Comp Vis, Orlando, FL 32816 USA
[3] Univ Western Australia, Dept Comp Sci & Software Engn, Perth, WA 6009, Australia
[4] Univ Cent Florida, Dept Elect Engn, Orlando, FL 32816 USA
Source
IEEE Access | 2021 / Vol. 9 / Issue 09
Funding
Australian Research Council
Keywords
Trojan horses; Perturbation methods; Computational modeling; Training; Data models; Feature extraction; Detectors; Deep learning; adversarial attack; backdoor detection; computer vision
DOI
10.1109/ACCESS.2021.3101289
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Deep neural networks are being widely deployed for critical tasks. In many cases, pre-trained models are sourced from vendors who may have tampered with the training pipeline to insert Trojan behaviors. These malicious behaviors can be triggered at the adversary's will, which poses a serious security threat. To verify the integrity of a deep model, we propose a method that captures its fingerprint with adversarial perturbations. Inserting backdoors into a network alters its decision boundaries, which are effectively encoded by adversarial perturbations. Our proposed Trojan detection network learns features from adversarial patterns and their properties to encode the unknown trigger shape and the deviations in the decision boundaries caused by backdoors. Our method works without any clean samples, or with a limited number of clean samples for improved performance. It also performs anomaly detection to identify the target class of a Trojaned network, is invariant to trigger type, trigger size, and network architecture, and does not require any triggered samples. Experiments are performed on the MNIST, NIST-TrojAI, and Odysseus datasets with 5000 pre-trained models in total, making this the largest study to date on Trojan detection, and our method achieves new state-of-the-art accuracy.
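The abstract describes fingerprinting a model with adversarial perturbations. As a rough illustration only, and not the authors' implementation, the Python sketch below computes a single universal perturbation for a hypothetical toy linear softmax classifier by projected gradient ascent on the mean cross-entropy, then summarizes it with a few scalar features. The toy model, the function names (`universal_perturbation`, `fingerprint`), the L2 budget `eps`, and the chosen features are all assumptions made for this example. In the paper, a trained Trojan detection network learns features from the perturbation patterns and their properties; this sketch stops at producing a raw fingerprint.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def universal_perturbation(W, b, X, eps=0.5, step=0.05, iters=100):
    """Find one shared perturbation delta (||delta||_2 <= eps) that maximizes
    the mean cross-entropy of a HYPOTHETICAL toy model: logits = x @ W + b.
    This is an illustrative assumption, not the paper's method."""
    y = np.argmax(X @ W + b, axis=1)   # model's own clean predictions as labels
    k = W.shape[1]
    onehot = np.eye(k)[y]
    delta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = softmax((X + delta) @ W + b)
        # Gradient of mean cross-entropy w.r.t. delta: mean of (p - onehot) @ W.T
        grad = ((p - onehot) @ W.T).mean(axis=0)
        # Normalized ascent step (increase the loss).
        delta = delta + step * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)   # project back onto the L2 ball
    return delta, y


def fingerprint(W, b, X, eps=0.5):
    """Summarize the universal perturbation with a few hand-picked
    (hypothetical) scalar features."""
    delta, y = universal_perturbation(W, b, X, eps=eps)
    y_adv = np.argmax((X + delta) @ W + b, axis=1)
    counts = np.bincount(y_adv, minlength=W.shape[1])
    dominant = int(np.argmax(counts))
    return {
        "delta_norm": float(np.linalg.norm(delta)),
        "fooling_rate": float(np.mean(y_adv != y)),
        "dominant_class": dominant,   # class most perturbed inputs fall into
        "dominant_share": float(counts[dominant] / len(y_adv)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(20, 4))   # toy weights: 20 features, 4 classes
    b = np.zeros(4)
    X = rng.normal(size=(64, 20))  # a small set of "clean" samples
    print(fingerprint(W, b, X, eps=0.5))
```

The intuition, stated loosely: if a backdoor has created a shortcut toward a target class, a small universal perturbation may push many inputs into that class, which would appear as a high `dominant_share`. Whether these particular hand-crafted features separate clean from Trojaned models is not claimed here.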
Pages: 135856-135867
Number of pages: 12