Quarantine: Sparsity Can Uncover the Trojan Attack Trigger for Free

Cited by: 8
Authors
Chen, Tianlong [1]
Zhang, Zhenyu [1]
Zhang, Yihua [2]
Chang, Shiyu [3]
Liu, Sijia [2,4]
Wang, Zhangyang [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] Michigan State Univ, E Lansing, MI 48824 USA
[3] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
[4] MIT-IBM Watson AI Lab, Cambridge, MA, USA
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
Funding
US National Science Foundation;
DOI
10.1109/CVPR52688.2022.00068
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally on most samples, yet produce manipulated results for inputs carrying a particular trigger. Several works attempt to detect whether a given DNN has been injected with a specific trigger during training. In a parallel line of research, the lottery ticket hypothesis reveals the existence of sparse subnetworks that can match the performance of the dense network after independent training. Connecting these two dots, we investigate the problem of Trojan DNN detection through the new lens of sparsity, even when no clean training data is available. Our crucial observation is that Trojan features are significantly more stable under network pruning than benign features. Leveraging this, we propose a novel Trojan network detection regime: first locate a "winning Trojan lottery ticket" that preserves nearly full Trojan information yet retains only chance-level performance on clean inputs; then recover the trigger embedded in this isolated subnetwork. Extensive experiments on various datasets (CIFAR-10, CIFAR-100, and ImageNet) with different network architectures (VGG-16, ResNet-18, ResNet-20s, and DenseNet-100) demonstrate the effectiveness of our proposal. Code is available at https://github.com/VITA-Group/Backdoor-LTH.
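As a rough illustration of the two-stage regime described above, the Python sketch below (editor-supplied, not the authors' code; see the linked repository for their implementation) first prunes a suspect network by global weight magnitude and then optimizes a mask/pattern trigger on the isolated subnetwork. The model, data loader, target label, input size, and all hyperparameters are placeholder assumptions, and the recovery step follows a generic Neural-Cleanse-style objective rather than the paper's exact formulation.

    # Minimal sketch of the two-stage idea, assuming a PyTorch model;
    # not the authors' implementation (see the linked repository).
    import torch
    import torch.nn.functional as F
    import torch.nn.utils.prune as prune

    def magnitude_prune(model, sparsity):
        """Stage 1: globally prune the smallest-magnitude weights. At high
        sparsity, clean accuracy collapses while the backdoor behavior
        survives, leaving a "Trojan ticket" subnetwork."""
        layers = [(m, "weight") for m in model.modules()
                  if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
        prune.global_unstructured(layers,
                                  pruning_method=prune.L1Unstructured,
                                  amount=sparsity)
        return model

    def recover_trigger(subnet, loader, target_label, steps=500, lam=1e-3):
        """Stage 2: reverse-engineer a (mask, pattern) trigger that drives
        the isolated subnetwork to the target label."""
        mask = torch.zeros(1, 1, 32, 32, requires_grad=True)    # CIFAR-sized
        pattern = torch.zeros(1, 3, 32, 32, requires_grad=True)
        opt = torch.optim.Adam([mask, pattern], lr=0.1)
        subnet.eval()
        for _ in range(steps):
            for x, _ in loader:
                m = torch.sigmoid(mask)              # keep mask in [0, 1]
                x_adv = (1 - m) * x + m * torch.tanh(pattern)
                target = torch.full((x.size(0),), target_label,
                                    dtype=torch.long)
                # classification loss plus an L1 penalty favoring a
                # small, localized trigger
                loss = (F.cross_entropy(subnet(x_adv), target)
                        + lam * m.sum())
                opt.zero_grad()
                loss.backward()
                opt.step()
        return torch.sigmoid(mask).detach(), torch.tanh(pattern).detach()

In a full pipeline, one would presumably sweep sparsity levels, tracking clean accuracy against attack success rate at each level, and flag a model whose pruned subnetwork yields a small yet highly effective recovered trigger.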
Pages: 588-599
Page count: 12