Categorical Inference Poisoning: Verifiable Defense Against Black-Box DNN Model Stealing Without Constraining Surrogate Data and Query Times

Cited by: 9
Authors
Zhang, Haitian [1 ]
Hua, Guang [1 ]
Wang, Xinya [1 ]
Jiang, Hao [1 ]
Yang, Wen [1 ]
Affiliations
[1] Wuhan Univ, Sch Elect Informat, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Data models; Closed box; Watermarking; Degradation; Computational modeling; Deep learning; Training; Model stealing; surrogate attack; model extraction; DNN model protection; inference poisoning; OOD detection; backdoor watermarking;
DOI
10.1109/TIFS.2023.3244107
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Deep Neural Network (DNN) models have offered powerful solutions for a wide range of tasks, but the cost to develop such models is nontrivial, which calls for effective model protection. Although black-box distribution can mitigate some threats, model functionality can still be stolen via black-box surrogate attacks. Recent studies have shown that surrogate attacks can be launched in several ways, while the existing defense methods commonly assume attackers with insufficient in-distribution (ID) data and restricted attacking strategies. In this paper, we relax these constraints and assume a practical threat model in which the adversary not only has sufficient ID data and query times but also can adjust the surrogate training data labeled by the victim model. Then, we propose a two-step categorical inference poisoning (CIP) framework, featuring both poisoning for performance degradation (PPD) and poisoning for backdooring (PBD). In the first poisoning step, incoming queries are classified into ID and out-of-distribution (OOD) ones using an energy score (ES) based OOD detector, and the latter are further divided into high-ES and low-ES ones, which are subsequently passed to a strong and a weak PPD process, respectively. In the second poisoning step, difficult ID queries are detected by a proposed reliability score (RS) measurement and are passed to PBD. In doing so, the first-step OOD poisoning leads to substantial performance degradation in surrogate models, the second-step ID poisoning further embeds backdoors in them, while both preserve model fidelity. Extensive experiments confirm that CIP not only achieves promising performance against state-of-the-art black-box surrogate attacks such as KnockoffNets and data-free model extraction (DFME) but also works well against stronger attacks with sufficient ID and deceptive data, outperforming the existing dynamic adversarial watermarking (DAWN) and deceptive perturbation defense methods.
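The following is a minimal sketch, not the authors' implementation, of the query-routing idea the abstract describes: score each incoming query with an energy score (ES), answer ID queries cleanly, and poison OOD queries more or less strongly depending on their ES. It assumes the energy score of Liu et al. (2020); the thresholds TAU_ID and TAU_STRONG, the temperature T, the blend weights, and the poison_probs construction are illustrative placeholders, and the reliability-score/PBD step for ID queries is omitted.

```python
# Hypothetical sketch of ES-based query routing for inference poisoning.
# Thresholds, temperature, and poisoning strengths are assumed, not from the paper.
import torch
import torch.nn.functional as F

T = 1.0            # temperature for the energy score (assumed)
TAU_ID = -8.0      # ES at or below this -> treated as in-distribution (assumed)
TAU_STRONG = -4.0  # ES at or above this -> "high ES" OOD, strong poisoning (assumed)

def energy_score(logits: torch.Tensor) -> torch.Tensor:
    # Energy score E(x) = -T * logsumexp(f(x)/T); ID inputs typically
    # obtain lower (more negative) energy than OOD inputs.
    return -T * torch.logsumexp(logits / T, dim=-1)

def poison_probs(probs: torch.Tensor, strength: float) -> torch.Tensor:
    # Illustrative stand-in for PPD: blend the clean posterior with a
    # class-permuted copy so the top-1 label tends to flip as strength grows.
    permuted = probs.roll(shifts=1, dims=-1)
    return (1.0 - strength) * probs + strength * permuted

@torch.no_grad()
def answer_query(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    logits = model(x)                      # (batch, num_classes)
    probs = F.softmax(logits, dim=-1)
    es = energy_score(logits)
    out = probs.clone()
    for i in range(x.size(0)):
        if es[i] <= TAU_ID:
            continue                                        # ID: return clean prediction
        elif es[i] >= TAU_STRONG:
            out[i] = poison_probs(probs[i], strength=0.9)   # high-ES OOD: strong PPD
        else:
            out[i] = poison_probs(probs[i], strength=0.3)   # low-ES OOD: weak PPD
    return out
```

A real deployment would calibrate the two thresholds on held-out ID data and replace poison_probs with the paper's actual PPD construction (and add the RS-gated PBD step for difficult ID queries).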
Pages: 1473-1486
Number of pages: 14