Robust and Verifiable Information Embedding Attacks to Deep Neural Networks via Error-Correcting Codes

Cited by: 8
Authors
Jia, Jinyuan [1]
Wang, Binghui [1]
Gong, Neil Zhenqiang [1]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
Source
ASIA CCS'21: PROCEEDINGS OF THE 2021 ACM ASIA CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY | 2021
Keywords
Information embedding attacks; error-correcting code; machine learning security
DOI
10.1145/3433210.3437519
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier and then deploys the classifier as an end-user software product (e.g., a mobile app) or a cloud service. In an information embedding attack, the attacker is the provider of a malicious third-party machine learning tool. The attacker embeds a message into the DNN classifier during training and recovers the message by querying the API of the black-box classifier after the user deploys it. Information embedding attacks have attracted growing attention because of their applications, such as watermarking DNN classifiers and compromising user privacy. State-of-the-art information embedding attacks have two key limitations: 1) they cannot verify the correctness of the recovered message, and 2) they are not robust against post-processing (e.g., compression) of the classifier. In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods. Specifically, we leverage Cyclic Redundancy Check (CRC) to verify the correctness of the recovered message. Moreover, to be robust against post-processing, we leverage Turbo codes, a type of error-correcting code, to encode the message before embedding it into the DNN classifier. To save queries to the deployed classifier, we propose to recover the message by adaptively querying the classifier. Our adaptive recovery strategy leverages the property of Turbo codes that supports error correction with a partial code. We evaluate our information embedding attacks using simulated messages and apply them to three applications (i.e., training data inference, property inference, and DNN architecture inference), where messages have semantic interpretations. We consider 8 popular methods to post-process the classifier. Our results show that our attacks can accurately and verifiably recover the messages in all considered scenarios, while state-of-the-art attacks cannot accurately recover the messages in many scenarios.
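The verify-then-correct idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a simple repetition code stands in for Turbo codes, the names `encode`, `decode`, and `REPEAT` are hypothetical, and the list of bits stands in for whatever the attacker reads back from the classifier. Embedding into a DNN and adaptive querying are not modeled.

```python
import zlib

REPEAT = 5  # hypothetical repetition factor; a stand-in for Turbo codes


def to_bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def encode(message: bytes) -> list[int]:
    """Append a CRC-32 checksum, then repeat every bit REPEAT times."""
    crc = zlib.crc32(message).to_bytes(4, "big")
    return [b for b in to_bits(message + crc) for _ in range(REPEAT)]


def decode(code: list[int]) -> tuple[bytes, bool]:
    """Majority-vote each group of REPEAT bits, then check the CRC-32."""
    bits = [
        int(sum(code[i:i + REPEAT]) * 2 > REPEAT)
        for i in range(0, len(code), REPEAT)
    ]
    payload = from_bits(bits)
    message, crc = payload[:-4], payload[-4:]
    verified = zlib.crc32(message).to_bytes(4, "big") == crc
    return message, verified
```

In this sketch, a few flipped bits (for example, from post-processing noise) are corrected by the majority vote, and the CRC flag tells the caller whether the recovered message can be trusted. When too many bits in one group are flipped, the vote fails and the CRC check reports the mismatch.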
Pages: 2-13
Number of pages: 12