Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation

Cited by: 103
Authors
Zhong, Haoti [1 ]
Liao, Cong [2 ]
Squicciarini, Anna Cinzia [2 ]
Zhu, Sencun [3 ]
Miller, David [1 ]
Affiliations
[1] Penn State Univ, Elect Engn, University Pk, PA 16802 USA
[2] Penn State Univ, Informat Sci & Technol, University Pk, PA USA
[3] Penn State Univ, Comp Sci & Engn, University Pk, PA USA
Source
PROCEEDINGS OF THE TENTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY, CODASPY 2020 | 2020
Keywords
Adversarial Machine Learning; Perturbation Attacks; Deep Learning;
DOI
10.1145/3374664.3375751
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning models have consistently outperformed traditional machine learning models in various classification tasks, including image classification. As such, they have become increasingly prevalent in many real-world applications, including those where security is of great concern. This popularity, however, may attract attackers who exploit vulnerabilities in deployed deep learning models to launch attacks against security-sensitive applications. In this paper, we focus on a specific type of data poisoning attack, which we refer to as a backdoor injection attack. The main goal of the adversary performing such an attack is to generate and inject into a deep learning model a backdoor that can be triggered to recognize certain embedded patterns with a target label of the attacker's choice. Additionally, a backdoor injection attack should proceed stealthily, without undermining the efficacy of the victim model. Specifically, we propose two approaches for generating a backdoor that is hardly perceptible yet effective in poisoning the model. We consider two attack settings, with backdoor injection carried out either before model training or during model updating. We carry out extensive experimental evaluations under various assumptions on the adversary model, and demonstrate that such attacks can be effective, achieving a high attack success rate (above 90%) at a small cost in model accuracy (below 1% loss) with a small injection rate (around 1%), even under the weakest assumption, wherein the adversary has no knowledge of either the original training data or the classifier model.
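As a rough illustration of the poisoning step described in the abstract, the sketch below adds a magnitude-bounded additive trigger to a small fraction of training images and relabels them with the attacker's target class. This is a minimal sketch only and does not reproduce the paper's two trigger-generation approaches; the function name poison_dataset, the static trigger argument, and the injection_rate/epsilon parameters are hypothetical stand-ins.

```python
import numpy as np

def poison_dataset(images, labels, target_label, trigger,
                   injection_rate=0.01, epsilon=4.0, seed=0):
    """Poison a small fraction of a training set with a near-invisible trigger.

    images       : float array, shape (N, H, W, C), pixel values in [0, 255]
    labels       : int array, shape (N,)
    target_label : class the attacker wants triggered inputs mapped to
    trigger      : additive perturbation of shape (H, W, C) (hypothetical static pattern)
    injection_rate, epsilon : poisoning fraction and per-pixel perturbation budget
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()

    # Pick a small random subset of samples to poison.
    n_poison = max(1, int(len(images) * injection_rate))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    # Bound the trigger so the change stays visually imperceptible.
    bounded = np.clip(trigger, -epsilon, epsilon)

    # Overlay the trigger and relabel the poisoned samples with the target class.
    images[idx] = np.clip(images[idx] + bounded, 0.0, 255.0)
    labels[idx] = target_label
    return images, labels
```

A model trained on the returned arrays would behave normally on clean inputs, while any test image carrying the same bounded trigger would be classified as target_label with high probability, mirroring the attack-success-rate versus accuracy-loss trade-off reported in the abstract.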
Pages: 97-108
Page count: 12