BLOCK-SPARSE ADVERSARIAL ATTACK TO FOOL TRANSFORMER-BASED TEXT CLASSIFIERS

Cited by: 5
Authors
Sadrizadeh, Sahar [1 ]
Dolamic, Ljiljana [2 ]
Frossard, Pascal [1 ]
Affiliations
[1] Ecole Polytech Fed Lausanne EPFL, Lausanne, Switzerland
[2] Armasuisse S T, Thun, Switzerland
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
Adversarial attack; block sparse; deep neural network; natural language processing; text classification
DOI
10.1109/ICASSP43922.2022.9747475
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Recently, it has been shown that, despite their strong performance across many fields, deep neural networks are vulnerable to adversarial examples. In this paper, we propose a gradient-based adversarial attack against transformer-based text classifiers. The adversarial perturbation in our method is constrained to be block-sparse, so that the resulting adversarial example differs from the original sentence in only a few words. Owing to the discrete nature of textual data, we perform gradient projection to find the minimizer of the proposed optimization problem. Experimental results demonstrate that, while our adversarial attack preserves the semantics of the sentence, it can reduce the accuracy of GPT-2 to less than 5% on different datasets (AG News, MNLI, and Yelp Reviews). Furthermore, the block-sparsity constraint of the proposed optimization problem results in small perturbations in the adversarial example.
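The abstract describes combining a continuous gradient step on the word embeddings with a projection back onto the discrete vocabulary, while a block-sparsity constraint limits how many words may change. The NumPy sketch below illustrates one such projected step under simplifying assumptions: the sizes, random embeddings, and the top-k block-norm selection heuristic are all illustrative stand-ins, not the paper's actual optimization algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer's embedding layer (hypothetical sizes;
# the paper attacks GPT-2, whose real embedding table is far larger).
V, d, n = 50, 8, 6                         # vocab size, embed dim, sentence length
vocab = rng.normal(size=(V, d))            # token embedding table
sentence_ids = rng.integers(0, V, size=n)  # the original sentence
x = vocab[sentence_ids]                    # its embeddings, shape (n, d)

# Stand-in gradient of the classifier loss w.r.t. the embeddings; in the
# real attack this would come from backpropagation through the transformer.
grad = rng.normal(size=(n, d))

def block_sparse_projected_step(x, grad, vocab, sentence_ids, k=2, lr=0.5):
    """One ascent step kept block-sparse and projected onto valid tokens:
    only the k words whose gradient blocks have the largest L2 norm may
    change, and each changed word is snapped to its nearest embedding."""
    x_adv = x + lr * grad                        # continuous gradient step
    block_norms = np.linalg.norm(grad, axis=1)   # one norm per word (block)
    to_change = np.argsort(block_norms)[-k:]     # words allowed to change
    new_ids = sentence_ids.copy()                # block-sparsity: rest unchanged
    for i in to_change:
        # gradient projection: nearest token embedding to the perturbed word
        dists = np.linalg.norm(vocab - x_adv[i], axis=1)
        new_ids[i] = int(np.argmin(dists))
    return new_ids

adv_ids = block_sparse_projected_step(x, grad, vocab, sentence_ids)
changed = int((adv_ids != sentence_ids).sum())
print(f"words changed: {changed} of {n}")
```

By construction at most `k` words differ from the original sentence, which mirrors the small, block-sparse perturbations the abstract reports.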
Pages: 7837-7841
Page count: 5