Combating Multi-level Adversarial Text with Pruning based Adversarial Training

Citations: 3
|
Authors
Ke, Jianpeng [1 ]
Wang, Lina [1 ]
Ye, Aoshuang [1 ]
Fu, Jie [1 ]
Affiliations
[1] Wuhan Univ, Key Lab Aerosp Informat Secur & Trusted Comp, Minist Educ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Funding
National Natural Science Foundation of China;
Keywords
Adversarial training; Adversarial text; Model pruning; Deep neural network;
DOI
10.1109/IJCNN55064.2022.9892314
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite significant advances in deep learning-based models for natural language processing (NLP) tasks, prior work has shown that many models, including deep neural networks (DNNs), suffer from moderate to severe performance degradation on adversarial examples. An adversary crafts malicious text by adding, deleting, or modifying characters, words, and sentences to fool DNN models. Adversarial training and model enhancement methods have therefore been proposed to combat such attacks; however, both lack generalization due to the intrinsic overfitting of neural networks. In this paper, we propose a novel framework to combat adversarial text, namely DisPAT, which consists of an adversarial text discriminator and a robust pruned text classifier. First, we examine the distributions of adversarial and benign examples in the embedding space, which indicates the feasibility of a DNN-based discriminator. To obtain multi-level adversarial texts, we deploy a generator, together with a discriminator that identifies adversarial perturbations. Notably, at inference time, our pipeline places the well-trained discriminator in front of the text classifier to filter character-level adversarial text. Finally, we apply neuron-salience-based pruning to specifically improve the classifier's performance on adversarial text. Experimental results show that our approach outperforms state-of-the-art baselines in combating both character-level and word-level adversarial text. Moreover, DisPAT achieves accuracy very close to, or even higher than, that of the standard model.
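The abstract does not spell out how neuron-salience-based pruning is computed. The sketch below is only an illustrative reading, not the authors' implementation: it assumes a PyTorch-style classifier in which per-unit salience is estimated with a first-order (activation x gradient) score and the least salient hidden units are masked out. The names PrunableClassifier and prune_by_salience, and the specific salience criterion, are hypothetical.

# Hypothetical sketch of neuron-salience-based pruning (not the paper's code).
# Salience of each hidden unit is estimated as the mean |activation * gradient|
# over a batch; the least salient units are then masked (pruned).
import torch
import torch.nn as nn

class PrunableClassifier(nn.Module):
    def __init__(self, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, hidden_dim)
        # Binary mask over hidden units; pruned units are set to zero.
        self.mask = nn.Parameter(torch.ones(hidden_dim), requires_grad=False)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        h = torch.relu(self.encoder(x)) * self.mask  # pruned units contribute zero
        return self.head(h)

def prune_by_salience(model, inputs, labels, prune_ratio=0.2):
    """Zero out the prune_ratio fraction of hidden units with lowest salience."""
    model.zero_grad()
    h = torch.relu(model.encoder(inputs))
    h.retain_grad()                                   # keep gradients of activations
    logits = model.head(h * model.mask)
    loss = nn.functional.cross_entropy(logits, labels)
    loss.backward()
    salience = (h * h.grad).abs().mean(dim=0)         # first-order Taylor-style salience
    k = int(prune_ratio * salience.numel())
    _, low = torch.topk(salience, k, largest=False)   # indices of least salient units
    model.mask.data[low] = 0.0

# Usage on random data (embeddings stand in for sentence representations):
model = PrunableClassifier()
x, y = torch.randn(32, 128), torch.randint(0, 2, (32,))
prune_by_salience(model, x, y, prune_ratio=0.2)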
Pages: 8