A Textual Backdoor Defense Method Based on Deep Feature Classification

Cited by: 2
Authors
Shao, Kun [1]
Yang, Junan [1]
Hu, Pengjiang [1]
Li, Xiaoshuai [1]
Affiliation
[1] Natl Univ Def Technol, Coll Elect Engn, Hefei 230037, Peoples R China
Keywords
deep neural networks; natural language processing; adversarial machine learning; backdoor attacks; backdoor defenses; ATTACKS;
DOI
10.3390/e25020220
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods are limited both in effectiveness and in the scenarios they cover. We propose a textual backdoor defense method based on deep feature classification. The method consists of deep feature extraction and classifier construction, and it exploits the distinguishability between the deep features of poisoned data and those of benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models against a variety of backdoor attacks. The experimental results demonstrate the effectiveness of this defense approach, which outperforms the baseline defense method.
Pages: 13