Knowledge-Driven Backdoor Removal in Deep Neural Networks via Reinforcement Learning

Cited: 0
Authors
Song, Jiayin [1 ]
Li, Yike [1 ]
Tian, Yunzhe [1 ]
Wu, Xingyu [1 ]
Li, Qiong [1 ]
Tong, Endong [1 ,2 ]
Niu, Wenjia [1 ]
Zhang, Zhenguo [3 ]
Liu, Jiqiang [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Secur & Privacy Intelligent Trans, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Tangshan Res Inst, Tangshan 063000, Peoples R China
[3] Hebei Boshilin Technol Dev Co Ltd, Shijiazhuang, Hebei, Peoples R China
Source
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III, KSEM 2024 | 2024 / Vol. 14886
Funding
National Natural Science Foundation of China;
Keywords
Backdoor Removal; Reinforcement Learning; Neuron Activation; Backdoor Attack; Deep Learning;
DOI
10.1007/978-981-97-5498-4_26
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Backdoor attacks have become a major security threat to deep neural networks (DNNs), prompting significant research into backdoor removal to mitigate them. However, existing backdoor removal methods typically operate in isolation and struggle to generalize across diverse attacks, which limits their effectiveness when the attacker's specific method is unknown. To defend effectively against multiple backdoor attacks, in this paper we propose the Reinforcement Learning-based Backdoor Removal (RLBR) framework, which integrates multiple defense strategies and dynamically switches among them during the removal process. Motivated by two observations, namely that a) neuron activation patterns vary significantly under different attacks, and b) these patterns change dynamically during the removal process, we take the neuron activation pattern of the poisoned model as the environment state in the RLBR framework. In addition, we use defense effectiveness as the reward to guide the selection of the optimal defense strategy at each decision point. In extensive experiments against six state-of-the-art backdoor attacks on two benchmark datasets, RLBR improved defensive performance by 6.91% over seven baseline backdoor defense methods while maintaining 92.63% accuracy on clean datasets.
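The decision loop the abstract describes — activation pattern as state, choice of defense method as action, defense effectiveness as reward — can be sketched as tabular Q-learning. This is a minimal illustrative sketch only: the defense-strategy names, the state discretization, and the toy reward below are assumptions for demonstration, not the paper's actual implementation.

```python
import random
from collections import defaultdict

# Hypothetical stand-ins for the defense strategies RLBR switches between;
# the paper does not specify these here.
DEFENSES = ["fine_pruning", "neuron_clipping", "fine_tuning"]

def discretize(activations, bins=4):
    """Map a neuron-activation vector (values in [0, 1)) to a coarse,
    hashable state, since tabular Q-learning needs discrete states."""
    return tuple(min(int(a * bins), bins - 1) for a in activations)

class RLBRSketch:
    """Epsilon-greedy tabular Q-learning over (activation-state, defense) pairs."""

    def __init__(self, epsilon=0.2, alpha=0.5, gamma=0.9):
        self.q = defaultdict(float)  # Q-values, default 0.0
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def select(self, state):
        # Explore with probability epsilon, otherwise pick the best-known defense.
        if random.random() < self.epsilon:
            return random.choice(DEFENSES)
        return max(DEFENSES, key=lambda d: self.q[(state, d)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update toward reward + discounted best next value.
        best_next = max(self.q[(next_state, d)] for d in DEFENSES)
        key = (state, action)
        self.q[key] += self.alpha * (reward + self.gamma * best_next - self.q[key])

# Toy rollout: a single fixed activation state, and a simulated reward that
# pretends "fine_pruning" is the most effective defense in this state.
random.seed(0)
agent = RLBRSketch()
state = discretize([0.1, 0.8, 0.3])
for _ in range(200):
    action = agent.select(state)
    reward = 1.0 if action == "fine_pruning" else 0.1  # simulated effectiveness
    agent.update(state, action, reward, state)

best = max(DEFENSES, key=lambda d: agent.q[(state, d)])
```

After training, the greedy policy prefers the defense with the highest simulated reward for that activation state, mirroring how RLBR is described as selecting the optimal strategy at each decision point.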
Pages: 336-348 (13 pages)