Knowledge-Driven Backdoor Removal in Deep Neural Networks via Reinforcement Learning

Times Cited: 0
Authors
Song, Jiayin [1 ]
Li, Yike [1 ]
Tian, Yunzhe [1 ]
Wu, Xingyu [1 ]
Li, Qiong [1 ]
Tong, Endong [1 ,2 ]
Niu, Wenjia [1 ]
Zhang, Zhenguo [3 ]
Liu, Jiqiang [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab Secur & Privacy Intelligent Trans, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Tangshan Res Inst, Tangshan 063000, Peoples R China
[3] Hebei Boshilin Technol Dev Co Ltd, Shijiazhuang, Hebei, Peoples R China
Source
KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III, KSEM 2024 | 2024 / Vol. 14886
Funding
National Natural Science Foundation of China;
Keywords
Backdoor Removal; Reinforcement Learning; Neuron Activation; Backdoor Attack; Deep Learning;
DOI
10.1007/978-981-97-5498-4_26
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Backdoor attacks have become a major security threat to deep neural networks (DNNs), prompting significant research into backdoor removal to mitigate them. However, existing backdoor removal methods typically operate in isolation and struggle to generalize across attacks, which limits their effectiveness when the attacker's specific method is unknown. To defend effectively against multiple backdoor attacks, in this paper we propose the Reinforcement Learning-based Backdoor Removal (RLBR) framework, which integrates multiple defense strategies and dynamically switches among them during the removal process. Driven by our observations that (a) neuron activation patterns vary significantly across attacks and (b) these patterns change dynamically during removal, we take the neuron activation pattern of the poisoned model as the environment state in the RLBR framework. In addition, we use the measured defense effectiveness as the reward to guide the selection of the optimal defense strategy at each decision point. In extensive experiments against six state-of-the-art backdoor attacks on two benchmark datasets, RLBR improved defensive performance by 6.91% over seven baseline backdoor defense methods while maintaining 92.63% accuracy on clean data.
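The abstract describes a reinforcement learning loop: the state is the poisoned model's neuron activation pattern, the action is the choice of defense method, and the reward is the measured defense effectiveness. The following is a minimal, self-contained Python sketch of such a loop, assuming a tabular Q-learning agent, a binned activation-pattern state, a hypothetical action set (DEFENSES), and stub apply_defense/reward functions. None of these names come from the paper, and the authors' actual state encoding, reward, and RL algorithm may differ.

import numpy as np

DEFENSES = ["fine_pruning", "neuron_masking", "distillation"]  # assumed action set, not from the paper

def activation_state(acts, n_bins=4):
    # State: a coarse, hashable summary of neuron activations
    # (per-layer mean activation, discretized into bins).
    means = acts.mean(axis=1)
    return tuple(np.digitize(means, np.linspace(0.0, 1.0, n_bins)))

class QAgent:
    # Tabular epsilon-greedy Q-learning; illustrative only.
    def __init__(self, n_actions, lr=0.1, gamma=0.9, eps=0.2):
        self.q = {}
        self.n, self.lr, self.gamma, self.eps = n_actions, lr, gamma, eps

    def act(self, s):
        # Explore with probability eps or when the state is unseen.
        if np.random.rand() < self.eps or s not in self.q:
            return np.random.randint(self.n)
        return int(np.argmax(self.q[s]))

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update.
        q = self.q.setdefault(s, np.zeros(self.n))
        nxt = self.q.get(s2, np.zeros(self.n)).max()
        q[a] += self.lr * (r + self.gamma * nxt - q[a])

def apply_defense(acts, action):
    # Stub: a real defense would fine-tune, prune, or distill the model;
    # here we only perturb the fake activations so the loop runs.
    return np.clip(acts - 0.05 * (action + 1) * np.random.rand(*acts.shape), 0.0, 1.0)

def reward(acts):
    # Stub reward: in practice, something like attack-success-rate reduction
    # minus clean-accuracy loss, measured on held-out data.
    return float(1.0 - acts.mean())

agent = QAgent(len(DEFENSES))
acts = np.random.rand(5, 100)  # fake per-layer neuron activations of a poisoned model
for step in range(10):         # one defense choice per decision point
    s = activation_state(acts)
    a = agent.act(s)
    acts = apply_defense(acts, a)
    agent.update(s, a, reward(acts), activation_state(acts))
    print(step, DEFENSES[a], round(reward(acts), 3))

The design point mirrored here is that the state is re-encoded after every defense step, so the agent can react to how the activation pattern shifts during removal, which is observation (b) in the abstract.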
Pages: 336-348
Number of Pages: 13