CBPF: A Novel Method for Filtering Poisoned Data Based on Composite Backdoor Attacks

Cited: 0
Authors
Xia, Hanfeng [1 ]
Hong, Haibo [1 ]
Wang, Ruili [2 ]
Sun, Yiru [1 ]
Ding, Hao [1 ]
Affiliations
[1] Zhejiang Gongshang Univ, Sch Comp Sci & Technol, Zhejiang Key Lab Big Data & Future Ecommerce Techn, Hangzhou 310018, Peoples R China
[2] Massey Univ, Sch Math & Computat Sci, Auckland 0632, New Zealand
Funding
National Natural Science Foundation of China;
Keywords
Backdoor attacks; backdoor defenses; composite backdoor poisoning filtering (CBPF); deep neural networks (DNNs)
DOI
10.1109/JIOT.2025.3558627
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Backdoor attacks inject a small number of trigger-bearing poisoned samples into the training dataset. During inference, a backdoored model maintains high accuracy on normal examples, yet misclassifies trigger-containing inputs as the target class designated by the attacker. This article addresses backdoor attacks by developing a novel method for filtering poisoned samples. We leverage two key characteristics of backdoor attacks: 1) multiple backdoors can coexist within a single model and 2) the finding from the composite backdoor attack (CBA) that reassigning two triggers in a sample to a new target label preserves each trigger's original functionality, while causing samples containing both triggers simultaneously to be predicted as the new target class. Accordingly, a three-stage poisoned-data filtering approach, composite backdoor poisoning filtering (CBPF), is proposed. First, exploiting the observed differences in model output between poisoned and clean samples, a subset of the data containing both poisoned and clean samples is partitioned off. Next, benign triggers are stamped onto these samples and their labels are adjusted to create a new target class and a benign target class, so that poisoned and clean data are classified as distinct classes at inference time. The experimental results show that CBPF successfully filters out poisoned data produced by seven advanced attacks on CIFAR-10, GTSRB, and ImageNet-12, attaining an average filtering success rate of 99.88% for these attacks on CIFAR-10. Additionally, the model trained on the retained clean samples sustains high accuracy.
Pages: 25136-25147
Number of pages: 12
Related Papers
36 in total
[1]  
Barni M, 2019, IEEE IMAGE PROC, P101, DOI 10.1109/ICIP.2019.8802997
[2]   Image Classification With Small Datasets: Overview and Benchmark [J].
Brigato, Lorenzo ;
Barz, Bjoern ;
Iocchi, Luca ;
Denzler, Joachim .
IEEE ACCESS, 2022, 10 :49233-49250
[3]  
Chen B., 2019, P WORKSH ART INT SAF, P1
[4]  
Chen XY, 2017, arXiv, DOI arXiv:1712.05526
[5]   The Cityscapes Dataset for Semantic Urban Scene Understanding [J].
Cordts, Marius ;
Omran, Mohamed ;
Ramos, Sebastian ;
Rehfeld, Timo ;
Enzweiler, Markus ;
Benenson, Rodrigo ;
Franke, Uwe ;
Roth, Stefan ;
Schiele, Bernt .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :3213-3223
[6]   Backdoor Defense via Adaptively Splitting Poisoned Dataset [J].
Gao, Kuofeng ;
Bai, Yang ;
Gu, Jindong ;
Yang, Yong ;
Xia, Shu-Tao .
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, :4005-4014
[7]   STRIP: A Defence Against Trojan Attacks on Deep Neural Networks [J].
Gao, Yansong ;
Xu, Change ;
Wang, Derui ;
Chen, Shiping ;
Ranasinghe, Damith C. ;
Nepal, Surya .
35TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE (ACSAC), 2019, :113-125
[8]   BadNets: Evaluating Backdooring Attacks on Deep Neural Networks [J].
Gu, Tianyu ;
Liu, Kang ;
Dolan-Gavitt, Brendan ;
Garg, Siddharth .
IEEE ACCESS, 2019, 7 :47230-47244
[9]  
Guo J., 2023, P 11 INT C LEARN REP, P1
[10]  
Hayase J, 2021, PR MACH LEARN RES, V139