DifFilter: Defending Against Adversarial Perturbations With Diffusion Filter

Cited by: 3
Authors
Chen, Yong [1 ,2 ]
Li, Xuedong [3 ,4 ]
Hu, Peng [5 ]
Peng, Dezhong [5 ,6 ]
Wang, Xu [5 ]
Affiliations
[1] Natl Lab Adapt Opt, Chengdu 610209, Peoples R China
[2] Chinese Acad Sci, Inst Opt & Elect, Chengdu 610209, Peoples R China
[3] Chengdu Univ Informat Technol, Coll Blockchain Ind, Chengdu 610103, Peoples R China
[4] Adv Cryptog & Syst Secur Key Lab Sichuan Prov, Chengdu 610225, Peoples R China
[5] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[6] Sichuan Newstrong UHD Video Technol Co Ltd, Chengdu 610095, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Training; Purification; Perturbation methods; Robustness; Diffusion models; Mathematical models; Stochastic processes; Adversarial defence; adversarial purification; diffusion model; robustness;
DOI
10.1109/TIFS.2024.3422923
CLC Number
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
The inherent vulnerability of deep learning to adversarial examples poses a significant security challenge. Although existing defense methods have partially mitigated the harm caused by adversarial attacks, they still fall short of practical needs due to their high cost, high latency, and poor defense performance. In this paper, we propose an advanced plug-and-play adversarial purification model called DifFilter. Specifically, we exploit the superior generative properties of diffusion models to denoise adversarial perturbations and recover clean images. To let Gaussian noise disrupt adversarial perturbations while preserving the real semantic information in the input image, we extend forward diffusion to an infinite number of noise scales, so that the distribution of the perturbed data evolves with increasing noise according to a stochastic differential equation. In the reverse denoising process, we develop a score-based model learning method to restore the input prior distribution to the data distribution of the original clean sample, yielding a stronger purification effect. Additionally, we propose an efficient sampling method to accelerate the reverse process, greatly reducing the time cost of purification. We conduct extensive experiments to evaluate the defense generalization performance of DifFilter. The results demonstrate that our method not only surpasses existing defense methods in robustness under strong adaptive and black-box attacks but also achieves higher certified accuracy than the baseline. Furthermore, DifFilter can be combined with adversarial training to further improve defense robustness.
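The purification pipeline the abstract describes (diffuse the possibly adversarial input forward with an SDE until Gaussian noise overwhelms the perturbation, then integrate the reverse-time SDE with a score model to recover a clean sample) can be sketched in a toy setting. Everything below is an illustrative assumption, not the authors' implementation: a VP-type linear noise schedule `beta(t)`, and an analytic Gaussian score (`toy_score`, exact when the data distribution is N(0, I)) standing in for the paper's trained score network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule for the VP-SDE  dx = -0.5*beta(t)*x dt + sqrt(beta(t)) dW
BETA_MIN, BETA_MAX = 0.1, 20.0

def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)

def alpha_bar(t):
    # Closed form of exp(-integral_0^t beta(s) ds) for the linear schedule.
    return np.exp(-(BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t**2))

def forward_diffuse(x0, t_star, rng):
    """One-shot forward diffusion: x_t = sqrt(abar)*x0 + sqrt(1-abar)*eps."""
    abar = alpha_bar(t_star)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * rng.standard_normal(x0.shape)

def toy_score(x, t):
    """Exact score when data ~ N(0, I): the VP-SDE marginals stay N(0, I),
    so grad_x log p_t(x) = -x. A real purifier would call a trained network."""
    return -x

def purify(x_adv, t_star=0.3, n_steps=200, rng=rng):
    """Diffuse the input up to t_star, then run the reverse-time SDE
    dx = [f(x,t) - g(t)^2 * score] dt + g(t) dW-bar back to t=0
    with an Euler-Maruyama discretization."""
    x = forward_diffuse(x_adv, t_star, rng)
    dt = t_star / n_steps
    for i in range(n_steps):
        t = t_star - i * dt
        b = beta(t)
        drift = -0.5 * b * x - b * toy_score(x, t)  # f - g^2 * score
        x = x - drift * dt + np.sqrt(b * dt) * rng.standard_normal(x.shape)
    return x

# Usage: a clean "image" (flattened), an FGSM-like perturbation, then purification.
x0 = rng.standard_normal(4096)
x_adv = x0 + 0.05 * np.sign(rng.standard_normal(4096))
x_pur = purify(x_adv)
```

The choice of `t_star` encodes the trade-off the abstract mentions: large enough that Gaussian noise drowns the adversarial perturbation, small enough that the semantic content (here, the correlation between `x_pur` and `x0`) survives the round trip.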
Pages: 6779-6794
Page count: 16