Between-Class Adversarial Training for Improving Adversarial Robustness of Image Classification

Cited by: 1
Authors
Wang, Desheng [1 ]
Jin, Weidong [1 ,2 ]
Wu, Yunpu [3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Elect Engn, Chengdu 611756, Peoples R China
[2] Nanning Univ, China ASEAN Int Joint Lab Integrated Transportat, Nanning 541699, Peoples R China
[3] Xihua Univ, Sch Elect Engn & Elect Informat, Chengdu 610039, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
adversarial training; between-class learning; robustness; regularization;
DOI
10.3390/s23063252
CLC Number
O65 [Analytical Chemistry];
Discipline Codes
070302 ; 081704 ;
Abstract
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks. Adversarial training (AT) is, so far, the only method that can guarantee the robustness of DNNs to adversarial attacks. However, the robust generalization accuracy achieved by AT is still far lower than the standard generalization accuracy of an undefended model, and there is a known trade-off between the standard generalization accuracy and the robust generalization accuracy of an adversarially trained model. In order to improve the trade-off between the robust generalization and the standard generalization performance of AT, we propose a novel defense algorithm called Between-Class Adversarial Training (BCAT) that combines Between-Class learning (BC-learning) with standard AT. Specifically, during AT, BCAT mixes two adversarial examples from different classes and trains the model on the mixed between-class adversarial examples instead of the original adversarial examples. We further propose BCAT+, which adopts a more powerful mixing method. BCAT and BCAT+ impose effective regularization on the feature distribution of adversarial examples to enlarge the between-class distance, thus improving the robust generalization and the standard generalization performance of AT. The proposed algorithms introduce no hyperparameters beyond those of standard AT; therefore, no hyperparameter search is needed. We evaluate the proposed algorithms under both white-box and black-box attacks across a spectrum of perturbation magnitudes on the CIFAR-10, CIFAR-100, and SVHN datasets. The findings indicate that our algorithms achieve better overall robust generalization performance than state-of-the-art adversarial defense methods.
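The core mechanism described in the abstract, mixing two adversarial examples from different classes together with their labels before the training step, can be sketched as below. This is a minimal illustration of BC-learning-style mixing, not the authors' exact BCAT implementation; the function name, the uniform mixing ratio, and the assumption of one-hot labels and pixel values in [0, 1] are illustrative choices (the paper's BCAT+ uses a more powerful mixing method not reproduced here).

```python
import numpy as np

def between_class_mix(x1, y1, x2, y2, rng):
    """Mix two adversarial examples from different classes (BC-learning style).

    x1, x2 : image arrays with values in [0, 1] (e.g. adversarial examples
             produced by an inner attack step such as PGD).
    y1, y2 : one-hot label vectors for the two classes.
    Returns a mixed image and a correspondingly mixed soft label; the model
    is then trained on (x_mix, y_mix) instead of the original pair.
    """
    r = rng.uniform(0.0, 1.0)            # random mixing ratio, drawn per pair
    x_mix = r * x1 + (1.0 - r) * x2      # pixel-wise convex combination
    y_mix = r * y1 + (1.0 - r) * y2      # labels mixed with the same ratio
    return x_mix, y_mix
```

Because the same ratio weights both the inputs and the labels, the loss encourages the network to place mixed examples between the two class clusters in feature space, which is the regularization effect the abstract attributes to BCAT.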
Pages: 23