Adversarial Feature Alignment: Balancing Robustness and Accuracy in Deep Learning via Adversarial Training

Cited by: 1
Authors
Park, Leo Hyun [1 ]
Kim, Jaeuk [1 ]
Oh, Myung Gyo [1 ]
Park, Jaewoo [1 ]
Kwon, Taekyoung [1 ]
Affiliation
[1] Yonsei Univ, Seoul, South Korea
Source
PROCEEDINGS OF THE 2024 WORKSHOP ON ARTIFICIAL INTELLIGENCE AND SECURITY, AISEC 2024 | 2024
Keywords
deep learning; adversarial robustness; robustness-accuracy tradeoff; adversarial attack; adversarial training
DOI
10.1145/3689932.3694765
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep learning models continue to improve in accuracy, yet they remain vulnerable to adversarial attacks, which often cause adversarial examples to be misclassified. Adversarial training can mitigate this problem by improving a model's robust accuracy on adversarial examples, but it typically degrades the model's standard accuracy on clean samples. Deep learning models clearly need to balance robustness and accuracy for security, yet achieving this balance remains challenging, and its underlying causes are still unclear. This paper proposes a novel pre-training method, Adversarial Feature Alignment (AFA), to address these problems. Our approach identifies the trade-off in the model's feature space and fine-tunes the model to achieve accuracy on both standard and adversarial examples simultaneously. Our research reveals an intriguing insight: misalignment within the feature space often leads to misclassification, regardless of whether a sample is benign or adversarial. AFA mitigates this risk with a novel optimization algorithm, based on contrastive learning, that alleviates potential feature misalignment. Our evaluations demonstrate AFA's superior performance: it delivers state-of-the-art robust accuracy while limiting the drop in clean accuracy to 1.86% on CIFAR10 and 8.91% on CIFAR100 relative to cross-entropy training. We also show that joint optimization of AFA and TRADES, combined with data augmentation using a recent diffusion model, achieves state-of-the-art accuracy and robustness. Through AFA, we expect adversarial training to enhance the security of deep learning models while preserving their accuracy.
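The abstract describes AFA as a contrastive-learning objective that aligns clean and adversarial features of the same class, optionally optimized jointly with a TRADES-style term. The sketch below (PyTorch) is a rough illustration only: the supervised-contrastive formulation, the function names, and the hyperparameters tau, beta, and lam are assumptions for exposition, not the paper's actual loss.

    # Hypothetical sketch of an AFA-style objective (not the paper's exact loss):
    # a supervised contrastive term pulls together clean and adversarial features
    # of the same class, while a TRADES-style KL term regularizes the logits.
    import torch
    import torch.nn.functional as F

    def feature_alignment_loss(feats, labels, tau=0.1):
        """Supervised contrastive loss over L2-normalized features.

        feats:  (2N, d) clean and adversarial features, concatenated
        labels: (2N,)   class labels, repeated for the adversarial copies
        """
        z = F.normalize(feats, dim=1)
        sim = z @ z.t() / tau                                # pairwise cosine similarity
        self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
        sim = sim.masked_fill(self_mask, float('-inf'))      # exclude self-pairs
        log_prob = sim - sim.logsumexp(dim=1, keepdim=True)  # row-wise log-softmax
        pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        # mean log-likelihood of same-class (positive) pairs per anchor, negated
        pos_log_prob = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
        return -pos_log_prob.mean()

    def joint_objective(model, x_clean, x_adv, y, beta=6.0, lam=1.0):
        """Illustrative joint loss: clean cross-entropy + TRADES KL + alignment.

        Assumes `model` returns (logits, penultimate features) and that `x_adv`
        is a PGD-style perturbation of `x_clean` generated elsewhere.
        """
        logits_c, feat_c = model(x_clean)
        logits_a, feat_a = model(x_adv)
        ce = F.cross_entropy(logits_c, y)
        kl = F.kl_div(F.log_softmax(logits_a, dim=1),        # TRADES-style robustness term
                      F.softmax(logits_c, dim=1), reduction='batchmean')
        align = feature_alignment_loss(torch.cat([feat_c, feat_a]),
                                       torch.cat([y, y]))
        return ce + beta * kl + lam * align

The three-term split simply mirrors the abstract's description of joint AFA and TRADES optimization; how the terms are actually weighted or scheduled during pre-training is defined in the paper itself.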
Pages: 101-112 (12 pages)