Adversarial Robustness on In- and Out-Distribution Improves Explainability

Cited by: 27
Authors
Augustin, Maximilian [1]
Meinke, Alexander [1]
Hein, Matthias [1]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
Source
COMPUTER VISION - ECCV 2020, PT XXVI | 2020 / Vol. 12371
Keywords
DOI
10.1007/978-3-030-58574-7_14
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural networks have led to major improvements in image classification, but suffer from being non-robust to adversarial changes, unreliable uncertainty estimates on out-distribution samples, and inscrutable black-box decisions. In this work we propose RATIO, a training procedure for Robustness via Adversarial Training on In- and Out-distribution, which leads to robust models with reliable and robust confidence estimates on the out-distribution. RATIO has similar generative properties to adversarial training, so that visual counterfactuals produce class-specific features. While adversarial training comes at the price of lower clean accuracy, RATIO achieves state-of-the-art ℓ2-adversarial robustness on CIFAR10 while maintaining better clean accuracy.
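The record gives only a high-level description of RATIO. As an illustration of the kind of combined objective the abstract describes — ℓ2 adversarial training on labeled in-distribution points, plus penalizing high confidence on adversarially perturbed out-distribution points — here is a minimal NumPy sketch. The toy logistic model, the PGD step sizes, the shifted-Gaussian "out-distribution", and the λ weighting are all assumptions for illustration, not the paper's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class logistic "network": p(y|x) = softmax(W x). This stands in for
# the deep classifier; the model and all constants below are assumptions.
D, N = 4, 32
W = rng.normal(size=(2, D))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def probs(x):
    return softmax(x @ W.T)

def ce_loss(x, y):
    # Mean cross-entropy of the true labels.
    return -np.log(probs(x)[np.arange(len(y)), y] + 1e-12).mean()

def mean_max_conf(x):
    # Mean confidence (max class probability); should be low out-of-distribution.
    return probs(x).max(axis=-1).mean()

def grad_ce(x, y):
    # d/dx of per-sample cross-entropy for the logistic model: (p - onehot(y)) W.
    p = probs(x)
    p[np.arange(len(y)), y] -= 1.0
    return p @ W

def grad_log_max_prob(x):
    # d/dx of log p_k for the currently most likely class k: (onehot(k) - p) W.
    p = probs(x)
    e = np.zeros_like(p)
    e[np.arange(len(x)), p.argmax(axis=-1)] = 1.0
    return (e - p) @ W

def l2_pgd(x, grad_fn, eps=0.5, steps=10, lr=0.2):
    """Normalized gradient ascent on grad_fn's objective, projected onto an l2 ball."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x + delta)
        g_norm = np.linalg.norm(g, axis=1, keepdims=True) + 1e-12
        delta = delta + lr * g / g_norm                           # ascent step
        d_norm = np.linalg.norm(delta, axis=1, keepdims=True)
        delta = delta * np.minimum(1.0, eps / (d_norm + 1e-12))   # project to ball
    return x + delta

# Labeled in-distribution batch and an unlabeled out-distribution batch.
x_in = rng.normal(size=(N, D))
y_in = rng.integers(0, 2, size=N)
x_out = rng.normal(size=(N, D)) + 3.0   # mean-shifted: crude "out-distribution"

# Worst case within the l2 ball: maximize CE on in-distribution samples,
# maximize confidence on out-distribution samples.
x_in_adv = l2_pgd(x_in, lambda x: grad_ce(x, y_in))
x_out_adv = l2_pgd(x_out, grad_log_max_prob)

# RATIO-style combined training objective (lambda weighting is an assumption):
# fit the labels under attack, while keeping worst-case out-dist confidence low.
lam = 1.0
ratio_loss = ce_loss(x_in_adv, y_in) + lam * np.log(mean_max_conf(x_out_adv))
print(f"adv CE (in): {ce_loss(x_in_adv, y_in):.3f}  "
      f"adv conf (out): {mean_max_conf(x_out_adv):.3f}")
```

A real implementation would differentiate through a deep network with an autodiff framework and minimize `ratio_loss` over the weights; the sketch only shows the structure of the inner maximization and the two-term objective.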
Pages: 228-245
Page count: 18