On the Robustness of Bayesian Neural Networks to Adversarial Attacks

Cited by: 1
Authors
Bortolussi, Luca [1 ]
Carbone, Ginevra [2 ]
Laurenti, Luca [3 ]
Patane, Andrea [4 ]
Sanguinetti, Guido [5 ,6 ]
Wicker, Matthew [7 ]
Affiliations
[1] Univ Trieste, Dept Math Informat & Geosci, Trieste, Italy
[2] Univ Trieste, Dept Math & Geosci, I-34128 Trieste, Italy
[3] TU Delft Univ, Delft Ctr Syst & Control, NL-2628 CN Delft, Netherlands
[4] Trinity Coll Dublin, Sch Comp Sci & Stat, Dublin D02 PN40, Ireland
[5] Scuola Int Super Studi Avanzati, SISSA, I-34136 Trieste, Italy
[6] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Scotland
[7] Univ Oxford, Dept Comp Sci, Oxford OX1 3QG, England
Keywords
Training; adversarial attacks; adversarial robustness; Bayesian inference; Bayesian neural networks (BNNs); Gaussian processes
DOI
10.1109/TNNLS.2024.3386642
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, training deep learning models that are robust to adversarial attacks is still an open problem. In this article, we analyse the geometry of adversarial attacks in the over-parameterized limit for Bayesian neural networks (BNNs). We show that, in this limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lie on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that, in this limit, BNN posteriors are robust to gradient-based adversarial attacks. Crucially, by relying on the convergence of infinitely wide BNNs to Gaussian processes (GPs), we prove that, under certain relatively mild assumptions, the expected gradient of the loss with respect to the BNN posterior distribution vanishes, even when each neural network sampled from the posterior does not have vanishing gradients. Experimental results on MNIST, Fashion MNIST, and a synthetic dataset, with BNNs trained with Hamiltonian Monte Carlo and variational inference, support this line of argument, empirically showing that BNNs can display both high accuracy on clean data and robustness to gradient-based and gradient-free adversarial attacks.
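As an illustration of the central quantity discussed in the abstract, the sketch below (not taken from the paper) estimates the expected loss gradient with respect to the input by averaging over posterior weight samples; the names `model`, `posterior_samples`, `x`, and `y` are hypothetical placeholders, and the posterior draws are assumed to come from HMC or variational inference.

```python
# Minimal sketch, assuming a PyTorch classifier and a list of posterior
# weight samples (state_dicts). It Monte-Carlo-estimates the expected
# input-gradient over the BNN posterior -- the quantity the abstract
# states vanishes in the over-parameterized limit.
import torch
import torch.nn.functional as F

def expected_input_gradient(model, posterior_samples, x, y):
    """Average of d(loss)/d(input) over posterior weight samples w_i ~ p(w | D)."""
    grad_sum = torch.zeros_like(x)
    for weights in posterior_samples:
        model.load_state_dict(weights)                  # plug in one posterior draw
        x_in = x.clone().detach().requires_grad_(True)  # fresh leaf for input gradients
        loss = F.cross_entropy(model(x_in), y)
        loss.backward()
        grad_sum += x_in.grad.detach()
    return grad_sum / len(posterior_samples)

# A gradient-based attack on the posterior predictive (e.g., an FGSM-style step)
# would perturb x along sign(expected_input_gradient(...)); the paper's result
# implies this averaged direction becomes uninformative as network width grows.
```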
Pages: 1-14
Number of pages: 14