Orthogonal Deep Models as Defense Against Black-Box Attacks

Cited by: 5
Authors
Jalwana, Mohammad A. A. K. [1]
Akhtar, Naveed [1]
Bennamoun, Mohammed [1]
Mian, Ajmal [1]
Affiliations
[1] Univ Western Australia, Dept Comp Sci & Software Engn, Perth, WA 6009, Australia
Funding
Australian Research Council
Keywords
Deep learning; adversarial examples; adversarial perturbations; orthogonal models; robust deep learning
DOI
10.1109/ACCESS.2020.3005961
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep learning has demonstrated state-of-the-art performance for a variety of challenging computer vision tasks. On one hand, this has enabled deep visual models to pave the way for a plethora of critical applications like disease prognostics and smart surveillance. On the other, deep learning has also been found vulnerable to adversarial attacks, which calls for new techniques to defend deep models against these attacks. Among the attack algorithms, the black-box schemes are of serious practical concern since they only need publicly available knowledge of the targeted model. We carefully analyze the inherent weakness of deep models in black-box settings where the attacker may develop the attack using a model similar to the targeted model. Based on our analysis, we introduce a novel gradient regularization scheme that encourages the internal representation of a deep model to be orthogonal to that of another, even if the architectures of the two models are similar. Our unique constraint allows a model to strive for higher accuracy while maintaining near-orthogonal alignment of its gradients with respect to a reference model. A detailed empirical study verifies that controlled misalignment of gradients under our orthogonality objective significantly boosts a model's robustness against transferable black-box adversarial attacks. In comparison to regular models, the orthogonal models are significantly more robust to a range of ℓp-norm-bounded perturbations. We verify the effectiveness of our technique on a variety of large-scale models.
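
The abstract does not state the regularizer's exact form, so the following is only a minimal sketch of one plausible reading: a task loss plus a penalty on the squared cosine similarity between a model's gradient and a reference model's gradient. The function names, the squared-cosine form, and the `weight` parameter are all assumptions made for illustration, not the authors' formulation.

```python
import math


def cosine_similarity(g_a, g_b, eps=1e-12):
    """Cosine of the angle between two gradient vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(g_a, g_b))
    norm_a = math.sqrt(sum(a * a for a in g_a))
    norm_b = math.sqrt(sum(b * b for b in g_b))
    # eps guards against division by zero when a gradient vanishes.
    return dot / (norm_a * norm_b + eps)


def orthogonality_penalty(g_model, g_reference):
    """Assumed penalty: squared cosine similarity.

    It is 0 when the gradients are orthogonal and approaches 1 when they
    are parallel or anti-parallel.
    """
    c = cosine_similarity(g_model, g_reference)
    return c * c


def regularized_loss(task_loss, g_model, g_reference, weight=1.0):
    """Task loss plus the weighted orthogonality penalty.

    `weight` is a hypothetical trade-off hyperparameter.
    """
    return task_loss + weight * orthogonality_penalty(g_model, g_reference)
```

In a real training loop the gradients would come from automatic differentiation of each model's loss with respect to the input, and the penalty itself would need to be differentiable with respect to the trained model's parameters. Plain lists are used here only to keep the sketch self-contained.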
Pages: 119744-119757
Page count: 14