Privacy Preserving Defense For Black Box Classifiers Against On-Line Adversarial Attacks

Cited by: 3
Authors
Theagarajan, Rajkumar [1 ]
Bhanu, Bir [1 ]
Affiliations
[1] Univ Calif Riverside, Ctr Res Intelligent Syst, Riverside, CA 92521 USA
Funding
US National Science Foundation;
Keywords
Training; Perturbation methods; Bayes methods; Uncertainty; Deep learning; Privacy; Data models; Adversarial defense; Bayesian uncertainties; black box defense; ensemble of defenses; image purifiers; knowledge distillation; privacy preserving defense;
DOI
10.1109/TPAMI.2021.3125931
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep learning models have been shown to be vulnerable to adversarial attacks: imperceptible perturbations added to an image that cause the model to misclassify it with high confidence. Existing adversarial defenses validate their performance using only classification accuracy. However, classification accuracy by itself is not a reliable metric for determining whether the resulting image is "adversarial-free". This is a foundational problem for online image recognition applications, where the ground truth of the incoming image is unknown and hence we can neither compute the classifier's accuracy nor validate whether the image is "adversarial-free". This paper proposes a novel privacy-preserving framework for defending black-box classifiers against adversarial attacks using an ensemble of iterative adversarial image purifiers whose performance is continuously validated in a loop using Bayesian uncertainties. The proposed approach can convert a single-step black-box adversarial defense into an iterative defense, and the paper further proposes three novel privacy-preserving Knowledge Distillation (KD) approaches that use prior meta-information from various datasets to mimic the performance of the black-box classifier. Additionally, this paper proves the existence of an optimal distribution for the purified images that reaches a theoretical lower bound, beyond which an image can no longer be purified. Experimental results on six public benchmark datasets, namely 1) Fashion-MNIST, 2) CIFAR-10, 3) GTSRB, 4) MIO-TCD, 5) Tiny-ImageNet, and 6) MS-Celeb, show that the proposed approach consistently detects adversarial examples and purifies or rejects them under a variety of adversarial attacks.
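The abstract's detect-purify-reject loop can be made concrete with a short sketch. The following is a minimal illustration assuming PyTorch, with Monte Carlo dropout standing in for the paper's Bayesian uncertainty estimates; the purifier and classifier modules, the entropy threshold, and the iteration cap are all hypothetical placeholders, not the paper's actual method.

```python
# Minimal sketch of the iterative purify-and-validate loop described in the
# abstract. All module names, thresholds, and the use of Monte Carlo dropout
# for Bayesian uncertainty are illustrative assumptions.
import torch
import torch.nn as nn


def mc_dropout_uncertainty(classifier, x, n_samples=20):
    """Predictive entropy from Monte Carlo dropout, one common way to
    approximate Bayesian uncertainty for a classifier."""
    classifier.train()  # keep dropout layers active at inference time
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(classifier(x), dim=1) for _ in range(n_samples)]
        )
    mean_probs = probs.mean(dim=0)
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
    return mean_probs, entropy


def purify_or_reject(x, purifiers, classifier, accept_thresh=0.5, max_iters=10):
    """Repeatedly apply the ensemble of purifiers, validating each pass with
    the uncertainty estimate; accept once uncertainty falls below the
    threshold, otherwise reject after max_iters passes."""
    for _ in range(max_iters):
        _, entropy = mc_dropout_uncertainty(classifier, x)
        if entropy.item() < accept_thresh:
            return x, "accepted"
        with torch.no_grad():
            for purifier in purifiers:  # apply the ensemble in sequence
                x = purifier(x)
    return x, "rejected"


# Toy usage with stand-in modules (real purifiers would be trained networks).
classifier = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(64, 10),
)
purifiers = [nn.Identity()]
image = torch.rand(1, 3, 32, 32)
purified, verdict = purify_or_reject(image, purifiers, classifier)
print(verdict)
```

In the paper the validating model is a knowledge-distilled surrogate of the black-box classifier; here a single dropout classifier stands in for both roles.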
Pages: 9503-9520
Number of pages: 18