SoK: Pitfalls in Evaluating Black-Box Attacks

Cited by: 1
Authors
Suya, Fnu [1 ]
Suri, Anshuman [2 ]
Zhang, Tingwei [3 ]
Hong, Jingtao [4 ]
Tian, Yuan [5 ]
Evans, David [2 ]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] University of Virginia, Charlottesville, VA, USA
[3] Cornell University, Ithaca, NY, USA
[4] Columbia University, New York, NY, USA
[5] University of California, Los Angeles, Los Angeles, CA, USA
Source
IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2024), 2024
Funding
U.S. National Science Foundation
Keywords
Adversarial examples; Robustness
DOI
10.1109/SaTML59370.2024.00026
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Numerous works study black-box attacks on image classifiers, where adversaries generate adversarial examples against unknown target models without access to their internal information. However, these works make different assumptions about the adversary's knowledge, and the current literature lacks a cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning the axes of feedback granularity, access to interactive queries, and the quality and quantity of the auxiliary data available to the attacker. Our taxonomy yields three key insights. 1) Despite extensive literature, numerous under-explored threat spaces exist, and they cannot be trivially addressed by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting of top-k confidence scores through adapting techniques from the well-explored setting of full confidence-vector access, yet we show that it still falls short of attacks in the more restrictive setting where only the predicted label is available, highlighting the need for more research. 2) Identifying the threat model of each attack uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) with surrogate models, effectively overturning claims in the corresponding paper. 3) Our taxonomy reveals interactions between dimensions of attacker knowledge that connect naturally to related areas such as model inversion and model extraction attacks, and we discuss how advances in those areas can enable stronger black-box attacks. Finally, we argue for a more realistic assessment of attack success that factors in local attack runtime; this accounting reveals that certain attacks can achieve notably higher success rates. We also highlight the need to evaluate attacks in diverse and harder settings, and underscore the need for better selection criteria when picking the best candidate adversarial examples.
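The feedback-granularity axis of the taxonomy is easiest to see in code. Below is a minimal sketch, assuming the target model returns a probability vector as a NumPy array; the function names (full_score_feedback, topk_feedback, label_only_feedback) and the k parameter are illustrative assumptions, not interfaces from the paper or its artifact. An attacker in a given setting only ever observes the output of the corresponding function.

```python
# Minimal sketch of three feedback granularities from the taxonomy.
# Assumption: the target model's output is a NumPy probability vector.
import numpy as np


def full_score_feedback(probs: np.ndarray) -> np.ndarray:
    """Complete confidence vector: the attacker sees every class score."""
    return probs


def topk_feedback(probs: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Top-k scores: only the k highest-scoring classes and their scores."""
    top = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in top]


def label_only_feedback(probs: np.ndarray) -> int:
    """Hard label: only the predicted class index is revealed."""
    return int(np.argmax(probs))


probs = np.array([0.02, 0.55, 0.05, 0.30, 0.08])
print(full_score_feedback(probs))   # full vector
print(topk_feedback(probs, k=2))    # [(1, 0.55), (3, 0.30)]
print(label_only_feedback(probs))   # 1
```

Each function strictly coarsens the previous one's output, which is why techniques from the full-score setting do not transfer trivially to the top-k or label-only settings discussed in the abstract.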
Pages: 387-407 (21 pages)