SoK: Pitfalls in Evaluating Black-Box Attacks

Cited by: 1

Authors:

Suya, Fnu [1]

Suri, Anshuman [2]

Zhang, Tingwei [3]

Hong, Jingtao [4]

Tian, Yuan [5]

Evans, David [2]

Affiliations:

[1] Univ Maryland Coll Pk, College Pk, MD 20742 USA

[2] Univ Virginia, Charlottesville, VA USA

[3] Cornell Univ, Ithaca, NY USA

[4] Columbia Univ, New York, NY USA

[5] Univ Calif Los Angeles, Los Angeles, CA USA

Source:

IEEE CONFERENCE ON SECURE AND TRUSTWORTHY MACHINE LEARNING, SATML 2024 | 2024

Funding:

US National Science Foundation;

Keywords:

ADVERSARIAL EXAMPLES; ROBUSTNESS;

DOI:

10.1109/SaTML59370.2024.00026

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Numerous works study black-box attacks on image classifiers, where adversaries generate adversarial examples against unknown target models without having access to their internal information. However, these works make different assumptions about the adversary's knowledge, and current literature lacks cohesive organization centered around the threat model. To systematize knowledge in this area, we propose a taxonomy over the threat space spanning the axes of feedback granularity, the access of interactive queries, and the quality and quantity of the auxiliary data available to the attacker. Our new taxonomy provides three key insights. 1) Despite extensive literature, numerous under-explored threat spaces exist, which cannot be trivially solved by adapting techniques from well-explored settings. We demonstrate this by establishing a new state-of-the-art in the less-studied setting of access to top-k confidence scores, adapting techniques from the well-explored setting of access to the complete confidence vector, but show how it still falls short of the more restrictive setting that only obtains the prediction label, highlighting the need for more research. 2) Identifying the threat models for different attacks uncovers stronger baselines that challenge prior state-of-the-art claims. We demonstrate this by enhancing an initially weaker baseline (under interactive query access) via surrogate models, effectively overturning claims in the respective paper. 3) Our taxonomy reveals interactions between attacker knowledge that connect well to related areas, such as model inversion and extraction attacks. We discuss how advances in other areas can enable stronger black-box attacks. Finally, we emphasize the need for a more realistic assessment of attack success by factoring in local attack runtime. This approach reveals the potential for certain attacks to achieve notably higher success rates. We also highlight the need to evaluate attacks in diverse and harder settings and underscore the need for better selection criteria when picking the best candidate adversarial examples.
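The three taxonomy axes named in the abstract (feedback granularity, interactive query access, auxiliary data) can be pictured as a small record type. The sketch below is only an illustration under stated assumptions, not the authors' code: the names `Feedback`, `ThreatModel`, and `observe` are hypothetical, and the auxiliary-data axis is reduced to a free-text field. It shows how the same model output would look to an attacker under full-vector, top-k, and label-only feedback.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Feedback(Enum):
    # Feedback granularities mentioned in the abstract.
    FULL_SCORES = "complete confidence vector"
    TOP_K_SCORES = "top-k confidence scores"
    LABEL_ONLY = "prediction label only"


@dataclass(frozen=True)
class ThreatModel:
    # Hypothetical record: one field per taxonomy axis.
    feedback: Feedback
    interactive_queries: bool
    auxiliary_data: str  # assumption: free-text stand-in for quality/quantity
    k: int = 1           # only used for TOP_K_SCORES


def observe(scores: List[float],
            tm: ThreatModel) -> Optional[List[Tuple[int, Optional[float]]]]:
    """Return what the attacker sees for one query, as (class, score) pairs."""
    if not tm.interactive_queries:
        return None  # no query access: nothing is observed
    # Class indices sorted from most to least confident.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if tm.feedback is Feedback.FULL_SCORES:
        return [(i, scores[i]) for i in ranked]
    if tm.feedback is Feedback.TOP_K_SCORES:
        return [(i, scores[i]) for i in ranked[:tm.k]]
    return [(ranked[0], None)]  # label only: the score is hidden
```

For example, with scores `[0.1, 0.6, 0.3]`, a top-2 attacker would observe classes 1 and 2 with their scores, while a label-only attacker would observe class 1 and no score.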


Pages: 387-407

Page count: 21

References: 205 in total

