Stateful Defenses for Machine Learning Models Are Not Yet Secure Against Black-box Attacks

Cited by: 3
Authors
Feng, Ryan [1 ]
Hooda, Ashish [2 ]
Mangaokar, Neal [1 ]
Fawaz, Kassem [2 ]
Jha, Somesh [2 ]
Prakash, Atul [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Wisconsin Madison, Madison, WI USA
Source
PROCEEDINGS OF THE 2023 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, CCS 2023 | 2023
Funding
National Science Foundation
Keywords
Machine Learning; Adversarial Examples; Security; Black-box Attacks; Stateful Defenses;
DOI
10.1145/3576915.3623116
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent work has proposed stateful defense models (SDMs) as a compelling strategy for defending against a black-box attacker who has only query access to the model, as is common for online machine learning platforms. Such defenses track the query history and detect and reject queries that are "similar", aiming to prevent black-box attacks from extracting useful gradients and making progress toward adversarial examples within a reasonable query budget. Recent SDMs (e.g., Blacklight and PIHA) have shown remarkable success in defending against state-of-the-art black-box attacks. In this paper, we show that SDMs are highly vulnerable to a new class of adaptive black-box attacks. We propose a novel adaptive black-box attack strategy called Oracle-guided Adaptive Rejection Sampling (OARS) that involves two stages: (1) use initial query patterns to infer key properties of an SDM's defense; and (2) leverage those extracted properties to design subsequent query patterns that evade the SDM's defense while making progress toward finding adversarial inputs. OARS is broadly applicable as an enhancement to existing black-box attacks: we show how to apply the strategy to six common black-box attacks to make them more effective against the current class of SDMs. For example, OARS-enhanced versions of these attacks improved attack success rates against recent stateful defenses from nearly 0% to nearly 100% on multiple datasets within reasonable query budgets.
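To make the two-stage strategy concrete, the following is a minimal, self-contained Python sketch. The ToySDM class, its L2 similarity threshold, the binary-search probe, and the NES-style antithetic gradient estimator are all illustrative assumptions chosen for exposition; they are not the paper's actual oracles, distance metrics, or attack integrations.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # toy input dimensionality

class ToySDM:
    """Hypothetical stateful defense: reject any query whose L2 distance to a
    previously seen query falls below a secret similarity threshold."""
    def __init__(self, w, threshold=0.8):
        self.w, self.threshold, self.history = w, threshold, []

    def query(self, x):
        rejected = any(np.linalg.norm(x - h) < self.threshold for h in self.history)
        self.history.append(x.copy())
        return float(self.w @ x), rejected  # (attacker-observed loss, flagged?)

def unit(d):
    """Random direction on the unit sphere."""
    u = rng.standard_normal(d)
    return u / np.linalg.norm(u)

def probe_threshold(sdm, x, lo=0.0, hi=2.0, steps=12):
    """Stage 1: binary-search the perturbation size at which re-queries of x
    stop being flagged as 'similar'; the rejections themselves are the oracle."""
    sdm.query(x)  # seed the defense's history with the base point
    for _ in range(steps):
        mid = (lo + hi) / 2
        _, rejected = sdm.query(x + mid * unit(DIM))
        lo, hi = (mid, hi) if rejected else (lo, mid)
    return 1.1 * hi  # pad the estimate to stay on the accepted side

def estimate_gradient(sdm, x, sigma, n=50):
    """Stage 2: NES-style antithetic gradient estimate with samples spread
    sigma apart; queries the SDM still rejects are discarded and redrawn."""
    grad, accepted = np.zeros(DIM), 0
    while accepted < n:
        u = unit(DIM)
        l_plus, r1 = sdm.query(x + sigma * u)
        l_minus, r2 = sdm.query(x - sigma * u)
        if r1 or r2:
            continue  # adaptive rejection sampling: redraw a fresh direction
        grad += (l_plus - l_minus) / (2 * sigma) * u
        accepted += 1
    return grad / accepted

sdm = ToySDM(w=rng.standard_normal(DIM))
x0 = rng.standard_normal(DIM)
sigma = probe_threshold(sdm, x0)       # stage 1: infer a defense property
g = estimate_gradient(sdm, x0, sigma)  # stage 2: evade while attacking
print("cosine(estimated grad, true grad) =",
      float(g @ sdm.w / (np.linalg.norm(g) * np.linalg.norm(sdm.w))))
```

The point the sketch captures is that the defense's accept/reject decisions leak its similarity threshold, which the attacker then uses to space subsequent queries just far enough apart to appear dissimilar while still estimating useful gradients.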
Pages: 786-800
Page count: 15
References
45 in total
[11] Choi, Seok-Hwan; Shin, Jinmyeong; Choi, Yoon-Ho. PIHA: Detection method using perceptual image hashing against query-based adversarial attacks. Future Generation Computer Systems, 2023, 145: 563-577.
[12] Clarifai. The world's AI: Clarifai Computer Vision AI and Machine Learning Platform.
[13] Croce, F. Proceedings of Machine Learning Research, 2020, Vol. 119.
[14] Dalins, Janis. arXiv, 2019.
[15] Douceur, J. R. The Sybil attack. Peer-to-Peer Systems, 2002, 2429: 251-260.
[16] Du, Ling; Ho, Anthony T. S.; Cong, Runmin. Perceptual hashing for image authentication: A survey. Signal Processing: Image Communication, 2020, 81.
[17] Esmaeili, Bardia; Azmoodeh, Amin; Dehghantanha, Ali; Karimipour, Hadis; Zolfaghari, Behrouz; Hammoudeh, Mohammad. IIoT Deep Malware Threat Hunting: From Adversarial Example Detection to Adversarial Scenario Detection. IEEE Transactions on Industrial Informatics, 2022, 18(12): 8477-8486.
[18] Eykholt, Kevin; Evtimov, Ivan; Fernandes, Earlence; Li, Bo; Rahmati, Amir; Xiao, Chaowei; Prakash, Atul; Kohno, Tadayoshi; Song, Dawn. Robust Physical-World Attacks on Deep Learning Visual Classification. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 1625-1634.
[19] Feng, Ryan; Mangaokar, Neal; Chen, Jiefeng; Fernandes, Earlence; Jha, Somesh; Prakash, Atul. GRAPHITE: Generating Automatic Physical Examples for Machine-Learning Attacks on Computer Vision Systems. 2022 IEEE 7th European Symposium on Security and Privacy (EuroS&P 2022), 2022: 664-683.
[20] Geiger, A. Proceedings of CVPR, 2012: 3354. DOI: 10.1109/CVPR.2012.6248074.