PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Cited by: 3

Authors:

Guo, Hanqing [1]

Wang, Guangjing [1]

Wang, Yuanda [1]

Chen, Bocheng [1]

Yan, Qiben [1]

Xiao, Li [1]

Affiliations:

[1] Michigan State Univ, E Lansing, MI 48824 USA

Source:

PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON RESEARCH IN ATTACKS, INTRUSIONS AND DEFENSES, RAID 2023 | 2023

Funding:

U.S. National Science Foundation;

Keywords:

Adversarial attack; voice assistant; black-box attack; query efficiency; speech recognition;

DOI:

10.1145/3607199.3607240

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant number of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audios, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with > 95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
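To make the idea of a decision-based attack concrete, the sketch below shows how a gradient direction can be estimated from hard-label (accept/reject) queries alone. It is not PhantomSound's algorithm, which the abstract does not specify; it is a generic Monte Carlo estimate in the general style of decision-based attacks such as HopSkipJumpAttack. The function name `estimate_boundary_normal`, the linear `toy_oracle` standing in for a speech-to-text API, and all parameter values are assumptions made for illustration.

```python
import math
import random


def estimate_boundary_normal(is_adversarial, x, num_queries=200, delta=0.01, seed=0):
    """Estimate the decision-boundary normal at x using only yes/no queries.

    Each query probes x + delta * u for a random unit direction u and records
    +1 if the oracle says "adversarial", -1 otherwise. Averaging the signed
    directions approximates the gradient direction (hypothetical helper).
    """
    rng = random.Random(seed)
    dim = len(x)
    signs, directions = [], []
    for _ in range(num_queries):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        probe = [xi + delta * ui for xi, ui in zip(x, u)]
        signs.append(1.0 if is_adversarial(probe) else -1.0)
        directions.append(u)

    # Subtract the mean sign to reduce variance when the labels are unbalanced.
    baseline = sum(signs) / num_queries
    estimate = [0.0] * dim
    for s, u in zip(signs, directions):
        for j in range(dim):
            estimate[j] += (s - baseline) * u[j]

    norm = math.sqrt(sum(v * v for v in estimate))
    if norm == 0.0:  # every query returned the same label; no direction information
        return estimate
    return [v / norm for v in estimate]


# Toy stand-in for a black-box model (assumption): the input counts as
# "adversarial" when it lies on the positive side of a hidden hyperplane.
W = [3.0, -1.0, 2.0, 0.5]


def toy_oracle(x):
    return sum(w * v for w, v in zip(W, x)) > 0.0


x_boundary = [0.0, 0.0, 0.0, 0.0]  # a point lying on the toy decision boundary
estimate = estimate_boundary_normal(toy_oracle, x_boundary)
w_norm = math.sqrt(sum(w * w for w in W))
cosine = sum(e * w for e, w in zip(estimate, W)) / w_norm
print(f"cosine similarity to the true normal: {cosine:.3f}")
```

Every probe costs one query to the target model, which is why the number of samples needed per gradient estimate largely determines the total query budget of such an attack.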

Pages: 366-380

Page count: 15

