PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Cited by: 3
Authors
Guo, Hanqing [1]
Wang, Guangjing [1]
Wang, Yuanda [1]
Chen, Bocheng [1]
Yan, Qiben [1]
Xiao, Li [1]
Affiliation
[1] Michigan State Univ, E Lansing, MI 48824 USA
Source
Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2023) | 2023
Funding
U.S. National Science Foundation
Keywords
Adversarial attack; voice assistant; black-box attack; query efficiency; speech recognition
DOI
10.1145/3607199.3607240
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose PhantomSound, a query-efficient black-box attack on voice assistants. Existing black-box adversarial attacks on voice assistants either use substitution models or leverage intermediate model outputs to estimate gradients for crafting adversarial audio samples. However, these approaches require a large number of queries and a lengthy training stage. PhantomSound leverages a decision-based attack to produce effective adversarial audio, and reduces the number of queries by optimizing the gradient estimation. In our experiments, we attack 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air, and can bypass 3 liveness detection mechanisms with a success rate above 95%. The benchmark results show that PhantomSound can generate adversarial examples and launch the attack within a few minutes. Compared with state-of-the-art black-box attacks, we significantly improve query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5%, using only ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
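The abstract describes a decision-based (hard-label) attack that saves queries by improving gradient estimation, but it does not spell out PhantomSound's own estimator. As an illustrative stand-in only, the sketch below shows the Monte Carlo boundary-gradient estimate popularized by HopSkipJumpAttack (Chen, Jordan, and Wainwright, IEEE S&P 2020): each query reveals only whether a perturbed input is still adversarial, and the signed random directions are averaged into an estimate of the boundary normal. The function `estimate_gradient_direction`, its parameters `num_queries` and `delta`, and the toy linear oracle are hypothetical; in a real audio attack the vector would hold waveform samples and each oracle call would be one request to a speech-to-text API.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def estimate_gradient_direction(
    is_adversarial: Callable[[Vector], bool],
    x_boundary: Vector,
    num_queries: int = 100,
    delta: float = 0.01,
    seed: int = 0,
) -> Vector:
    """Estimate the boundary normal at x_boundary using hard-label queries only.

    Each call to is_adversarial counts as one query to the target model.
    Hypothetical helper for illustration; not PhantomSound's estimator.
    """
    rng = random.Random(seed)
    dim = len(x_boundary)
    directions: List[Vector] = []
    signs: List[float] = []
    for _ in range(num_queries):
        # Sample a random direction uniformly on the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        # One query: is the slightly perturbed input still adversarial?
        probe = [x + delta * v for x, v in zip(x_boundary, u)]
        signs.append(1.0 if is_adversarial(probe) else -1.0)
        directions.append(u)

    # Subtract the mean sign as a baseline to reduce variance, unless all
    # answers agree (then the baseline would cancel every term).
    mean_sign = sum(signs) / num_queries
    baseline = 0.0 if abs(mean_sign) == 1.0 else mean_sign

    grad = [0.0] * dim
    for s, u in zip(signs, directions):
        for i in range(dim):
            grad[i] += (s - baseline) * u[i]
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return [g / grad_norm for g in grad] if grad_norm > 0 else grad


if __name__ == "__main__":
    # Toy stand-in for a speech-to-text API: "adversarial" iff w . x > 0.
    w = [1.0, 2.0, -1.0]
    oracle = lambda x: sum(a * b for a, b in zip(w, x)) > 0
    g = estimate_gradient_direction(oracle, [0.0, 0.0, 0.0], num_queries=1000)
    w_norm = math.sqrt(sum(a * a for a in w))
    cosine = sum(a * b for a, b in zip(g, w)) / w_norm
    print(f"cosine similarity to true normal: {cosine:.3f}")
```

Because every oracle call is one query, `num_queries` is exactly the per-step query cost; a query-efficient attack of the kind the abstract describes would aim to get a usable direction from far fewer calls than this naive estimator.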
Pages: 366-380
Page count: 15