PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Cited by: 3
Authors
Guo, Hanqing [1]
Wang, Guangjing [1]
Wang, Yuanda [1]
Chen, Bocheng [1]
Yan, Qiben [1]
Xiao, Li [1]
Affiliations
[1] Michigan State Univ, E Lansing, MI 48824 USA
Source
Proceedings of the 26th International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2023 | 2023
Funding
U.S. National Science Foundation;
Keywords
Adversarial attack; voice assistant; black-box attack; query efficiency; speech recognition
DOI
10.1145/3607199.3607240
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a large number of queries and a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audio, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice-controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with a success rate above 95%. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.
Pages: 366-380
Page count: 15