PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection

Cited by: 3

Authors:

Guo, Hanqing [1]

Wang, Guangjing [1]

Wang, Yuanda [1]

Chen, Bocheng [1]

Yan, Qiben [1]

Xiao, Li [1]

Affiliations:

[1] Michigan State Univ, E Lansing, MI 48824 USA

Source:

PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON RESEARCH IN ATTACKS, INTRUSIONS AND DEFENSES, RAID 2023 | 2023

Funding:

National Science Foundation (USA);

Keywords:

Adversarial attack; voice assistant; black-box attack; query efficiency; SPEECH RECOGNITION;

DOI:

10.1145/3607199.3607240

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this paper, we propose PhantomSound, a query-efficient black-box attack toward voice assistants. Existing black-box adversarial attacks on voice assistants either apply substitution models or leverage the intermediate model output to estimate the gradients for crafting adversarial audio samples. However, these attack approaches require a significant number of queries with a lengthy training stage. PhantomSound leverages the decision-based attack to produce effective adversarial audio samples, and reduces the number of queries by optimizing the gradient estimation. In the experiments, we perform our attack against 4 different speech-to-text APIs under 3 real-world scenarios to demonstrate the real-time attack impact. The results show that PhantomSound is practical and robust in attacking 5 popular commercial voice controllable devices over the air, and is able to bypass 3 liveness detection mechanisms with a >95% success rate. The benchmark result shows that PhantomSound can generate adversarial examples and launch the attack in a few minutes. We significantly enhance the query efficiency and reduce the cost of a successful untargeted and targeted adversarial attack by 93.1% and 65.5% compared with the state-of-the-art black-box attacks, using merely ~300 queries (~5 minutes) and ~1,500 queries (~25 minutes), respectively.


Pages: 366-380

Page count: 15

