SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems

Times cited: 68
Authors
Abdullah, Hadi [1]
Warren, Kevin [1]
Bindschaedler, Vincent [1]
Papernot, Nicolas [2]
Traynor, Patrick [1]
Affiliations
[1] Univ Florida, Gainesville, FL 32611 USA
[2] Univ Toronto, Toronto, ON, Canada
Source
2021 IEEE Symposium on Security and Privacy (SP), 2021
Keywords
Attention
DOI
10.1109/SP40001.2021.00014
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech and speaker recognition systems are employed in a variety of applications, from personal assistants to telephony surveillance and biometric authentication. The wide deployment of these systems has been made possible by the improved accuracy of neural networks. Recent research has demonstrated that, like other systems based on neural networks, speech and speaker recognition systems are vulnerable to attacks using manipulated inputs. However, as we demonstrate in this paper, the end-to-end architecture of speech and speaker systems and the nature of their inputs make attacks and defenses against them substantially different from those in the image space. We demonstrate this first by systematizing existing research in this space and providing a taxonomy through which the community can evaluate future work. We then demonstrate experimentally that attacks against these models almost universally fail to transfer. In so doing, we argue that substantial additional work is required to provide adequate mitigations in this space.
Pages: 730-747
Page count: 18
References: 110