Backdoor Attacks against Voice Recognition Systems: A Survey

Citations: 0
Authors
Yan, Baochen [1]
Lan, Jiahe [1]
Yan, Zheng [1]
Affiliations
[1] Xidian Univ, Sch Cyber Engn, Xian, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Backdoor attacks; voice recognition systems; deep learning; speech recognition; speaker recognition; AUTHENTICATION; TEXTURE; COLOR;
DOI
10.1145/3701985
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Voice Recognition Systems (VRSs) employ deep learning for speech recognition and speaker recognition. They have been widely deployed in real-world applications, from intelligent voice assistance to telephony surveillance and biometric authentication. However, prior research has revealed that VRSs are vulnerable to backdoor attacks, which pose a significant threat to their security and privacy. The existing literature lacks a thorough review of this topic. This paper fills that gap with a comprehensive survey of backdoor attacks against VRSs. We first present an overview of VRSs and backdoor attacks, covering the basic concepts of each. We then propose a set of evaluation criteria for assessing the performance of backdoor attack methods. Next, we present a taxonomy of backdoor attacks against VRSs from different perspectives and analyze the characteristics of each category. After that, we review existing attack methods and analyze their pros and cons against the proposed criteria. Furthermore, we review classic backdoor defense methods and generic audio defense techniques, and discuss the feasibility of deploying them on VRSs. Finally, we identify several open issues and suggest future research directions to motivate research on VRS security.
Pages: 35
Related papers
117 in total
  • [1] Ahlbom G., 1987, Proceedings: ICASSP 87. 1987 International Conference on Acoustics, Speech, and Signal Processing (Cat. No.87CH2396-0), P13
  • [2] Ahmed S, 2020, PROCEEDINGS OF THE 29TH USENIX SECURITY SYMPOSIUM, P2703
  • [3] Music, Search, and IoT: How People (Really) Use Voice Assistants
    Ammari, Tawfiq
    Kaye, Jofish
    Tsai, Janice Y.
    Bentley, Frank
    [J]. ACM TRANSACTIONS ON COMPUTER-HUMAN INTERACTION, 2019, 26 (03)
  • [4] Adaptive Time-Frequency Analysis for Noise Reduction in an Audio Filter Bank With Low Delay
    Andersen, Kristian Timm
    Moonen, Marc
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (04) : 784 - 795
  • [5] [Anonymous], 2019, BUILDING MACHINE LEA, P59, DOI 10.1007/978-1-4842-4470-8_7
  • [6] Emotion Recognition From Expressions in Face, Voice, and Body: The Multimodal Emotion Recognition Test (MERT)
    Baenziger, Tanja
    Grandjean, Didier
    Scherer, Klaus R.
    [J]. EMOTION, 2009, 9 (05) : 691 - 704
  • [7] Bagdasaryan E, 2021, PROCEEDINGS OF THE 30TH USENIX SECURITY SYMPOSIUM, P1505
  • [8] Boll S. F., 1979, ICASSP 79. 1979 IEEE International Conference on Acoustics, Speech and Signal Processing, P200
  • [9] STRONG DATA AUGMENTATION SANITIZES POISONING AND BACKDOOR ATTACKS WITHOUT AN ACCURACY TRADEOFF
    Borgnia, Eitan
    Cherepanova, Valeriia
    Fowl, Liam
    Ghiasi, Amin
    Geiping, Jonas
    Goldblum, Micah
    Goldstein, Tom
    Gupta, Arjun
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3855 - 3859
  • [10] Brown A, 2022, Arxiv, DOI arXiv:2201.04583