Backdoor Attacks against Voice Recognition Systems: A Survey

Cited by: 0

Authors:

Yan, Baochen [1]

Lan, Jiahe [1]

Yan, Zheng [1]

Affiliations:

[1] Xidian Univ, Sch Cyber Engn, Xian, Peoples R China

Source:

ACM COMPUTING SURVEYS | 2025 / Vol. 57 / Issue 03

Funding:

National Natural Science Foundation of China;

Keywords:

Backdoor attacks; voice recognition systems; deep learning; speech recognition; speaker recognition; AUTHENTICATION; TEXTURE; COLOR;

DOI:

10.1145/3701985

Chinese Library Classification:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Voice Recognition Systems (VRSs) employ deep learning for speech recognition and speaker recognition. They have been widely deployed in real-world applications, from intelligent voice assistance to telephony surveillance and biometric authentication. However, prior research has revealed that VRSs are vulnerable to backdoor attacks, which pose a significant threat to their security and privacy. Unfortunately, the existing literature lacks a thorough review of this topic. This paper fills that gap with a comprehensive survey of backdoor attacks against VRSs. We first present an overview of VRSs and backdoor attacks, covering the basic knowledge of each. We then propose a set of evaluation criteria for assessing the performance of backdoor attack methods. Next, we present a taxonomy of backdoor attacks against VRSs from different perspectives and analyze the characteristics of each category. After that, we review existing attack methods and analyze their pros and cons against the proposed criteria. Furthermore, we review classic backdoor defense methods and generic audio defense techniques, and discuss the feasibility of deploying them on VRSs. Finally, we identify several open issues and suggest future research directions to motivate research on VRS security.

Pages: 35

References: 117 in total

