Toward Stealthy Backdoor Attacks Against Speech Recognition via Elements of Sound

Cited by: 7
Authors
Cai, Hanbo [1 ]
Zhang, Pengcheng [1 ]
Dong, Hai [2 ]
Xiao, Yan [3 ]
Koffas, Stefanos [4 ]
Li, Yiming [5 ,6 ]
Affiliations
[1] Hohai Univ, Coll Comp Sci & Software Engn, Nanjing 211100, Peoples R China
[2] RMIT Univ, Sch Comp Technol, Melbourne, Vic 3000, Australia
[3] Sun Yat Sen Univ, Sch Cyber Sci & Technol, Shenzhen Campus, Shenzhen 518107, Peoples R China
[4] Delft Univ Technol, Cybersecur Grp, NL-2628 CD Delft, Netherlands
[5] Nanyang Technol Univ, Coll Comp & Data Sci, Singapore 639798, Singapore
[6] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou 311200, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech recognition; Training; Timbre; Testing; Implants; Hidden Markov models; Spectrogram; Backdoor attack; backdoor learning; speech recognition; AI security; trustworthy ML
DOI
10.1109/TIFS.2024.3404885
Chinese Library Classification
TP301 [Theory, Methods]
Subject Classification Code
081202
Abstract
Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various speech recognition applications. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against speech recognition. We reveal that existing methods are not stealthy, since their trigger patterns are perceptible to humans or to machine detection. This limitation stems mostly from their trigger patterns being simple noises or separable, distinctive clips. Motivated by these findings, we propose to exploit elements of sound (e.g., pitch and timbre) to design more stealthy yet effective poison-only backdoor attacks. Specifically, for the stealthy pitch-based trigger, we insert a short-duration high-pitched signal as the trigger and increase the pitch of the remaining audio to 'mask' it. For the stealthy timbre-based attack, we manipulate the timbre features of the victim audio and design a voiceprint selection module to facilitate the multi-backdoor attack. Our attacks generate more 'natural' poisoned samples and are therefore more stealthy. Extensive experiments on benchmark datasets verify the effectiveness of our attacks under different settings (e.g., all-to-one, all-to-all, clean-label, physical, and multi-backdoor settings) and their stealthiness. Our methods achieve attack success rates of over 95% in most cases and are nearly undetectable. The code for reproducing the main experiments is available at https://github.com/HanboCai/BadSpeech_SoE.
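To make the pitch-based trigger described above concrete, below is a minimal illustrative sketch; it is not the authors' implementation (that is in the repository linked above). It assumes the librosa and soundfile Python packages, and the function name make_pitch_trigger_sample as well as the tone frequency, tone duration, and pitch-shift amount are hypothetical placeholders chosen only to show the idea of inserting a short high-pitched tone and raising the pitch of the rest of the clip to 'mask' it.

import numpy as np
import librosa
import soundfile as sf

def make_pitch_trigger_sample(wav_path, out_path, sr=16000,
                              tone_hz=7000, tone_dur=0.1, shift_steps=2):
    # Load the clean utterance at a fixed sampling rate.
    y, _ = librosa.load(wav_path, sr=sr)

    # Raise the pitch of the whole clip so the inserted tone stands out less.
    y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=shift_steps)

    # Short high-frequency sine burst used as the backdoor trigger.
    t = np.arange(int(tone_dur * sr)) / sr
    tone = 0.05 * np.sin(2.0 * np.pi * tone_hz * t)

    # Overlay the trigger at the start of the pitch-shifted clip.
    y_poisoned = y_shifted.copy()
    n = min(tone.size, y_poisoned.size)
    y_poisoned[:n] += tone[:n]
    y_poisoned = np.clip(y_poisoned, -1.0, 1.0)

    sf.write(out_path, y_poisoned, sr)

In a poison-only setting, such a transformation would be applied to a small fraction of the training clips, which are then relabeled with the attacker-chosen target class before the victim trains on the mixed dataset; the parameter values shown here are illustrative, not those used in the paper.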
Pages: 5852-5866
Page count: 15