Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition

Cited by: 0
Authors
Xingyu Zhang
Xiongwei Zhang
Meng Sun
Xia Zou
Kejiang Chen
Nenghai Yu
Affiliations
[1] Army Engineering University, Laboratory of Intelligent Information Processing
[2] University of Science and Technology of China, Department of Electronic Engineering and Information Science
Source
Complex & Intelligent Systems, 2023, Vol. 9
Keywords
Automatic speaker recognition; Adversarial examples; Imperceptibility; Black-box attack; Differential evolution; Auditory masking
DOI
Not available
Abstract
Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown that it is vulnerable to adversarial attacks. In this paper, we propose a new type of adversarial example: imperceptible adversarial samples generated for targeted attacks on black-box automatic speaker recognition systems. The samples are created directly at the waveform level by solving an optimization problem whose inputs and outputs are waveforms, which is more realistic in real-life scenarios. Inspired by auditory masking, we propose a regularization term that adapts to the energy of the speech waveform in order to generate imperceptible adversarial perturbations. The optimization problems are then solved by the differential evolution algorithm in a black-box manner, which requires no knowledge of the internal configuration of the recognition systems. Experiments on the commonly used LibriSpeech and VoxCeleb data sets show that the proposed methods successfully perform targeted attacks on state-of-the-art speaker recognition systems while remaining imperceptible to human listeners. The adversarial samples achieve high SNR and PESQ scores, indicating that the proposed methods degrade the quality of the original signals less than several recently proposed methods, which supports the claim that the adversarial samples are imperceptible.
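The abstract combines three ingredients: a waveform-domain perturbation, an energy-adaptive (masking-inspired) regularizer, and differential evolution (DE) driven only by the recognizer's output scores. The sketch below shows how these pieces can fit together. It is a minimal illustration under stated assumptions, not the authors' method, because the abstract does not give the exact objective or DE settings. Everything here is hypothetical: the toy random-projection "speaker model" (`target_score`), the 16-sample frame, the per-frame bound of 0.3 times the frame RMS, the penalty form (perturbation energy divided by speech energy, frame by frame), and the weight `lam`. The DE variant is the classic DE/rand/1/bin.

```python
import numpy as np

FRAME = 16  # samples per analysis frame (toy value, an assumption)


def frame_energy(x: np.ndarray, frame: int = FRAME) -> np.ndarray:
    """Mean energy of each non-overlapping frame; len(x) must be a multiple of frame."""
    return np.mean(x.reshape(-1, frame) ** 2, axis=1)


def energy_adaptive_penalty(delta, x, frame=FRAME, eps=1e-8):
    """Hypothetical masking-inspired regularizer: perturbation energy in each frame
    is divided by the speech energy of that frame, so loud frames are penalized less."""
    return float(np.sum(frame_energy(delta, frame) / (frame_energy(x, frame) + eps)))


def snr_db(x, delta, eps=1e-12):
    """Signal-to-noise ratio (dB) of the clean signal x against the perturbation delta."""
    return 10.0 * np.log10(np.sum(x ** 2) / (np.sum(delta ** 2) + eps))


def differential_evolution(objective, bound, pop_size=30, F=0.5, CR=0.9, iters=200, seed=0):
    """DE/rand/1/bin minimizer over the box [-bound, bound] (bound is per dimension).
    Only objective values are queried, so the scorer is treated as a black box."""
    rng = np.random.default_rng(seed)
    dim = bound.size
    pop = rng.uniform(-bound, bound, size=(pop_size, dim))
    pop[0] = 0.0  # seed with "no perturbation" so the result is never worse than it
    fitness = np.array([objective(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), -bound, bound)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # take at least one coordinate from the mutant
            trial = np.where(cross, mutant, pop[i])
            f = objective(trial)
            if f <= fitness[i]:  # greedy one-to-one replacement
                pop[i], fitness[i] = trial, f
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])


# Toy stand-in for a speaker-recognition back end (hypothetical): a fixed random
# projection gives an "embedding"; the score is cosine similarity to a target embedding.
rng = np.random.default_rng(1)
N = 64
W = rng.standard_normal((8, N))
target_emb = W @ rng.standard_normal(N)  # embedding of some other "speaker"


def target_score(wave):
    e = W @ wave
    return float(e @ target_emb / (np.linalg.norm(e) * np.linalg.norm(target_emb) + 1e-12))


x = rng.standard_normal(N) * np.repeat([0.2, 1.0, 0.5, 0.1], FRAME)  # uneven loudness
bound = 0.3 * np.repeat(np.sqrt(frame_energy(x)), FRAME)  # louder frames may be perturbed more
lam = 0.1  # regularization weight (assumption)

delta, _ = differential_evolution(
    lambda d: -target_score(x + d) + lam * energy_adaptive_penalty(d, x), bound
)
print(target_score(x), target_score(x + delta), snr_db(x, delta))
```

Running the script prints the clean target score, the adversarial target score, and the SNR of the perturbation. Seeding the population with a zero perturbation, combined with DE's greedy replacement, guarantees that the returned perturbation never scores worse on the objective than leaving the signal untouched. The per-frame bound limits each frame's perturbation energy to at most 9% of that frame's speech energy, which keeps the SNR above roughly 10 dB in this toy setting.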
Pages: 65-79
Page count: 14