Speaker-Specific Utterance Ensemble based Transfer Attack on Speaker Identification

Cited by: 4
Authors
Zuo, Chu-Xiao [1]
Leng, Jia-Yi [1]
Li, Wu-Jun [1]
Affiliations
[1] Nanjing Univ, Dept Comp Sci & Technol, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Funding
National Key R&D Program of China;
Keywords
speaker identification; adversarial attack; black-box attack; transfer attack;
DOI
10.21437/Interspeech.2022-10139
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Speaker identification (SI) systems based on deep neural networks (DNNs) are widely deployed in security-related applications, so their robustness against malicious threats has attracted growing attention. Existing work has shown that white-box attacks can seriously threaten current SI systems, but white-box attacks require complete knowledge of the target model, which is impractical in many applications. As far as we know, only a few works have studied the more practical black-box attacks, and these attacks are mostly ported from computer vision tasks and are not adapted to speech data. In this work, we propose a novel black-box attack called speaker-specific utterance ensemble based transfer attack (SUETA). SUETA attacks SI systems by exploiting a characteristic unique to speech data: different utterances of the same speaker share the same voiceprint. To the best of our knowledge, SUETA is the first black-box attack on SI systems to exploit this characteristic. Experimental results on three representative SI models show that SUETA achieves a better transfer success rate (TSR) than speaker-unrelated baselines. Furthermore, SUETA even improves the attack success rate (ASR) of the white-box attack on the local substitute model, which is the first step of a transfer-based black-box attack.
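The abstract reports two metrics, ASR on the local substitute model and TSR on the target model, without defining them. The sketch below shows one common way such rates are computed. It is an assumption, not the paper's stated definition: it treats the attack as untargeted, counts an adversarial example as successful when the predicted speaker differs from the true speaker, and divides by all examples. The function name `success_rate` and the toy predictions are hypothetical.

```python
from typing import Sequence


def success_rate(predictions: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of adversarial examples classified as someone other than the true speaker.

    Assumption: untargeted attack, every example counted in the denominator.
    """
    if len(predictions) != len(true_labels):
        raise ValueError("predictions and true_labels must have the same length")
    if not true_labels:
        return 0.0
    fooled = sum(p != y for p, y in zip(predictions, true_labels))
    return fooled / len(true_labels)


# Hypothetical toy data: speaker IDs for four adversarial utterances.
true_speakers = [0, 1, 2, 3]
substitute_preds = [1, 1, 0, 2]  # predictions of the local substitute (white-box) model
target_preds = [1, 1, 2, 0]      # predictions of the unseen target (black-box) model

asr = success_rate(substitute_preds, true_speakers)  # attack success rate on the substitute
tsr = success_rate(target_preds, true_speakers)      # transfer success rate on the target

print(f"ASR = {asr:.2f}, TSR = {tsr:.2f}")
```

Some papers instead compute TSR only over the examples that already fooled the substitute model; under that convention the denominator would change, so reported numbers should be read against the paper's own definition.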
Pages: 3203-3207
Page count: 5
Related papers
33 in total
[1]  
Carlini N, 2019, On evaluating adversarial robustness
[2]   Towards Evaluating the Robustness of Neural Networks [J].
Carlini, Nicholas ;
Wagner, David .
2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2017, :39-57
[3]  
Caruana R., 2004, INT C MACH LEARN ICM, V69
[4]   Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems [J].
Chen, Guangke ;
Chen, Sen ;
Fan, Lingling ;
Du, Xiaoning ;
Zhao, Zhe ;
Song, Fu ;
Liu, Yang .
2021 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, SP, 2021, :694-711
[5]   ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification [J].
Desplanques, Brecht ;
Thienpondt, Jenthe ;
Demuynck, Kris .
INTERSPEECH 2020, 2020, :3830-3834
[6]   Evading Defenses to Transferable Adversarial Examples by Translation-Invariant Attacks [J].
Dong, Yinpeng ;
Pang, Tianyu ;
Su, Hang ;
Zhu, Jun .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4307-4316
[7]   Boosting Adversarial Attacks with Momentum [J].
Dong, Yinpeng ;
Liao, Fangzhou ;
Pang, Tianyu ;
Su, Hang ;
Zhu, Jun ;
Hu, Xiaolin ;
Li, Jianguo .
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, :9185-9193
[8]   Deep Hashing for Speaker Identification and Retrieval [J].
Fan, Lei ;
Jiang, Qing-Yuan ;
Yu, Ya-Qi ;
Li, Wu-Jun .
INTERSPEECH 2019, 2019, :2908-2912
[9]  
Garofolo J.S., 1993, TIMIT acoustic phonetic continuous speech corpus
[10]  
Gong Y., 2017, CRAFTING ADVERSARIAL