Practical Adversarial Attacks Against Speaker Recognition Systems

Cited by: 69

Authors:

Li, Zhuohang [1]

Shi, Cong [2]

Xie, Yi [2]

Liu, Jian [1]

Yuan, Bo [2]

Chen, Yingying [2]

Affiliations:

[1] Univ Tennessee, Knoxville, TN 37996 USA

[2] Rutgers State Univ, New Brunswick, NJ 08901 USA

Source:

PROCEEDINGS OF THE 21ST INTERNATIONAL WORKSHOP ON MOBILE COMPUTING SYSTEMS AND APPLICATIONS (HOTMOBILE'20) | 2020

Funding:

U.S. National Science Foundation

Keywords:

Speaker Recognition; Deep Learning; Adversarial Example; Room Impulse Response;

DOI:

10.1145/3376897.3377856

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Unlike other biometric-based user identification methods (e.g., fingerprint and iris), speaker recognition systems can identify individuals by their unique voice biometrics without requiring users to be physically present. Speaker recognition systems have therefore become increasingly popular in various domains, such as remote access control, banking services, and criminal investigation. In this paper, we study the vulnerability of such systems by launching a practical and systematic adversarial attack against X-vector, the state-of-the-art deep neural network (DNN) based speaker recognition system. In particular, by adding well-crafted, inconspicuous noise to the original audio, our attack can fool the speaker recognition system into making false predictions and can even force the audio to be recognized as any adversary-desired speaker. Moreover, our attack integrates the estimated room impulse response (RIR) into the adversarial example training process, yielding practical audio adversarial examples that remain effective when played over the air in the physical world. Extensive experiments using a public dataset of 109 speakers show the effectiveness of our attack, with a high attack success rate for both the digital attack (98%) and the practical over-the-air attack (50%).
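The role of the room impulse response in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-tap RIR, the 16 kHz sample rate, and the perturbation bound `epsilon` are all assumptions made for illustration, and the optimization that would actually craft the noise against a speaker recognition model is omitted. The sketch only shows how a bounded perturbation is added to audio and how over-the-air playback can be approximated by convolving the result with an RIR.

```python
import numpy as np

def clip_perturbation(delta, epsilon):
    """Keep the perturbation small (inconspicuous) by bounding each sample to [-epsilon, epsilon]."""
    return np.clip(delta, -epsilon, epsilon)

def simulate_over_the_air(audio, rir):
    """Approximate playback in a room: convolve with the RIR, trimmed to the input length."""
    return np.convolve(audio, rir, mode="full")[: len(audio)]

rng = np.random.default_rng(0)

# Stand-in for one second of speech at an assumed 16 kHz sample rate.
audio = rng.uniform(-0.5, 0.5, size=16000)

# Toy RIR (hypothetical): direct path plus a single echo 400 samples later.
rir = np.zeros(800)
rir[0] = 1.0
rir[400] = 0.3

# Random noise stands in for the optimized adversarial perturbation.
epsilon = 0.005
delta = clip_perturbation(rng.normal(0.0, 0.01, size=audio.shape), epsilon)

adversarial = np.clip(audio + delta, -1.0, 1.0)       # digital adversarial example
received = simulate_over_the_air(adversarial, rir)    # what a microphone might capture
```

In an attack of the kind the abstract describes, a loss computed on `received` rather than on `adversarial` would presumably drive the perturbation update, so that the example stays effective after room distortion.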


Pages: 9-14

Page count: 6
