Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition

Cited by: 11

Authors:

Zhang, Xingyu [1]

Zhang, Xiongwei [1]

Sun, Meng [1]

Zou, Xia [1]

Chen, Kejiang [2]

Yu, Nenghai [2]

Affiliations:

[1] Army Engn Univ, Lab Intelligent Informat Proc, Nanjing, Peoples R China

[2] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei, Peoples R China

Source:

COMPLEX & INTELLIGENT SYSTEMS | 2023 / Vol. 9 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Automatic speaker recognition; Adversarial examples; Imperceptibility; Black-box attack; Differential evolution; Auditory masking;

DOI:

10.1007/s40747-022-00782-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Automatic speaker recognition is an important biometric authentication approach with emerging applications. However, recent research has shown that it is vulnerable to adversarial attacks. In this paper, we propose a new type of adversarial example: imperceptible adversarial samples for targeted attacks on black-box automatic speaker recognition systems. Waveform samples are created directly by solving an optimization problem whose inputs and outputs are both waveforms, which is more realistic in real-life scenarios. Inspired by auditory masking, a regularization term that adapts to the energy of the speech waveform is proposed for generating imperceptible adversarial perturbations. The optimization problems are then solved by a differential evolution algorithm in a black-box manner, which requires no knowledge of the internal configuration of the recognition systems. Experiments conducted on two commonly used data sets, LibriSpeech and VoxCeleb, show that the proposed methods successfully perform targeted attacks on state-of-the-art speaker recognition systems while remaining imperceptible to human listeners. The resulting adversarial samples have high SNR and PESQ scores, indicating that the proposed methods degrade the quality of the original signals less than several recently proposed methods, which supports the imperceptibility of the adversarial samples.
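
The approach summarized in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: `score_fn` is a hypothetical stand-in for a black-box recognizer that returns a target-speaker score, the frame-wise penalty is one plausible form of an energy-adaptive regularizer, and the differential evolution loop is the textbook rand/1/bin variant with arbitrary hyperparameters.

```python
import numpy as np

def energy_weighted_penalty(x, delta, frame=64, eps=1e-8):
    """Assumed regularizer: perturbation energy relative to the speech
    energy in each frame, so quiet frames are penalized more heavily."""
    n = len(x) // frame * frame
    speech_energy = np.mean(x[:n].reshape(-1, frame) ** 2, axis=1)
    pert_energy = np.mean(delta[:n].reshape(-1, frame) ** 2, axis=1)
    return float(np.mean(pert_energy / (speech_energy + eps)))

def snr_db(x, delta):
    """Signal-to-noise ratio of the clean waveform x against perturbation delta."""
    return float(10.0 * np.log10(np.sum(x ** 2) / np.sum(delta ** 2)))

def de_attack(x, score_fn, lam=0.1, bound=0.01, pop_size=20, iters=50,
              F=0.5, CR=0.9, frame=64, seed=0):
    """Textbook differential evolution (rand/1/bin) over waveform perturbations.
    Only score_fn outputs are used; no gradients or model internals."""
    rng = np.random.default_rng(seed)
    dim = len(x)

    def loss(delta):
        # Higher target score is better, so it enters the loss negatively.
        return -score_fn(x + delta) + lam * energy_weighted_penalty(x, delta, frame)

    pop = rng.uniform(-bound, bound, size=(pop_size, dim))
    fitness = np.array([loss(d) for d in pop])
    for _ in range(iters):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), -bound, bound)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True  # keep at least one mutant sample
            trial = np.where(cross, mutant, pop[i])
            f = loss(trial)
            if f <= fitness[i]:  # greedy selection
                pop[i], fitness[i] = trial, f
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])
```

A toy run could pass a synthetic waveform and a linear scoring function, for example `de_attack(x, lambda w: float(w @ direction))` with a hypothetical `direction` vector, and then report `snr_db(x, delta)` for the returned perturbation. A real attack would replace the scoring function with queries to the target system.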

Citation

Pages: 65-79

Page count: 15

49 references in total

[1] Practical Hidden Voice Attacks against Speech and Speaker Recognition Systems [J]. Abdullah, Hadi; Garcia, Washington; Peeters, Christian; Traynor, Patrick; Butler, Kevin R. B.; Wilson, Joseph. 26TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2019), 2019

[2] [Anonymous], MICROSOFT AZURE

[3] [Anonymous], Kaldi

[4] [Anonymous], 2020, VOICE CONVERSION CHA

[5] [Anonymous], 2018, P MACHINE LEARNING R

[6] Towards Evaluating the Robustness of Neural Networks [J]. Carlini, Nicholas; Wagner, David. 2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2017: 39-57

[7] Carlini N, 2016, PROCEEDINGS OF THE 25TH USENIX SECURITY SYMPOSIUM, P513

[8] Chen G, 2021, P 2021 IEEE S SEC PR

[9] Chung JS, 2018, INTERSPEECH, P1086

[10] On the use of i-vector posterior distributions in Probabilistic Linear Discriminant Analysis [J]. Cumani, Sandro; Plchot, Oldrich; Laface, Pietro. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (04): 846-857
