Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech

Times cited: 176

Authors:

De Leon, Phillip L. [1]

Pucher, Michael [2]

Yamagishi, Junichi [3]

Hernaez, Inma [4]

Saratxaga, Ibon [4]

Affiliations:

[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA

[2] Telecommun Res Ctr Vienna FTW, A-1220 Vienna, Austria

[3] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

[4] Univ Basque Country, Bilbao 48013, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 8

Funding:

Austrian Science Fund; UK Engineering and Physical Sciences Research Council;

Keywords:

Security; speaker recognition; speech synthesis; NORMALIZATION; ALGORITHMS; IMPOSTOR; SYSTEM;

DOI:

10.1109/TASL.2012.2201472

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model-universal background model (GMM-UBM) or a support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker from small amounts of training data by adapting an average voice or background model. Although the SV systems have a very low equal error rate (EER), over 81% of the matched claims are accepted when the systems are tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus. This result suggests a vulnerability in SV systems and thus a need to detect synthetic speech accurately. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how a classifier using this feature can be used to improve the security of SV systems.
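The abstract contrasts a very low EER with a high acceptance rate for synthetic-speech claims. The minimal Python sketch below, which is not the authors' code, illustrates why both can hold at once: it estimates the EER from genuine and impostor trial scores by sweeping a threshold, then measures what fraction of synthetic-speech trials would be accepted at that threshold. Fixing the decision threshold at the EER operating point is an assumption made here for illustration; the helper names `compute_eer` and `acceptance_rate` are hypothetical, and all score values are invented toy data, not results from the paper.

```python
from typing import List, Tuple


def compute_eer(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Hypothetical helper: estimate the equal error rate (EER).

    A trial is accepted when its score >= threshold. Every observed score is
    tried as a threshold; the one where the false-acceptance rate (FAR) and
    false-rejection rate (FRR) are closest is kept, and the EER is reported
    as their average at that point.
    """
    best_gap, best_eer, best_t = float("inf"), 1.0, 0.0
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


def acceptance_rate(scores: List[float], threshold: float) -> float:
    """Fraction of trials whose score reaches the decision threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy verification scores, invented for illustration only.
genuine = [2.1, 1.8, 2.5, 1.2, 3.0]      # true speaker, natural speech
impostor = [-1.0, -0.5, 0.3, -2.0, 1.5]  # other speakers, natural speech
synthetic = [1.9, 2.2, 0.8, 2.6, 1.4]    # synthetic speech claiming the target

eer, threshold = compute_eer(genuine, impostor)
print(f"EER={eer:.2f} at threshold {threshold}")  # EER=0.20 at threshold 1.5
print(f"synthetic acceptance={acceptance_rate(synthetic, threshold):.0%}")  # synthetic acceptance=60%
```

The EER is computed only from natural-speech trials, so it says nothing about how synthetic trials will score; a system can have a low EER and still accept most synthetic claims, which is why a separate synthetic-speech detector is needed.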


Pages: 2280-2290

Number of pages: 11
