Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech

Cited by: 173

Authors:

De Leon, Phillip L. [1]

Pucher, Michael [2]

Yamagishi, Junichi [3]

Hernaez, Inma [4]

Saratxaga, Ibon [4]

Affiliations:

[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA

[2] Telecommun Res Ctr Vienna FTW, A-1220 Vienna, Austria

[3] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

[4] Univ Basque Country, Bilbao 48013, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 8

Funding:

Austrian Science Fund; UK Engineering and Physical Sciences Research Council

Keywords:

Security; speaker recognition; speech synthesis; NORMALIZATION; ALGORITHMS; IMPOSTOR; SYSTEM;

DOI:

10.1109/TASL.2012.2201472

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206 ; 082403 ;

Abstract:

In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model-universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV systems have a very low equal error rate (EER), when tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus, over 81% of the matched claims are accepted. This result suggests vulnerability in SV systems and thus a need to accurately detect synthetic speech. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how this classifier can be used to improve security of SV systems.

Pages: 2280-2290

Page count: 11
