Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech

Cited by: 173
Authors
De Leon, Phillip L. [1]
Pucher, Michael [2]
Yamagishi, Junichi [3]
Hernaez, Inma [4]
Saratxaga, Ibon [4]
Affiliations
[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA
[2] Telecommun Res Ctr Vienna FTW, A-1220 Vienna, Austria
[3] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland
[4] Univ Basque Country, Bilbao 48013, Spain
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 08
Funding
Austrian Science Fund; UK Engineering and Physical Sciences Research Council;
Keywords
Security; speaker recognition; speech synthesis; NORMALIZATION; ALGORITHMS; IMPOSTOR; SYSTEM;
DOI
10.1109/TASL.2012.2201472
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model-universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV systems have a very low equal error rate (EER), when tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus, over 81% of the matched claims are accepted. This result suggests vulnerability in SV systems and thus a need to accurately detect synthetic speech. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how this classifier can be used to improve security of SV systems.
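The abstract contrasts a low equal error rate (EER) on human speech with a high acceptance rate for synthetic speech. The following is a minimal sketch of how those two quantities relate, not the authors' evaluation code. It assumes each verification trial yields one scalar score and that a claim is accepted when the score is at or above a threshold. The function names and all toy scores are hypothetical and chosen only for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return (eer, threshold) for two lists of verification scores.

    Assumption: a claim is accepted when its score is >= threshold.
    The EER is approximated at the candidate threshold where the false
    acceptance and false rejection rates are closest to each other.
    """
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in candidates:
        # False rejection rate: genuine claims scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: impostor claims scoring at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def acceptance_rate(scores, threshold):
    """Fraction of claims whose score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Toy scores (hypothetical, not taken from the paper).
genuine = [2.0, 1.5, 1.2, 0.9, 0.4]       # true speaker, human speech
impostor = [-1.0, -0.5, 0.1, 0.5, -0.2]   # other speakers, human speech
synthetic = [1.1, 0.8, 0.3, 1.6]          # synthetic speech, matched claims

eer, threshold = equal_error_rate(genuine, impostor)
synthetic_accepted = acceptance_rate(synthetic, threshold)
print(eer, threshold, synthetic_accepted)
```

In this sketch the threshold is tuned only on human genuine and impostor trials, so nothing prevents synthetic speech from scoring above it. That is the kind of gap the abstract reports when a system with a low EER still accepts most matched synthetic claims.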
Pages: 2280-2290
Page count: 11
Related papers
50 in total
[21]   HMM-Based Speech Synthesis for the Greek Language [J].
Karabetsos, Sotiris ;
Tsiakoulis, Pirros ;
Chalamandaris, Aimilios ;
Raptis, Spyros .
TEXT, SPEECH AND DIALOGUE, PROCEEDINGS, 2008, 5246 :349-356
[22]   An HMM-based Cantonese Speech Synthesis System [J].
Wang, Xin ;
Wu, Zhiyong .
2012 IEEE GLOBAL HIGH TECH CONGRESS ON ELECTRONICS (GHTCE), 2012,
[23]   Unsupervised adaptation for HMM-based speech synthesis [J].
King, Simon ;
Tokuda, Keiichi ;
Zen, Heiga ;
Yamagishi, Junichi .
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, :1869-+
[24]   Thousands of Voices for HMM-based Speech Synthesis [J].
Yamagishi, Junichi ;
Usabaev, Bela ;
King, Simon ;
Watts, Oliver ;
Dines, John ;
Tian, Jilei ;
Hu, Rile ;
Guan, Yong ;
Oura, Keiichiro ;
Tokuda, Keiichi ;
Karhila, Reima ;
Kurimo, Mikko .
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, :416-+
[25]   Analysis of HMM-Based Lombard Speech Synthesis [J].
Raitio, Tuomo ;
Suni, Antti ;
Vainio, Martti ;
Alku, Paavo .
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, :2792-+
[26]   Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007 [J].
Yamagishi, Junichi ;
Nose, Takashi ;
Zen, Heiga ;
Toda, Tomoki ;
Tokuda, Keiichi .
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, :3957-+
[27]   A COMPARISON OF SUPERVISED AND UNSUPERVISED CROSS-LINGUAL SPEAKER ADAPTATION APPROACHES FOR HMM-BASED SPEECH SYNTHESIS [J].
Liang, Hui ;
Dines, John ;
Saheer, Lakshmi .
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, :4598-4601
[28]   Noise in HMM-Based Speech Synthesis Adaptation: Analysis, Evaluation Methods and Experiments [J].
Karhila, Reima ;
Remes, Ulpu ;
Kurimo, Mikko .
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (02) :285-295
[29]   State duration modeling for HMM-based speech synthesis [J].
Zen, Heiga ;
Masuko, Takashi ;
Tokuda, Keiichi ;
Yoshimura, Takayoshi ;
Kobayashi, Takao ;
Kitamura, Tadashi .
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (03) :692-693
[30]   Analysis and HMM-based synthesis of hypo and hyperarticulated speech [J].
Picart, Benjamin ;
Drugman, Thomas ;
Dutoit, Thierry .
COMPUTER SPEECH AND LANGUAGE, 2014, 28 (02) :687-707