Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech

Cited by: 173

Authors:

De Leon, Phillip L. [1]

Pucher, Michael [2]

Yamagishi, Junichi [3]

Hernaez, Inma [4]

Saratxaga, Ibon [4]

Affiliations:

[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA

[2] Telecommun Res Ctr Vienna FTW, A-1220 Vienna, Austria

[3] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

[4] Univ Basque Country, Bilbao 48013, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 08

Funding:

Austrian Science Fund; UK Engineering and Physical Sciences Research Council

Keywords:

Security; speaker recognition; speech synthesis; NORMALIZATION; ALGORITHMS; IMPOSTOR; SYSTEM;

DOI:

10.1109/TASL.2012.2201472

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model-universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV systems have a very low equal error rate (EER), when tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus, over 81% of the matched claims are accepted. This result suggests vulnerability in SV systems and thus a need to accurately detect synthetic speech. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how this classifier can be used to improve security of SV systems.

Pages: 2280-2290

Number of pages: 11
