Evaluation of Speaker Verification Security and Detection of HMM-Based Synthetic Speech

Times cited: 176

Authors:

De Leon, Phillip L. [1]

Pucher, Michael [2]

Yamagishi, Junichi [3]

Hernaez, Inma [4]

Saratxaga, Ibon [4]

Affiliations:

[1] New Mexico State Univ, Klipsch Sch Elect & Comp Engn, Las Cruces, NM 88003 USA

[2] Telecommun Res Ctr Vienna FTW, A-1220 Vienna, Austria

[3] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9AB, Midlothian, Scotland

[4] Univ Basque Country, Bilbao 48013, Spain

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 08

Funding:

Austrian Science Fund; UK Engineering and Physical Sciences Research Council;

Keywords:

Security; speaker recognition; speech synthesis; NORMALIZATION; ALGORITHMS; IMPOSTOR; SYSTEM;

DOI:

10.1109/TASL.2012.2201472

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206; 082403;

Abstract:

In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model-universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV systems have a very low equal error rate (EER), when tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus, over 81% of the matched claims are accepted. This result suggests vulnerability in SV systems and thus a need to accurately detect synthetic speech. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how this classifier can be used to improve security of SV systems.
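The abstract contrasts a very low EER with a high acceptance rate for synthetic claims. The sketch below illustrates that contrast under stated assumptions: the scores are made up, the helper names `compute_eer` and `acceptance_rate` are hypothetical, and it assumes the decision threshold is fixed at the EER operating point. It is not the authors' evaluation code. The EER is taken where the false acceptance rate (FAR) and false rejection rate (FRR) meet, and synthetic trials are then scored against that same threshold.

```python
import numpy as np


def compute_eer(genuine, impostor):
    """Return (eer, threshold), accepting a claim when score >= threshold.

    Sweeps every observed score as a candidate threshold and keeps the one
    where FAR and FRR are closest; the EER is reported as their mean there.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, best_eer, best_thr = np.inf, None, None
    for thr in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= thr)  # impostor claims wrongly accepted
        frr = np.mean(genuine < thr)    # genuine claims wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_thr = abs(far - frr), (far + frr) / 2.0, thr
    return float(best_eer), float(best_thr)


def acceptance_rate(scores, threshold):
    """Fraction of claims whose score reaches the decision threshold."""
    return float(np.mean(np.asarray(scores, dtype=float) >= threshold))


# Toy verification scores (hypothetical, not data from the paper).
genuine = [2.1, 1.8, 2.5, 1.2, 2.9]      # target speaker, natural speech
impostor = [-1.0, -0.5, 0.3, -1.4, 0.9]  # other speakers, natural speech
synthetic = [1.5, 2.0, 0.7, 1.9, 2.2]    # synthetic speech claiming the target

eer, thr = compute_eer(genuine, impostor)
print(f"EER = {eer:.2f} at threshold {thr:.2f}")
print(f"synthetic claims accepted: {acceptance_rate(synthetic, thr):.0%}")
```

With these made-up scores the EER is 0.00 at threshold 1.20, yet 80% of the synthetic claims still clear that threshold. A low EER on natural impostors therefore says nothing about robustness to synthetic speech, which is why a separate synthetic-speech detector is needed.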


Pages: 2280-2290

Number of pages: 11
