Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

Cited by: 7

Authors:

Saheer, Lakshmi [1,2]

Dines, John [1]

Garner, Philip N. [1]

Affiliations:

[1] Idiap Res Inst, CH-1920 Martigny, Switzerland

[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012, Vol. 20, No. 07

Keywords:

Expectation-maximization optimization; hidden Markov model (HMM)-based statistical parametric speech synthesis; speaker adaptation; vocal tract length normalization; LINEAR TRANSFORMATION; SPEAKER ADAPTATION; RECOGNITION

DOI:

10.1109/TASL.2012.2198058

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

Vocal tract length normalization (VTLN) has been successfully used in automatic speech recognition for improved performance. The same technique can be implemented in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis. Jacobian normalization, high-dimensionality features and truncation of the transformation matrix are a few challenges presented with the appropriate solutions. Detailed evaluations are performed to estimate the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is also not an easy task since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the difference between systems. The best method for implementing VTLN is confirmed to be use of the lower order features for estimating warping factors.

Pages: 2134-2148

Number of pages: 15
