Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

Cited by: 7
Authors
Saheer, Lakshmi [1, 2]
Dines, John [1]
Garner, Philip N. [1]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Keywords
Expectation-maximization optimization; hidden Markov model (HMM)-based statistical parametric speech synthesis; speaker adaptation; vocal tract length normalization; LINEAR TRANSFORMATION; SPEAKER ADAPTATION; RECOGNITION
DOI
10.1109/TASL.2012.2198058
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Vocal tract length normalization (VTLN) has been successfully used in automatic speech recognition for improved performance. The same technique can be implemented in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis. Jacobian normalization, high-dimensional features, and truncation of the transformation matrix are among the challenges presented, each with an appropriate solution. Detailed evaluations are performed to identify the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is itself not an easy task, since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the difference between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating warping factors.
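The abstract does not state which frequency-warping function the paper uses. As a rough illustration of what a VTLN "warping factor" controls, the sketch below assumes the bilinear (first-order all-pass) warp that is a common choice in the VTLN literature. The function names `bilinear_warp` and `warp_axis` and the parameter name `alpha` are hypothetical, chosen for this example only, and the sketch does not cover the expectation-maximization estimation of the factor.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (0..pi) with factor alpha.

    Assumed warp (not confirmed by the abstract): the phase response of a
    first-order all-pass filter. alpha must lie in (-1, 1); alpha = 0
    leaves the frequency axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))


def warp_axis(num_bins, alpha):
    """Return the warped positions of num_bins evenly spaced frequencies."""
    step = math.pi / (num_bins - 1)
    return [bilinear_warp(k * step, alpha) for k in range(num_bins)]


if __name__ == "__main__":
    # Compare a few warped frequencies for three illustrative factors.
    for alpha in (-0.1, 0.0, 0.1):
        axis = warp_axis(5, alpha)
        print(alpha, [round(w, 3) for w in axis])
```

Under this assumed warp, the end points 0 and pi stay fixed, a positive `alpha` shifts intermediate frequencies upward, and a negative `alpha` shifts them downward, so a single scalar per speaker is enough to stretch or compress the spectral envelope.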
Pages: 2134-2148
Page count: 15
Related papers
50 items in total
  • [31] Vocal tract length invariant features for automatic speech recognition
    Mertins, A
    Rademacher, J
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 308 - 312
  • [32] Unattended speech processing: Effect of vocal-tract length
    Rivenez, Marie
    Darwin, Christopher J.
    Bourgeon, Leonore
    Guillaume, Anne
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2007, 121 (02): : EL90 - EL95
  • [33] Statistical parametric speech synthesis for Ibibio
    Ekpenyong, Moses
    Urua, Eno-Abasi
    Watts, Oliver
    King, Simon
    Yamagishi, Junichi
    SPEECH COMMUNICATION, 2014, 56 : 243 - 251
  • [34] An introduction to statistical parametric speech synthesis
    King, Simon
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2011, 36 (05): : 837 - 852
  • [35] An introduction to statistical parametric speech synthesis
    Simon King
    Sadhana, 2011, 36 : 837 - 852
  • [36] Statistical Parametric Speech Synthesis: A Review
    Aroon, Athira
    Dhonde, S. B.
    PROCEEDINGS OF 2015 IEEE 9TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO), 2015,
  • [37] Normalization of vowels by vocal-tract length and its application to vowel identification
    Wakita, H
    IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1977, 25 (02): : 183 - 192
  • [38] A Study on the Influence of Covariance Adaptation on Jacobian Compensation in Vocal Tract Length Normalization
    Sanand, D. R.
    Rath, S. P.
    Umesh, S.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 544 - 547
  • [39] Iterative MMSE Estimation of Vocal Tract Length Normalization Factors for Voice Transformation
    Erro, Daniel
    Navas, Eva
    Hernaez, Inma
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 86 - 89
  • [40] Simulation model of the vocal tract filter for speech synthesis
    AlAkaidi, MM
    SIMULATION, 1996, 67 (04) : 241 - 246