Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

Cited by: 7
Authors
Saheer, Lakshmi [1, 2]
Dines, John [1]
Garner, Philip N. [1]
Affiliations
[1] Idiap Res Inst, CH-1920 Martigny, Switzerland
[2] Ecole Polytech Fed Lausanne, Lausanne, Switzerland
Keywords
Expectation-maximization optimization; hidden Markov model (HMM)-based statistical parametric speech synthesis; speaker adaptation; vocal tract length normalization; LINEAR TRANSFORMATION; SPEAKER ADAPTATION; RECOGNITION
DOI
10.1109/TASL.2012.2198058
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Vocal tract length normalization (VTLN) has been used successfully in automatic speech recognition to improve performance. The same technique can be applied in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation-maximization and addresses the key challenges of implementing VTLN for synthesis. Jacobian normalization, high-dimensional features, and truncation of the transformation matrix are among the challenges presented, along with appropriate solutions. Detailed evaluations are performed to determine the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is itself not an easy task, since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the differences between systems. The best method for implementing VTLN is confirmed to be the use of lower-order features for estimating warping factors.
Pages: 2134-2148
Number of pages: 15
Related papers
50 in total
  • [1] A parametric approach to vocal tract length normalization
    Eide, E
    Gish, H
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 346 - 348
  • [2] Frequency warping approach for vocal tract length normalization in speech recognition
    Xu, W
    Wang, BX
    Ding, Q
    PROCEEDINGS OF THE THIRD INTERNATIONAL SYMPOSIUM ON INSTRUMENTATION SCIENCE AND TECHNOLOGY, VOL 2, 2004, : 494 - 499
  • [3] Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition
    Mueller, Florian
    Mertins, Alfred
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1362 - 1365
  • [4] Comparison of Vocal Tract Length Normalization Technique Applied for Clean and Noisy Speech
    Giurgiu, Mircea
    Kabir, Ahsanul
    2011 34TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2011, : 351 - 354
  • [5] A novel feature transformation for vocal tract length normalization in automatic speech recognition
    Claes, T
    Dologlou, I
    ten Bosch, L
    Van Compernolle, D
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1998, 6 (06): : 549 - 557
  • [6] Time domain vocal tract length normalization
    Sündermann, D
    Bonafonte, A
    Ney, H
    Hoge, H
    Proceedings of the Fourth IEEE International Symposium on Signal Processing and Information Technology, 2004, : 191 - 194
  • [7] Parameter optimization for Vocal Tract Length Normalization
    Dognin, P
    El-Jaroudi, A
    Billa, J
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1767 - 1770
  • [8] Vocal tract length normalization for speaker independent acoustic-to-articulatory speech inversion
    Sivaraman, Ganesh
    Mitra, Vikramjit
    Nam, Hosung
    Tiede, Mark
    Espy-Wilson, Carol
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 455 - 459
  • [9] Vocal tract length normalization using rapid maximum-likelihood estimation for speech recognition
    Emori, Tadashi
    Shinoda, Koichi
    Systems and Computers in Japan, 2002, 33 (05): : 30 - 40
  • [10] An Approach to Vocal Tract Length Normalization by Robust Formant
    Kabir, A.
    Barker, J.
    Giurgiu, M.
    RECENT ADVANCES IN CIRCUITS, SYSTEMS AND SIGNALS, 2010, : 345 - +