Speaker Adaptation using Nonlinear Regression Techniques for HMM-based Speech Synthesis

Cited by: 0
Authors
Hong, Doo Hwa [1]
Kang, Shin Jae
Lee, Joun Yeop
Kim, Nam Soo
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 151742, South Korea
Source
2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014
Keywords
maximum likelihood linear regression (MLLR); HMM-based speech synthesis; kernel; maximum penalized likelihood kernel regression (MPLKR); LIKELIHOOD LINEAR-REGRESSION; KERNEL REGRESSION
DOI
10.1109/IIH-MSP.2014.152
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The maximum likelihood linear regression (MLLR) technique is a well-known approach to parameter adaptation in hidden Markov model (HMM)-based systems. In this paper, we propose the maximum penalized likelihood kernel regression (MPLKR) approach as a novel adaptation technique for HMM-based speech synthesis. The proposed algorithm uses a kernel method to perform a nonlinear regression between the mean vector of the base model and the corresponding mean vector of the adaptive data. In the experiments, we used various types of parametric kernels for the proposed algorithm and compared their performance with that of the conventional method. The experimental results show that the proposed algorithm outperforms the conventional method in terms of both the objective measure and the subjective listening quality.
Pages: 586-589
Number of pages: 4
Related papers
50 in total
  • [31] Two-band excitation for HMM-based speech synthesis
    Kim, Sang-Jin
    Hahn, Minsoo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2007, E90D (01) : 378 - 381
  • [32] FACTOR ANALYZED VOICE MODELS FOR HMM-BASED SPEECH SYNTHESIS
    Kazumi, Kyosuke
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4234 - 4237
  • [33] CONTEXTUAL PARTIAL ADDITIVE STRUCTURE FOR HMM-BASED SPEECH SYNTHESIS
    Takaki, Shinji
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7878 - 7882
  • [34] Extended Decision Tree with OR Relationship for HMM-based Speech Synthesis
    Wang, Yang
    Tao, Jianhua
    Yang, Minghao
    Li, Ya
    2013 SECOND IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR 2013), 2013, : 225 - 229
  • [35] Improved Training of Excitation for HMM-based Parametric Speech Synthesis
    Shiga, Yoshinori
    Toda, Tomoki
    Sakai, Shinsuke
    Kawai, Hisashi
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 809 - 812
  • [36] Evaluation of Prosodic Contextual Factors for HMM-based Speech Synthesis
    Yokomizo, Shuji
    Nose, Takashi
    Kobayashi, Takao
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 430 - 433
  • [37] Implementation and evaluation of an HMM-based Korean speech synthesis system
    Kim, SJ
    Kim, JJ
    Hahn, M
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (03) : 1116 - 1119
  • [38] HMM-based Speaker Characteristics Emphasis Using Average Voice Model
    Nose, Takashi
    Adada, Junichi
    Kobayashi, Takao
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2599 - 2602
  • [39] Prediction method of speech recognition performance based on HMM-based speech synthesis technique
    Terashima R.
    Yoshimura T.
    Wakita T.
    Tokuda K.
    Kitamura T.
    IEEJ Transactions on Electronics, Information and Systems, 2010, 130 (04) : 557 - 564+3
  • [40] Excitation Modeling Based on Waveform Interpolation for HMM-based Speech Synthesis
    Sung, June Sig
    Hong, Doo Hwa
    Oh, Kyung Hwan
    Kim, Nam Soo
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 813 - 816