Speaker Adaptation using Nonlinear Regression Techniques for HMM-based Speech Synthesis

Cited by: 0

Authors:

Hong, Doo Hwa [1]

Kang, Shin Jae

Lee, Joun Yeop

Kim, Nam Soo

Affiliation:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 151742, South Korea

Source:

2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014) | 2014

Keywords:

maximum likelihood linear regression (MLLR); HMM-based speech synthesis; kernel; maximum penalized likelihood kernel regression (MPLKR); likelihood linear regression; kernel regression

DOI:

10.1109/IIH-MSP.2014.152

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The maximum likelihood linear regression (MLLR) technique is a well-known approach to parameter adaptation in hidden Markov model (HMM)-based systems. In this paper, we propose maximum penalized likelihood kernel regression (MPLKR) as a novel adaptation technique for HMM-based speech synthesis. The proposed algorithm uses a kernel method to perform a nonlinear regression between each mean vector of the base model and the corresponding mean vector of the adaptation data. In the experiments, we evaluated several types of parametric kernels with the proposed algorithm and compared their performance with that of the conventional method. The experimental results show that the proposed algorithm outperforms the conventional method in both the objective measure and subjective listening quality.
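To make the idea of a kernel-based nonlinear regression between mean vectors concrete, here is a minimal sketch in plain Python. It is not the MPLKR algorithm from the paper, whose objective and update equations are not given in this record. It swaps in ordinary kernel ridge regression with a Gaussian (RBF) kernel, and every name in it (`rbf_kernel`, `fit_kernel_ridge`, `adapt_mean`, `gamma`, `lam`) as well as the toy two-dimensional mean vectors are assumptions made purely for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is n x n and b is n x d (lists of lists); returns x as n x d.
    """
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_kernel_ridge(base_means, target_means, lam=1e-6, gamma=0.5):
    """Return coefficients alpha solving (K + lam * I) alpha = targets."""
    n = len(base_means)
    gram = [
        [
            rbf_kernel(base_means[i], base_means[j], gamma)
            + (lam if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    return solve(gram, target_means)


def adapt_mean(mu, base_means, alpha, gamma=0.5):
    """Map one base-model mean vector to its adapted estimate."""
    weights = [rbf_kernel(mu, m, gamma) for m in base_means]
    dim = len(alpha[0])
    return [sum(w * a[d] for w, a in zip(weights, alpha)) for d in range(dim)]


# Toy data (hypothetical): four base-model means and target-speaker means
# that depend on them nonlinearly, which a single affine map cannot fit.
base_means = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target_means = [[m[0] ** 2 + 0.5, m[1] + m[0] * m[1]] for m in base_means]

alpha = fit_kernel_ridge(base_means, target_means)
adapted = [adapt_mean(m, base_means, alpha) for m in base_means]
```

For comparison, the MLLR mean transform applies one affine map, A·mu + b, to every mean in a regression class. The sketch replaces that map with a weighted sum of kernel evaluations against the base means, so the mapping can bend between them. A likelihood-based method such as the one the abstract describes would presumably also weight each mean by its covariance and by the amount of adaptation data observed for it, which this least-squares sketch ignores.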

Pages: 586-589

Page count: 4

