Adaptation of Prosody in Speech Synthesis by Changing Command Values of the Generation Process Model of Fundamental Frequency

Cited by: 0
Authors
Hirose, Keikichi [1]
Ochi, Keiko [1]
Mihara, Ryusuke [1]
Hashimoto, Hiroya [1]
Saito, Daisuke [1]
Minematsu, Nobuaki [1]
Affiliations
[1] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan
Source
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 | 2011
Keywords
prosody adaptation; generation process model; speech synthesis; PARAMETERS; CONTOURS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A method was developed to adapt prosody to a new speaker or style in speech synthesis. It predicts the differences between the target and original speakers/styles and applies them to the original. Differences in fundamental frequency (F0) contours are represented in the framework of the generation process model, as differences in the command magnitudes/amplitudes. While modeling the original speaker/style requires a certain amount of training data, the corpus for training the command differences can be small. Furthermore, for style adaptation, the corpus need not be uttered by the same speaker as the original style. Speech was synthesized with an HMM-based speech synthesis system, with prosody controlled by the method. Listening experiments on synthetic speech with style adaptation and with voice conversion both showed the validity of the method.
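The idea of shifting command values can be illustrated with a minimal sketch. It assumes the standard formulation of the generation process (Fujisaki) model, in which the log F0 contour is a baseline plus phrase and accent components. The constants ALPHA, BETA, and GAMMA, all function names, and the numeric command values and differences are illustrative assumptions, not taken from the paper. In the paper the differences are predicted from a small corpus; here they are simply given by hand.

```python
import math

# Assumed, commonly used model constants (not taken from the paper).
ALPHA = 3.0   # phrase control natural angular frequency (1/s)
BETA = 20.0   # accent control natural angular frequency (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrases, accents):
    """Log F0 at time t (seconds).

    phrases: list of (onset_time, magnitude)
    accents: list of (onset_time, offset_time, amplitude)
    """
    value = math.log(base_hz)
    for t0, ap in phrases:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accents:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def adapt_commands(phrases, accents, d_phrase, d_accent):
    """Add per-command differences to magnitudes/amplitudes; timing is unchanged."""
    new_phrases = [(t0, ap + d) for (t0, ap), d in zip(phrases, d_phrase)]
    new_accents = [(t1, t2, aa + d) for (t1, t2, aa), d in zip(accents, d_accent)]
    return new_phrases, new_accents


# Hypothetical original commands and hand-picked differences.
orig_phrases = [(0.0, 0.5)]
orig_accents = [(0.3, 0.7, 0.4)]
tgt_phrases, tgt_accents = adapt_commands(orig_phrases, orig_accents, [0.1], [0.2])

# Sample both contours every 100 ms over one second.
times = [i * 0.1 for i in range(11)]
orig_contour = [math.exp(log_f0(t, 120.0, orig_phrases, orig_accents)) for t in times]
tgt_contour = [math.exp(log_f0(t, 120.0, tgt_phrases, tgt_accents)) for t in times]
```

Because only the magnitudes and amplitudes change, the adapted contour keeps the timing structure of the original, which matches the abstract's description of differences expressed in command values.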
Pages: 2804 / +
Page count: 2