Sentence Selection Based on Extended Entropy Using Phonetic and Prosodic Contexts for Statistical Parametric Speech Synthesis

Cited by: 6
Authors
Nose, Takashi [1]
Arao, Yusuke [2]
Kobayashi, Takao [3]
Sugiura, Komei [4]
Shiga, Yoshinori [4]
Affiliations
[1] Tohoku Univ, Grad Sch Engn, Dept Commun Engn, Sendai, Miyagi 9808579, Japan
[2] Dai Nippon Printing Co Ltd, Tokyo 2240053, Japan
[3] Tokyo Inst Technol, Dept Informat Proc, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
[4] Natl Inst Informat & Commun Technol, Kyoto 6190289, Japan
Keywords
Corpus design; entropy; hidden Markov model (HMM); prosodic context; sentence selection; statistical parametric speech synthesis; generation algorithm; global variance
DOI
10.1109/TASLP.2017.2688585
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper proposes a sentence selection technique for constructing phonetically and prosodically balanced, compact recording scripts for speech synthesis. Conventional corpus design for speech synthesis often uses a greedy algorithm that maximizes phonetic coverage. For statistical parametric speech synthesis, however, the balance of multiple phonetic and prosodic contextual factors is important in addition to coverage. To account for both phonetic and prosodic context balance during sentence selection, we introduce an extended entropy over phonetic and prosodic contexts such as biphones/triphones, accent/stress/tone, and sentence length. For a detailed investigation, the conventional and proposed techniques are evaluated on Japanese, English, and Chinese corpora. Objective experimental results show that the proposed technique achieves better context coverage and balance. In addition, speech synthesis experiments based on hidden Markov models show that the generated speech parameters are closer to those of natural speech than those obtained with other conventional sentence selection techniques. Subjective evaluations show that sentence selection based on the extended entropy improves the naturalness of the synthetic speech while maintaining similarity to the original sample.
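The abstract describes the selection procedure only at a high level. As a rough illustration, the sketch below runs a greedy loop that, at each step, adds the candidate sentence that most increases an entropy score computed over several context factors. The scoring is an assumption, not the paper's definition of extended entropy: it simply sums the Shannon entropies of separate per-factor label distributions (for example, biphones and accent types). The sentence representation and the names `entropy`, `combined_entropy`, and `greedy_select` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical representation: each sentence maps a context factor name
# (e.g. "biphone", "accent") to the context labels it contributes.
Sentence = Dict[str, List[str]]


def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of the label distribution given by counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def combined_entropy(stats: Dict[str, Counter]) -> float:
    """Assumed stand-in for an 'extended' entropy: sum of per-factor entropies."""
    return sum(entropy(c) for c in stats.values())


def greedy_select(corpus: Sequence[Sentence], n_select: int) -> List[int]:
    """Greedily pick sentence indices that maximize the combined entropy."""
    factors = {f for s in corpus for f in s}
    stats: Dict[str, Counter] = {f: Counter() for f in factors}
    selected: List[int] = []
    remaining = set(range(len(corpus)))
    for _ in range(min(n_select, len(corpus))):
        best_idx, best_score = -1, -math.inf
        # Sorted iteration makes ties resolve to the lowest index.
        for i in sorted(remaining):
            trial = {f: stats[f] + Counter(corpus[i].get(f, []))
                     for f in factors}
            score = combined_entropy(trial)
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.discard(best_idx)
        # Commit the chosen sentence's labels to the running statistics.
        for f in factors:
            stats[f].update(corpus[best_idx].get(f, []))
    return selected


if __name__ == "__main__":
    toy_corpus = [
        {"biphone": ["a-b", "a-b"], "accent": ["H", "H"]},
        {"biphone": ["a-b", "b-c"], "accent": ["H", "L"]},
        {"biphone": ["c-d"], "accent": ["L"]},
    ]
    print(greedy_select(toy_corpus, 2))  # -> [1, 2]
```

In the toy corpus, sentence 0 repeats a single biphone and accent label, so it adds no entropy on its own and is picked last, whereas a purely coverage-driven greedy rule would only reward unseen labels. This naive version rescores every remaining sentence at every step; a practical implementation over a large text pool would need incremental gain updates.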
Pages: 1107-1116
Number of pages: 10