Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis

Authors
Maia, Ranniery [1]
Akamine, Masami [1]
Affiliations
[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England
Source
13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012
Keywords
speech synthesis; statistical parametric speech synthesis; expressive speech synthesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a study on the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is performed through an emotion classification task using two methods: K-means emotion clustering and Gaussian mixture model (GMM)-based emotion identification. Two known parameterizations of the short-term speech spectral envelope, the mel-cepstrum and the mel-line spectrum pairs, are used, while features derived from the complex cepstrum and group delay, together with band-aperiodicity coefficients, serve as excitation parameters. The features found to be emotion-dependent according to classification performance are then selected to train emotion-dependent HMM-based synthesizers. Listening tests are performed to verify the impact of the parameters on the similarity of the synthesized speech to its natural version.
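The K-means part of the classification task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The two-dimensional toy vectors, the emotion names, the farthest-point initialization, and the purity score are all assumptions made for illustration; in the study the feature vectors would be real parameterizations such as mel-cepstral coefficients. The idea shown is only that a parameterization which separates emotions well yields clusters that line up with the emotion labels.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic initialization (an assumption, not from the paper):
    start at the first point, then repeatedly add the point farthest
    from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, n_iter=50):
    """Plain Lloyd-style K-means; returns one cluster index per point."""
    centers = farthest_point_init(points, k)
    assign = []
    for _ in range(n_iter):
        new_assign = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assign


def purity(assign, labels):
    """Fraction of points whose cluster's majority label matches their own."""
    total = 0
    for j in set(assign):
        member_labels = [l for a, l in zip(assign, labels) if a == j]
        total += Counter(member_labels).most_common(1)[0][1]
    return total / len(labels)


def make_toy_features(seed=0, n_per_emotion=50):
    """Hypothetical 2-D 'features' for three emotions (synthetic data only)."""
    rng = random.Random(seed)
    centers = {"neutral": (0.0, 0.0), "happy": (5.0, 5.0), "sad": (10.0, 0.0)}
    points, labels = [], []
    for emotion, (cx, cy) in centers.items():
        for _ in range(n_per_emotion):
            points.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            labels.append(emotion)
    return points, labels


points, labels = make_toy_features()
assign = kmeans(points, k=3)
print("cluster purity:", purity(assign, labels))
```

Under this sketch, running the same clustering on two competing parameterizations of the same utterances and comparing their purity scores would give a rough indication of which one carries more emotion-dependent information.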
Pages: 1630-1633
Page count: 4