Analysis on the Importance of Short-Term Speech Parameterizations for Emotional Statistical Parametric Speech Synthesis

Cited by: 0

Authors:

Maia, Ranniery [1]

Akamine, Masami [1]

Affiliations:

[1] Toshiba Res Europe Ltd, Cambridge Res Lab, Cambridge, England

Source:

13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3 | 2012

Keywords:

speech synthesis; statistical parametric speech synthesis; expressive speech synthesis;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a study on the importance of short-term spectral and excitation parameterizations for emotional hidden Markov model (HMM)-based speech synthesis. The analysis is performed through an emotion classification task using two methods: K-means emotion clustering and Gaussian mixture model (GMM)-based emotion identification. Two known parameterizations of the short-term speech spectral envelope, the mel-cepstrum and the mel-line spectrum pairs, are utilized, while features derived from the complex cepstrum and group delay, together with band-aperiodicity coefficients, are used as excitation parameters. The emotion-dependent features are then selected according to classification performance and used to train emotion-dependent HMM-based synthesizers. Listening tests are performed to verify the impact of the parameters on the similarity of the synthesized speech to its natural version.


Pages: 1630-1633

Number of pages: 4

References (16 in total):

[1] Airas, M; Alku, P. Emotions in vowel segments of continuous speech: Analysis of the glottal flow using the normalised amplitude quotient [J]. PHONETICA, 2006, 63(01): 26-46

[2] Alku, P; Bäckström, T; Vilkman, E. Normalized amplitude quotient for parametrization of the glottal flow [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2002, 112(02): 701-710

[3] [Anonymous], 2007, P INTERSPEECH

[4] Banno H, 1998, INT CONF ACOUST SPEE, P861, DOI 10.1109/ICASSP.1998.675401

[5] Estevez, Pablo A.; Tesmer, Michel; Perez, Claudio A.; Zurada, Jacek A. Normalized Mutual Information Feature Selection [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS, 2009, 20(02): 189-201

[6] Fant G., 1985, STL QPSR, V26

[7] Kawahara Hideki, 2001, Proc. of MAVEBA, P13

[8] Maia R, 2012, INT CONF ACOUST SPEE, P4581, DOI 10.1109/ICASSP.2012.6288938

[9] Plumpe, MD; Quatieri, TF; Reynolds, DA. Modeling of the glottal flow derivative waveform with application to speaker identification [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7(05): 569-586

[10] Pribilová A, 2009, LECT NOTES ARTIF INT, V5398, P232
