Significance of incorporating excitation source parameters for improved emotion recognition from speech and electroglottographic signals

Cited by: 20
Authors
Pravena D. [1]
Govind D. [1]
Affiliations
[1] Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amrita University, Coimbatore
Keywords
Emotion recognition; Excitation source parameters; Strength of excitation; Zero frequency filtering
DOI
10.1007/s10772-017-9445-x
Abstract
The work presented in this paper explores the effectiveness of incorporating excitation source parameters, namely the strength of excitation and the instantaneous fundamental frequency (F0), for emotion recognition from speech and electroglottographic (EGG) signals. The strength of excitation (SoE) is an important parameter indicating the pressure with which the glottis closes at the glottal closure instants (GCIs). The SoE is computed by the popular zero frequency filtering (ZFF) method, which accurately estimates the glottal source characteristics by attenuating or removing the high-frequency vocal tract interactions in speech. The impulse sequence obtained from the estimated GCIs is used to derive the instantaneous F0. The SoE and instantaneous F0 parameters are combined with conventional mel frequency cepstral coefficients (MFCC) to improve the recognition rates of distinct emotions (Anger, Happy and Sad) using Gaussian mixture models as classifiers. The performance of the proposed combination of SoE, instantaneous F0 and their dynamic features with MFCC is evaluated on emotion utterances from the classical German full-blown emotion speech database (EmoDb; 4 emotions and neutral), which provides simultaneous speech and EGG signals, and from the Surrey Audio-Visual Expressed Emotion (SAVEE) database (3 emotions and neutral), under both speaker-dependent and speaker-independent emotion recognition scenarios. To reinforce the effectiveness of the proposed features and to improve the statistical consistency of the emotion analysis, a fairly large Tamil emotion speech database of 220 utterances per emotion with simultaneous EGG recordings is used in addition to EmoDb. The effectiveness of SoE and instantaneous F0 in characterizing different emotions is also confirmed by the improved emotion recognition performance on the Tamil speech-EGG emotion database. © 2017, Springer Science+Business Media, LLC.
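As a rough illustration of the excitation analysis summarized above, the NumPy sketch below derives GCIs, SoE and instantaneous F0 from a speech (or EGG) waveform via zero frequency filtering. It is a minimal sketch under stated assumptions: the function name, the 30 ms trend-removal window and the single trend-removal pass are illustrative choices, not the authors' exact settings or code.

import numpy as np

def zff_excitation_features(s, fs, trend_win_ms=30):
    # Illustrative ZFF sketch (hypothetical helper, not the authors' implementation).
    # 1. Difference the signal to suppress any DC offset.
    x = np.diff(s, prepend=s[0]).astype(float)

    # 2. Pass through a cascade of two zero-frequency resonators
    #    (ideal double integrators centred at 0 Hz).
    y = x
    for _ in range(2):
        out = np.zeros_like(y)
        for n in range(2, len(y)):
            out[n] = y[n] + 2.0 * out[n - 1] - out[n - 2]
        y = out

    # 3. Remove the slowly varying trend by subtracting a local mean computed
    #    over roughly one to two average pitch periods (assumed 30 ms here;
    #    implementations often repeat this step).
    win = int(fs * trend_win_ms / 1000) | 1  # odd window length
    zff = y - np.convolve(y, np.ones(win) / win, mode='same')

    # 4. GCIs: negative-to-positive zero crossings of the ZFF signal.
    gci = np.where((zff[:-1] < 0) & (zff[1:] >= 0))[0]

    # 5. Strength of excitation: slope of the ZFF signal at each GCI.
    soe = np.abs(zff[gci + 1] - zff[gci])

    # 6. Instantaneous F0 (Hz) from successive GCI intervals.
    f0 = fs / np.diff(gci)
    return gci, soe, f0

In a pipeline mirroring the paper's description, the per-GCI SoE and F0 values (and their dynamic features) would be aligned to the MFCC frame rate, appended to each MFCC vector, and modeled with one Gaussian mixture model per emotion.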
Pages: 787-797
Number of pages: 10
References (34 in total)
[1] Adiga N., Prasanna S.R.M., Significance of instants of significant excitation for source modeling, (2013)
[2] Ayadi M.E., Kamel M.S., Karray F., Survey on speech emotion recognition: Features, classification schemes and databases, Pattern Recognition, 44, pp. 572-587, (2011)
[3] Bulut M., Narayanan S., On the robustness of overall f0 only modifications to the perception of emotions in speech, The Journal of the Acoustical Society of America, 123, pp. 4547-4558, (2008)
[4] Burkhardt F., Paeschke A., Rolfes M., Sendlemeier W., Weiss B., A database of German emotional speech, Proceedings of INTERSPEECH, pp. 1517-1520, (2005)
[5] Cabral J.P., Oliveira L.C., Emo voice: A system to generate emotions in speech, Proceedings of INTERSPEECH, pp. 1798-1801, (2006)
[6] Cahn J.E., Generation of affect in synthesized speech, Proceedings of the American Voice I/O Society, pp. 1-19, (1989)
[7] Cerezo E., Baldassarri S., Interactive agents for multimodal emotional user interaction, (2007)
[8] Creed C., Beal R., Using emotion simulation to influence user attitudes and behaviors, (2005)
[9] Erickson D., Expressive speech: Production, perception and application to speech synthesis, Acoustical Science and Technology, 26, 4, pp. 317-325, (2005)
[10] Fairbanks G., Hoaglin L.W., An experimental study of pitch characteristics of voice during the expression of emotion, Speech Monographs, 6, pp. 87-104, (1939)