Mandarin emotion recognition combining acoustic and emotional point information

Cited by: 18
Authors
Chen, Lijiang [1]
Mao, Xia [1]
Wei, Pengfei [1]
Xue, Yuli [1]
Ishizuka, Mitsuru [2]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Univ Tokyo, Dept Informat & Commun Engn, Tokyo, Japan
Keywords
Mandarin emotion recognition; Emotional point; Fisher rate; Support vector machine; Hidden Markov model; FEATURE VECTOR NORMALIZATION; SPEECH RECOGNITION; PROFILES; FEATURES; SYSTEMS
DOI
10.1007/s10489-012-0352-1
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this contribution, we introduce a novel approach that combines acoustic information and emotional point information for robust automatic recognition of a speaker's emotion. Six discrete emotional states are recognized in this work. Firstly, a multi-level model for emotion recognition based on acoustic features is presented; the derived features are selected by Fisher rate to distinguish different types of emotions. Secondly, a novel emotional point model for Mandarin is established using a Support Vector Machine and a Hidden Markov Model; this model contains 28 emotional syllables that carry rich emotional information. Finally, the acoustic information and the emotional point information are integrated by a soft decision strategy. Experimental results show that applying emotional point information to speech emotion recognition is effective.
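The abstract names two generic techniques: Fisher rate (F-ratio) feature selection and a soft decision strategy for fusing the two information sources. The sketch below illustrates both under their usual textbook definitions; it is not the authors' implementation, and the function names, the number of selected features k, and the fusion weight alpha are illustrative assumptions.

```python
# Minimal sketch of Fisher-rate feature selection and weighted soft-decision
# fusion, assumed forms only; not the paper's actual implementation.
import numpy as np

def fisher_rate(features, labels):
    """Per-feature Fisher rate: between-class variance over within-class variance
    (up to a constant degrees-of-freedom factor).

    features: (n_samples, n_features) acoustic feature matrix
    labels:   (n_samples,) emotion class ids
    """
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in classes:
        fc = features[labels == c]
        between += fc.shape[0] * (fc.mean(axis=0) - overall_mean) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)

def select_top_features(features, labels, k):
    """Return indices of the k features with the highest Fisher rate."""
    return np.argsort(fisher_rate(features, labels))[::-1][:k]

def soft_decision_fusion(p_acoustic, p_emotional_point, alpha=0.6):
    """Weighted sum of the two models' class posteriors; alpha is an assumed weight."""
    return alpha * np.asarray(p_acoustic) + (1.0 - alpha) * np.asarray(p_emotional_point)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 20))        # toy acoustic features
    y = rng.integers(0, 6, size=120)      # six emotion classes
    top = select_top_features(X, y, k=8)
    p_ac = rng.dirichlet(np.ones(6))      # toy acoustic-model posteriors
    p_ep = rng.dirichlet(np.ones(6))      # toy emotional-point posteriors
    fused = soft_decision_fusion(p_ac, p_ep)
    print("selected features:", top, "predicted emotion:", int(fused.argmax()))
```

A weighted sum of posteriors is one common realization of a soft decision strategy; the actual combination rule and weighting used in the paper may differ.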
Pages: 602-612
Page count: 11