Towards Efficient Multi-Modal Emotion Recognition

Cited by: 57
Authors
Dobrisek, Simon [1]
Gajsek, Rok [1]
Mihelic, France [1]
Pavesic, Nikola [1]
Struc, Vitomir [1]
Affiliations
[1] Univ Ljubljana, Fac Elect Engn, Ljubljana, Slovenia
Keywords
Emotion Recognition; Video Processing; Speech Processing; Canonical Correlations; GMM-UBM;
DOI
10.5772/54002
Chinese Library Classification (CLC) number
TP24 [Robotics];
Discipline classification code
080202; 1405;
Abstract
The paper presents a multi-modal emotion recognition system that exploits audio and video (i.e., facial expression) information. The system first processes the two sources of information separately to produce matching scores for each, and then combines these scores to obtain a classification decision. For the video part of the system, a novel approach to emotion recognition based on image-set matching is developed. The proposed approach avoids the need to detect and track specific facial landmarks throughout the video sequence, a common source of error in video-based emotion recognition systems, and therefore adds robustness to the video processing chain. The audio part of the system relies on utterance-specific Gaussian Mixture Models (GMMs) adapted from a Universal Background Model (UBM) via maximum a posteriori (MAP) estimation. It improves upon the standard UBM-MAP procedure by exploiting gender information when building the utterance-specific GMMs, which enhances emotion recognition performance. Both uni-modal parts and the combined system are assessed on the challenging multi-modal eNTERFACE'05 corpus, with highly encouraging results. The developed system represents a feasible solution to emotion recognition that can easily be integrated into various systems, such as humanoid robots, smart surveillance systems and the like.
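The score-level combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state which normalization or fusion rule the authors use, so the min-max normalization, the weighted sum, the `audio_weight` parameter, the function names, and the example scores below are all assumptions chosen for illustration, not the paper's method.

```python
from typing import Dict, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Map raw matching scores to [0, 1] (assumed normalization, not from the paper)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: no class is preferred by this modality.
        return {label: 0.5 for label in scores}
    return {label: (s - lo) / (hi - lo) for label, s in scores.items()}


def fuse_scores(
    audio: Dict[str, float],
    video: Dict[str, float],
    audio_weight: float = 0.5,  # hypothetical fusion weight
) -> Tuple[str, Dict[str, float]]:
    """Combine per-class audio and video scores with a weighted sum."""
    if set(audio) != set(video):
        raise ValueError("both modalities must score the same emotion classes")
    a = min_max_normalize(audio)
    v = min_max_normalize(video)
    fused = {
        label: audio_weight * a[label] + (1.0 - audio_weight) * v[label]
        for label in a
    }
    # The classification decision is the class with the highest fused score.
    return max(fused, key=fused.get), fused


# Made-up scores: log-likelihood-like values for audio, similarities for video.
audio_scores = {"anger": -41.2, "happiness": -39.0, "sadness": -45.5}
video_scores = {"anger": 0.62, "happiness": 0.58, "sadness": 0.31}
decision, fused = fuse_scores(audio_scores, video_scores)
```

Normalizing before summing matters here because the two modalities produce scores on different scales; without it, the modality with the larger numeric range would dominate the decision.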
Pages: 10