Context-Sensitive Learning for Enhanced Audiovisual Emotion Classification

Cited: 115
Authors
Metallinou, Angeliki [1 ]
Woellmer, Martin [2 ]
Katsamanis, Athanasios [1 ]
Eyben, Florian [2 ]
Schuller, Bjoern [2 ]
Narayanan, Shrikanth [1 ]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
[2] Tech Univ Munich, Inst Human Machine Commun, D-80333 Munich, Germany
Keywords
Audio-visual emotion recognition; temporal context; hidden Markov models; bidirectional long short-term memory; recurrent neural networks; emotional grammars; facial expressions; perception; face
DOI
10.1109/T-AFFC.2011.40
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Human emotional expression tends to evolve in a structured manner, in the sense that certain emotional evolution patterns, e.g., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context when classifying the emotional content of an observation. In this work, we focus on audio-visual recognition of the emotional content of improvised emotional interactions at the utterance level. We examine context-sensitive schemes for emotion recognition within a multimodal, hierarchical approach: bidirectional Long Short-Term Memory (BLSTM) neural networks, hierarchical Hidden Markov Model (HMM) classifiers, and hybrid HMM/BLSTM classifiers are considered for modeling emotion evolution within an utterance and between utterances over the course of a dialog. Overall, our experimental results indicate that incorporating long-term temporal context is beneficial for emotion recognition systems that encounter a variety of emotional manifestations. Context-sensitive approaches outperform those without context for classification tasks such as discrimination between valence levels or between clusters in the valence-activation space. The analysis of emotional transitions in our database sheds light on the flow of affective expressions, revealing potentially useful patterns.
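The "emotional grammar" idea in the abstract, that same-emotion transitions (anger to anger) are more probable than cross-valence jumps (anger to happiness), can be illustrated with a first-order Markov model over utterance-level emotion labels, decoded with the Viterbi algorithm. This is a minimal sketch, not the paper's actual HMM/BLSTM system; all transition and observation probabilities below are hypothetical.

```python
# Minimal sketch of an utterance-level "emotional grammar": a first-order
# Markov model decoded with Viterbi. All probabilities are illustrative,
# not taken from the paper's data.
import math

STATES = ["anger", "happiness", "neutral"]

# Hypothetical transition matrix: same-emotion transitions are favored,
# cross-valence jumps (anger <-> happiness) are penalized.
TRANS = {
    "anger":     {"anger": 0.70, "happiness": 0.05, "neutral": 0.25},
    "happiness": {"anger": 0.05, "happiness": 0.70, "neutral": 0.25},
    "neutral":   {"anger": 0.20, "happiness": 0.20, "neutral": 0.60},
}
PRIOR = {s: 1.0 / len(STATES) for s in STATES}

def viterbi(obs_likelihoods):
    """obs_likelihoods: one dict per utterance mapping state -> P(obs | state).
    Returns the most probable emotion-label sequence."""
    trellis = [{s: (math.log(PRIOR[s]) + math.log(obs_likelihoods[0][s]), [s])
                for s in STATES}]
    for obs in obs_likelihoods[1:]:
        row = {}
        for s in STATES:
            # Best predecessor under transition-weighted score.
            best_prev, (score, path) = max(
                ((p, trellis[-1][p]) for p in STATES),
                key=lambda kv: kv[1][0] + math.log(TRANS[kv[0]][s]))
            row[s] = (score + math.log(TRANS[best_prev][s]) + math.log(obs[s]),
                      path + [s])
        trellis.append(row)
    return max(trellis[-1].values(), key=lambda v: v[0])[1]

# The noisy middle utterance looks slightly like happiness in isolation,
# but the grammar keeps the decoded sequence in anger.
obs = [
    {"anger": 0.80, "happiness": 0.10, "neutral": 0.10},
    {"anger": 0.35, "happiness": 0.40, "neutral": 0.25},
    {"anger": 0.70, "happiness": 0.10, "neutral": 0.20},
]
print(viterbi(obs))  # -> ['anger', 'anger', 'anger']
```

The hierarchical classifiers described in the paper play an analogous role: a second-level model over utterance-level decisions exploits temporal context to correct locally ambiguous classifications.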
Pages: 184-198 (15 pages)