Deep Learning for Multimodal Emotion Recognition: Attentive Residual Disconnected RNN

Cited by: 3
Authors
Chandra, Erick [1]
Hsu, Jane Yung-jen [1]
Affiliations
[1] Natl Taiwan Univ, Comp Sci & Informat Engn, Taipei, Taiwan
Source
2019 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI) | 2019
Keywords
Emotion Recognition; Disconnected Recurrent Neural Network; Attention Mechanism; Residual Network
DOI
10.1109/taai48200.2019.8959913
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Humans communicate using verbal and non-verbal cues. One of the most essential elements that complements the understanding of communication is emotion. Emotion is expressed not only in words but also in facial expressions, body language, tone, and so on. We therefore formulate emotion recognition as a multimodal task. Emotions usually unfold as a sequence along with the utterances. In recent years, RNN-based models have proven effective at modeling entire sequences and capturing long-term dependencies; however, they lack the ability to extract local key patterns and position-invariant features. Hence, we adopt the Deep Attentive Residual Disconnected RNN model, which incorporates concepts from both RNNs and CNNs to enhance the ability to capture spatial and temporal features. We use the CMU-MOSEI dataset, which comprises language, visual, and acoustic modalities, to train and evaluate our model. The results show that the Deep Attentive Residual Disconnected RNN model outperforms the baseline, and that the multimodal approach yields more robust recognition than any single modality.
Pages: 8
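For readers unfamiliar with the disconnected-RNN idea the abstract builds on, the sketch below illustrates it in PyTorch: each time step is encoded by a recurrent cell that sees only a fixed local window of inputs, which keeps the extracted patterns local and position-invariant (as in a CNN) while the recurrence still models order inside the window. This is a minimal sketch of the general technique, not the authors' implementation; the class name, the choice of a GRU cell, and the `window` parameter are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DisconnectedRNN(nn.Module):
    """Minimal sketch of a disconnected RNN layer (illustrative, not the
    paper's code): position t is encoded by running a GRU over only the
    `window` most recent inputs, so features stay local and
    position-invariant like CNN features, while the GRU still models
    order within the window."""

    def __init__(self, input_size: int, hidden_size: int, window: int = 5):
        super().__init__()
        self.window = window
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_size)
        batch, seq_len, feat = x.shape
        # Left-pad with zeros so every position has a full window of history.
        pad = x.new_zeros(batch, self.window - 1, feat)
        padded = torch.cat([pad, x], dim=1)
        # Slide a window over the sequence: (batch, seq_len, feat, window).
        windows = padded.unfold(dimension=1, size=self.window, step=1)
        # Reorder to (batch * seq_len, window, feat) so each window becomes
        # one short, independent ("disconnected") sequence for the GRU.
        windows = windows.permute(0, 1, 3, 2).reshape(batch * seq_len, self.window, feat)
        _, h_n = self.gru(windows)              # h_n: (1, batch*seq_len, hidden)
        return h_n.squeeze(0).view(batch, seq_len, -1)

# Example: encode a batch of 2 utterance sequences of length 10 with
# 300-dimensional (e.g. word-embedding) features.
if __name__ == "__main__":
    layer = DisconnectedRNN(input_size=300, hidden_size=128, window=5)
    out = layer(torch.randn(2, 10, 300))
    print(out.shape)  # torch.Size([2, 10, 128])
```

In the full model the abstract describes, attention and residual connections would be stacked on top of such local features to re-introduce long-range context; those components are omitted from this sketch.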