DEEP MULTIMODAL LEARNING FOR EMOTION RECOGNITION IN SPOKEN LANGUAGE

Times cited: 0
Authors
Gu, Yue [1]
Chen, Shuhong [1]
Marsic, Ivan [1]
Affiliations
[1] Rutgers State Univ, Dept Elect & Comp Engn, Piscataway, NJ 08854 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2018
Keywords
Emotion recognition; spoken language; deep multimodal learning; sentiment analysis
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we present a novel deep multimodal framework to predict human emotions based on sentence-level spoken language. Our architecture has two distinctive characteristics. First, it extracts high-level features from both text and audio via a hybrid deep multimodal structure, which considers the spatial information from text, temporal information from audio, and high-level associations from low-level handcrafted features. Second, we fuse all features by using a three-layer deep neural network to learn the correlations across modalities and train the feature extraction and fusion modules together, allowing optimal global fine-tuning of the entire structure. We evaluated the proposed framework on the IEMOCAP dataset. Our results show promising performance, achieving 60.4% weighted accuracy for five emotion categories.
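The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature sizes, the hidden width, the ReLU activations, the random initialization, and the names `init_fusion_params` and `fuse` are all assumptions made for illustration. The abstract states only that a three-layer deep neural network fuses the text, audio, and handcrafted features and outputs five emotion categories.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_fusion_params(in_dim, hidden_dim, n_classes, seed=0):
    # Three weight layers: in -> hidden -> hidden -> classes.
    # The hidden width and initialization scale are assumptions.
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden_dim, hidden_dim, n_classes]
    return [(rng.normal(0.0, 0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def fuse(text_feat, audio_feat, handcrafted_feat, params):
    # Concatenate per-utterance features from the three branches,
    # then pass them through the fully connected fusion layers.
    x = np.concatenate([text_feat, audio_feat, handcrafted_feat], axis=-1)
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    W, b = params[-1]
    return softmax(x @ W + b)

# Placeholder features standing in for the outputs of the text, audio,
# and handcrafted-feature branches (sizes are hypothetical).
batch = 4
text_feat = np.ones((batch, 8))
audio_feat = np.ones((batch, 6))
handcrafted_feat = np.ones((batch, 3))

params = init_fusion_params(in_dim=8 + 6 + 3, hidden_dim=16, n_classes=5)
probs = fuse(text_feat, audio_feat, handcrafted_feat, params)
```

Each row of `probs` is a distribution over the five emotion categories. In the framework the abstract describes, the feature extractors and the fusion layers are trained together, so gradients would flow back through the concatenation into each branch; this sketch covers the forward pass only.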
Pages: 5079-5083
Page count: 5
Related papers
20 items in total
[1]   Convolutional Neural Networks for Speech Recognition [J].
Abdel-Hamid, Ossama ;
Mohamed, Abdel-Rahman ;
Jiang, Hui ;
Deng, Li ;
Penn, Gerald ;
Yu, Dong .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (10) :1533-1545
[2]  
[Anonymous], P ICDM BARC
[3]  
[Anonymous], ICASSP
[4]  
[Anonymous], 2014, THESIS U WATERLOO
[5]  
[Anonymous], IEEE T CIRCUITS SYST
[6]   IEMOCAP: interactive emotional dyadic motion capture database [J].
Busso, Carlos ;
Bulut, Murtaza ;
Lee, Chi-Chun ;
Kazemzadeh, Abe ;
Mower, Emily ;
Kim, Samuel ;
Chang, Jeannette N. ;
Lee, Sungbok ;
Narayanan, Shrikanth S. .
LANGUAGE RESOURCES AND EVALUATION, 2008, 42 (04) :335-359
[7]  
Cai G., 2015, NAT CCF C NAT LANG P
[8]  
Eyben F., 2010, P 18 ACM INT C MULT, P1459
[9]   Speech Intention Classification with Multimodal Deep Learning [J].
Gu, Yue ;
Li, Xinyu ;
Chen, Shuhong ;
Zhang, Jianyu ;
Marsic, Ivan .
ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233 :260-271
[10]  
Gu Yue, 2017, 2017 IEEE INT C HEAL