Context- and Knowledge-Aware Graph Convolutional Network for Multimodal Emotion Recognition

Times Cited: 15
Authors
Fu, Yahui [1 ]
Okada, Shogo [1 ]
Wang, Longbiao [2 ]
Guo, Lili [2 ]
Song, Yaodong [2 ]
Liu, Jiaxing [2 ]
Dang, Jianwu [2 ]
Affiliations
[1] Japan Adv Inst Sci & Technol, Sch Informat Sci, Nomi, Ishikawa 9231211, Japan
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin 300072, Peoples R China
Keywords
Emotion recognition; Context modeling; Semantics; Oral communication; Knowledge based systems; Task analysis; Social networking (online);
DOI
10.1109/MMUL.2022.3173430
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
This work proposes an approach for emotion recognition in conversation that leverages context modeling, knowledge enrichment, and multimodal (text and audio) learning based on a graph convolutional network (GCN). We first construct two distinct graphs to model contextual interactions and knowledge dynamics. We then introduce an affective lexicon into the knowledge-graph construction to enrich each concept, that is, the related knowledge of each token in an utterance, with its emotional polarity. Finally, we balance the context and the affect-enriched knowledge by incorporating both into the construction of a new adjacency matrix for the GCN architecture, and we train them jointly with multiple modalities to effectively capture the semantics-sensitive and knowledge-sensitive contextual dependencies of each conversation. Our model outperforms state-of-the-art benchmarks with over 22.6% and 11% relative error reduction in weighted F1 on the IEMOCAP and MELD databases, respectively, demonstrating the superiority of our method for emotion recognition.
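The abstract's core mechanism is a GCN layer whose adjacency matrix combines the contextual-interaction graph with the affect-enriched knowledge graph. The following is a minimal sketch of that idea, assuming PyTorch; the class name, feature dimensions, and the scalar mixing coefficient `alpha` are illustrative assumptions, not the authors' released implementation or their exact adjacency construction.

```python
import torch
import torch.nn as nn


class ContextKnowledgeGCNLayer(nn.Module):
    """One GCN layer over a blend of a context graph and a knowledge graph.

    A sketch only: blending the two graphs with a scalar `alpha` is an
    assumption standing in for the paper's adjacency-matrix construction.
    """

    def __init__(self, in_dim: int, out_dim: int, alpha: float = 0.5):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.alpha = alpha  # balance between context and knowledge edges

    @staticmethod
    def _normalize(adj: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization D^{-1/2}(A + I)D^{-1/2}, Kipf & Welling style.
        adj = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        return adj * d_inv_sqrt.unsqueeze(0) * d_inv_sqrt.unsqueeze(1)

    def forward(self, x, adj_context, adj_knowledge):
        # Blend the two graphs into a single adjacency matrix, then propagate.
        adj = self.alpha * adj_context + (1.0 - self.alpha) * adj_knowledge
        return torch.relu(self.linear(self._normalize(adj) @ x))


# Usage: a 5-utterance dialog, each utterance a 768-d fused text+audio vector.
x = torch.randn(5, 768)
adj_ctx = torch.rand(5, 5)  # contextual-interaction graph (illustrative)
adj_kno = torch.rand(5, 5)  # affect-enriched knowledge graph (illustrative)
out = ContextKnowledgeGCNLayer(768, 256)(x, adj_ctx, adj_kno)  # shape (5, 256)
```

In this sketch the balance between context and knowledge is a fixed hyperparameter; the paper instead folds both sources into the adjacency construction itself, so a learned or token-dependent weighting would be a natural refinement.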
Pages: 91-99
Number of Pages: 9