MM-DFN: MULTIMODAL DYNAMIC FUSION NETWORK FOR EMOTION RECOGNITION IN CONVERSATIONS

Cited by: 105
Authors
Hu, Dou [1 ]
Hou, Xiaolong [1 ]
Wei, Lingwei [2 ]
Jiang, Lianxin [1 ]
Mo, Yang [1 ]
Affiliations
[1] Ping An Life Insurance Co China Ltd, Shenzhen, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2022
Keywords
emotion recognition; emotion recognition in conversations; multimodal fusion; dialogue systems;
DOI
10.1109/ICASSP43922.2022.9747397
CLC Number
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
Emotion Recognition in Conversations (ERC) has considerable prospects for developing empathetic machines. For multimodal ERC, it is vital to understand context and fuse modality information in conversations. Recent graph-based fusion methods generally aggregate multimodal information by exploring unimodal and cross-modal interactions in a graph. However, they accumulate redundant information at each layer, limiting the context understanding between modalities. In this paper, we propose a novel Multimodal Dynamic Fusion Network (MM-DFN) to recognize emotions by fully understanding multimodal conversational context. Specifically, we design a new graph-based dynamic fusion module to fuse multimodal context features in a conversation. The module reduces redundancy and enhances complementarity between modalities by capturing the dynamics of contextual information in different semantic spaces. Extensive experiments on two public benchmark datasets demonstrate the effectiveness and superiority of the proposed model.
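The graph-based fusion the abstract describes builds one node per (utterance, modality) pair and aggregates over two edge types: cross-modal edges within an utterance and unimodal (contextual) edges across utterances. The following minimal sketch illustrates one such aggregation step; the graph layout, the mean-aggregation rule, and the residual blend are illustrative assumptions, not the actual MM-DFN update from the paper.

```python
import numpy as np

def graph_fusion_step(feats, adj):
    """One round of mean-aggregation message passing over a multimodal
    utterance graph (illustrative only; not the paper's exact update)."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    msgs = (adj @ feats) / deg        # average over graph neighbors
    return 0.5 * (feats + msgs)       # residual-style blend of self and context

# Toy graph: 2 utterances x 3 modalities (text, audio, visual) = 6 nodes.
n_utt, n_mod, d = 2, 3, 4
rng = np.random.default_rng(0)
feats = rng.standard_normal((n_utt * n_mod, d))

def idx(u, m):
    return u * n_mod + m

adj = np.zeros((n_utt * n_mod, n_utt * n_mod))
for u in range(n_utt):                # cross-modal edges within an utterance
    for m1 in range(n_mod):
        for m2 in range(n_mod):
            if m1 != m2:
                adj[idx(u, m1), idx(u, m2)] = 1.0
for m in range(n_mod):                # unimodal contextual edges across utterances
    for u1 in range(n_utt):
        for u2 in range(n_utt):
            if u1 != u2:
                adj[idx(u1, m), idx(u2, m)] = 1.0

fused = graph_fusion_step(feats, adj)
print(fused.shape)  # (6, 4)
```

Stacking such layers is what the paper argues accumulates redundant information; MM-DFN's dynamic fusion module is motivated as a way to control that accumulation across semantic spaces.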
Pages: 7037-7041
Page count: 5