Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 1

Authors:

Zhang, Duzhen [1]

Chen, Feilong [1]

Chang, Jianlong [1]

Chen, Xiuyi [2]

Tian, Qi [1]

Affiliations:

[1] Huawei Technol, Cloud & AI, Shenzhen 518129, Peoples R China

[2] Baidu Inc, Beijing 100085, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2024 / Vol. 26

Keywords:

Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; multi-graph network; dual-stream propagations; multi-modal fusion; emotion recognition in conversations;

DOI:

10.1109/TMM.2023.3238314

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, with graph-based models recently being widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph, and utterances in each modality are fully connected, potentially ignoring three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing of the graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between the utterances. This module reduces redundancy by forcing each utterance to focus on the contextual ones that contribute to its emotion recognition, acting like a message propagating reducer to alleviate over-smoothing. Then, we develop the SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from multi-graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the above two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new State-Of-The-Art (SOTA) results.
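
The pipeline described in the abstract (per-modality graphs, edge selection, two parallel propagation streams, and a gate) can be illustrated with a toy sketch. This is not the authors' implementation: the record does not give the model's equations, so everything here is an assumption chosen for illustration. A fixed cosine-similarity threshold stands in for the learned structure module. Neighbour averaging stands in for graph propagation. A single scalar sigmoid gate stands in for the gating unit. The modality names (text, audio, visual), function names, and feature values are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def learn_edges(feats, threshold=0.5):
    """Binary adjacency for one modality graph.

    Assumption: an edge i-j is kept only if the cosine similarity passes a
    fixed threshold. The paper learns this decision; this is a stand-in.
    Self-loops are always kept so every utterance has at least one neighbour.
    """
    n = len(feats)
    return [[1 if i == j or cosine(feats[i], feats[j]) >= threshold else 0
             for j in range(n)] for i in range(n)]

def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def intra_modal(feats, adj):
    """Intra-modal stream: average the neighbours kept in the same graph."""
    return [mean_vectors([feats[j] for j, keep in enumerate(row) if keep])
            for row in adj]

def inter_modal(all_feats, modality):
    """Inter-modal stream: average the same utterance across other modalities."""
    others = [f for name, f in all_feats.items() if name != modality]
    n = len(all_feats[modality])
    return [mean_vectors([f[i] for f in others]) for i in range(n)]

def gated_fusion(intra, inter, gate_weights):
    """Mix the two streams with one sigmoid gate value per utterance."""
    fused = []
    for a, b in zip(intra, inter):
        z = sum(w * x for w, x in zip(gate_weights, a + b))
        g = 1.0 / (1.0 + math.exp(-z))
        fused.append([g * x + (1.0 - g) * y for x, y in zip(a, b)])
    return fused

# Toy conversation: three utterances, two-dimensional features per modality.
features = {
    "text":   [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    "audio":  [[0.5, 0.5], [0.6, 0.4], [0.1, 0.9]],
    "visual": [[0.2, 0.8], [0.8, 0.2], [0.3, 0.7]],
}
adj_text = learn_edges(features["text"], threshold=0.5)
intra = intra_modal(features["text"], adj_text)
inter = inter_modal(features, "text")
# Zero gate weights give g = 0.5, i.e. an equal mix of both streams.
fused = gated_fusion(intra, inter, gate_weights=[0.0, 0.0, 0.0, 0.0])
```

In this toy graph the third text utterance keeps only its self-loop, which shows how dropping low-relevance edges limits how much context is averaged into each node. In a trained model the edge decisions and gate weights would be learned rather than fixed by hand.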

Pages: 3987-3997

Page count: 11

