Structure Aware Multi-Graph Network for Multi-Modal Emotion Recognition in Conversations

Cited by: 1
Authors
Zhang, Duzhen [1]
Chen, Feilong [1]
Chang, Jianlong [1]
Chen, Xiuyi [2]
Tian, Qi [1]
Affiliations
[1] Huawei Technol, Cloud & AI, Shenzhen 518129, Peoples R China
[2] Baidu Inc, Beijing 100085, Peoples R China
Keywords
Emotion recognition; Context modeling; Feature extraction; Visualization; Acoustics; Oral communication; Transformers; Structure learning; multi-graph network; dual-stream propagations; multi-modal fusion; emotion recognition in conversations
DOI
10.1109/TMM.2023.3238314
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Multi-Modal Emotion Recognition in Conversations (MMERC) is an increasingly active research field that leverages multi-modal signals to understand the feelings behind each utterance. Modeling contextual interactions and multi-modal fusion lie at the heart of this field, and graph-based models have recently been widely used for MMERC to capture global multi-modal contextual information. However, these models generally mix all modality representations in a single graph and fully connect the utterances in each modality, potentially overlooking three problems: 1) the heterogeneity of the multi-modal context, 2) the redundancy of contextual information, and 3) over-smoothing in graph networks. To address these problems, we propose a Structure Aware Multi-Graph Network (SAMGN) for MMERC. Specifically, we construct multiple modality-specific graphs to model the heterogeneity of the multi-modal context. Instead of fully connecting the utterances in each modality, we design a structure learning module that determines whether edges exist between utterances. This module reduces redundancy by forcing each utterance to focus on the contextual utterances that contribute to its emotion recognition, and it acts as a message-propagation reducer that alleviates over-smoothing. Then, we develop SAMGN via Dual-Stream Propagation (DSP), which contains two propagation streams, i.e., intra- and inter-modal, performed in parallel to aggregate the heterogeneous modality information from the multiple graphs. DSP also contains a gating unit that adaptively integrates the co-occurrence information from the two propagations for emotion recognition. Experiments on two popular MMERC datasets demonstrate that SAMGN achieves new state-of-the-art (SOTA) results.
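The abstract names three mechanisms: sparse edges chosen per modality graph instead of full connectivity, two parallel propagation streams (intra- and inter-modal), and a gate that fuses them. The following framework-free sketch illustrates how these pieces could fit together under loudly stated assumptions; it is not SAMGN's published formulation. A fixed cosine-similarity threshold stands in for the learned structure learning module, the intra-modal stream is plain mean aggregation over retained edges, the inter-modal stream averages the same utterance's vectors in the other modalities, and the gate uses hypothetical scalar weights (`w_intra`, `w_inter`, `bias`) where a trained model would learn parameter matrices. All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per utterance, one column per feature


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def learn_edges(feats: Matrix, threshold: float) -> List[List[int]]:
    """Hypothetical stand-in for structure learning: keep edge (i, j) only if
    the two utterances are similar enough, instead of fully connecting the
    modality graph. Self-loops are always kept."""
    n = len(feats)
    return [
        [1 if i == j or cosine(feats[i], feats[j]) >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def intra_modal(feats: Matrix, adj: List[List[int]]) -> Matrix:
    """Intra-modal stream: mean over each utterance's retained neighbours in
    one modality-specific graph (the self-loop keeps the list non-empty)."""
    out = []
    for row in adj:
        nbrs = [feats[j] for j, keep in enumerate(row) if keep]
        out.append([sum(col) / len(nbrs) for col in zip(*nbrs)])
    return out


def inter_modal(modal_feats: List[Matrix], m: int) -> Matrix:
    """Inter-modal stream: for modality m, average the same utterance's
    representations in every other modality."""
    others = [f for k, f in enumerate(modal_feats) if k != m]
    n = len(modal_feats[m])
    return [
        [sum(vals) / len(others) for vals in zip(*(f[i] for f in others))]
        for i in range(n)
    ]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gate_fuse(h_intra: Matrix, h_inter: Matrix,
              w_intra: float, w_inter: float, bias: float) -> Matrix:
    """Element-wise gate g = sigmoid(w_intra*a + w_inter*b + bias), output
    g*a + (1 - g)*b. The scalar weights are hypothetical placeholders."""
    fused = []
    for row_a, row_b in zip(h_intra, h_inter):
        fused_row = []
        for a, b in zip(row_a, row_b):
            g = sigmoid(w_intra * a + w_inter * b + bias)
            fused_row.append(g * a + (1.0 - g) * b)
        fused.append(fused_row)
    return fused


def dual_stream(modal_feats: List[Matrix], m: int, threshold: float = 0.5,
                w_intra: float = 0.0, w_inter: float = 0.0,
                bias: float = 0.0) -> Matrix:
    """Compute both streams for modality m (they do not depend on each
    other, so they could run in parallel) and fuse them with the gate."""
    adj = learn_edges(modal_feats[m], threshold)
    h_intra = intra_modal(modal_feats[m], adj)
    h_inter = inter_modal(modal_feats, m)
    return gate_fuse(h_intra, h_inter, w_intra, w_inter, bias)


if __name__ == "__main__":
    text = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # toy 2-d "text" features
    audio = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0]]  # toy 2-d "audio" features
    print(learn_edges(text, 0.5))          # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
    print(dual_stream([text, audio], m=0))  # [[0.5, 1.0], [1.5, 0.0], [0.5, 1.0]]
```

With all gate parameters at zero the gate is exactly 0.5, so the fused output is the plain average of the two streams; the third utterance is cut off from the first two because its text vector is orthogonal to theirs, which is the kind of pruning a structure learning step is meant to perform.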
Pages: 3987-3997
Number of pages: 11