Topics Guided Multimodal Fusion Network for Conversational Emotion Recognition

Citations: 0
Authors
Yuan, Peicong [1 ]
Cai, Guoyong [1 ]
Chen, Ming [1 ]
Tang, Xiaolv [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guilin, Peoples R China
Source
ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT III, ICIC 2024 | 2024, Vol. 14877
Keywords
Emotion Recognition in Conversation; Neural Topic Model; Multimodal Fusion
DOI
10.1007/978-981-97-5669-8_21
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Emotion Recognition in Conversation (ERC) is a challenging task. Previous methods capture the semantic dependencies between utterances through complex conversational context modeling but ignore the topic information carried by the utterances; moreover, the commonality of multimodal information has not been effectively explored. To this end, the Topics Guided Multimodal Fusion Network (TGMFN) is proposed to extract effective utterance topic information and to exploit cross-modal commonality and complementarity. First, a VAE-based neural topic model is used to build a conversational topic model, with a new topic sampling strategy that differs from the traditional reparameterization trick and makes the topic modeling better suited to utterances. Second, a facial feature extraction method for multi-party conversations is proposed to extract rich facial features from video. Finally, the Topic-Guided Vision-Audio features Aware fusion (TGV2A) module is designed around the conversation topic; it fuses modality-specific information such as the speaker's facial features and topic-related co-occurrence information, and captures the commonality and complementarity between modalities to enrich the semantics of the fused features. Extensive experiments on two multimodal ERC datasets, IEMOCAP and MELD, show that the proposed TGMFN outperforms leading baseline methods.
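For context, the conventional VAE-based neural topic model that TGMFN builds on encodes each utterance as a bag of words, infers a Gaussian posterior, and draws a latent topic vector with the reparameterization trick, which is exactly the step the paper replaces with its own sampling strategy. Below is a minimal PyTorch sketch of that standard baseline, assuming bag-of-words input; all class and variable names are illustrative, and the paper's modified sampler is not reproduced here.

```python
# Minimal sketch of a standard VAE-based neural topic model (the baseline
# whose reparameterization-trick sampling TGMFN modifies). Names are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTopicModel(nn.Module):
    def __init__(self, vocab_size: int, num_topics: int, hidden: int = 256):
        super().__init__()
        # Encoder: bag-of-words utterance -> Gaussian posterior parameters.
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Softplus())
        self.mu = nn.Linear(hidden, num_topics)
        self.logvar = nn.Linear(hidden, num_topics)
        # Decoder: topic mixture -> word distribution over the vocabulary.
        self.decoder = nn.Linear(num_topics, vocab_size)

    def forward(self, bow: torch.Tensor):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        # Standard reparameterization trick: z = mu + sigma * eps.
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        theta = F.softmax(z, dim=-1)                     # topic proportions
        recon = F.log_softmax(self.decoder(theta), dim=-1)
        # Negative ELBO = reconstruction loss + KL to a standard normal prior.
        rec_loss = -(bow * recon).sum(-1).mean()
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return theta, rec_loss + kl
```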
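The record does not spell out the internals of TGV2A; one plausible reading of "topic-guided" fusion is to let the inferred topic proportions act as an attention query that selects topic-relevant visual and acoustic cues. The sketch below is a hypothetical illustration under that assumption, not the authors' implementation; TopicGuidedFusion and all parameter names are invented for the example.

```python
# Hypothetical topic-guided vision-audio fusion in the spirit of TGV2A;
# the actual module design is not specified in this record.
import torch
import torch.nn as nn

class TopicGuidedFusion(nn.Module):
    def __init__(self, num_topics: int, dim: int = 256, heads: int = 4):
        super().__init__()
        self.topic_proj = nn.Linear(num_topics, dim)  # topic vector -> query space
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, theta, vision, audio):
        # theta: (B, K) topic proportions; vision/audio: (B, T, dim) sequences.
        q = self.topic_proj(theta).unsqueeze(1)        # (B, 1, dim) topic query
        v_ctx, _ = self.attn_v(q, vision, vision)      # topic-relevant visual cues
        a_ctx, _ = self.attn_a(q, audio, audio)        # topic-relevant acoustic cues
        fused = torch.cat([v_ctx, a_ctx], dim=-1).squeeze(1)
        return self.out(fused)                         # fused multimodal feature
```

In this reading, the topic vector determines which face and voice frames matter, which matches the abstract's claim that fusion is conditioned on the conversation topic; the authors' actual module may differ substantially.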
Pages: 250-262
Number of pages: 13