Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation

Times Cited: 0
Authors
Guo, Lili [1 ,2 ]
Song, Yikang [1 ]
Ding, Shifei [1 ,2 ]
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, People's Republic of China
[2] Mine Digitization Engineering Research Center of the Ministry of Education, Xuzhou 221116, People's Republic of China
Keywords
Speaker-aware; Emotion recognition in conversation; Cross-modal attention
DOI
10.1016/j.knosys.2024.111969
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Emotion recognition in conversation (ERC) has gained considerable attention owing to its extensive applications in human-computer interaction. However, previous models have been limited in exploring the potential emotional relationships within a conversation because they cannot fully leverage speaker information. Additionally, information from modalities such as text, audio, and video can synergistically enhance and supplement the analysis of emotional context within a conversation; nonetheless, effectively fusing multimodal features to capture detailed contextual information remains challenging. This paper proposes a Speaker-Aware Cognitive network with Cross-Modal Attention (SACCMA) for multimodal ERC that effectively leverages both multimodal and speaker information. The proposed model consists primarily of a modality encoder and a cognitive module. The modality encoder fuses feature information from speech, text, and vision using a cross-modal attention mechanism. The fused features and the speaker information are then fed separately into the cognitive module to enhance the perception of emotions within the dialogue. Compared with seven common baseline methods, our model improves the accuracy score by 2.71% and 1.70% on the IEMOCAP and MELD datasets, respectively, and the F1 score by 2.92% and 0.70%. Further experiments also demonstrate the effectiveness of our method.
Pages: 8
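
As a rough illustration of the architecture the abstract describes, a minimal PyTorch sketch of cross-modal attention fusion followed by a speaker-aware cognitive module is given below. This is not the authors' implementation: the module names, dimensions, the text-as-anchor fusion scheme, the GRU-based context model, and the number of emotion classes are all assumptions made for this sketch.

# Illustrative sketch only; not the paper's released code. Names, dimensions,
# the text-as-anchor fusion scheme, and the GRU cognitive module are assumptions.
import torch
import torch.nn as nn


class CrossModalAttention(nn.Module):
    """One modality (query) attends to another modality (key/value)."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query, context: (batch, num_utterances, dim)
        out, _ = self.attn(query, context, context)
        return self.norm(query + out)  # residual connection + layer norm


class ModalityEncoder(nn.Module):
    """Fuses text, audio, and visual utterance features via cross-modal
    attention, with text as the anchor modality (an assumption here)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.text_audio = CrossModalAttention(dim)
        self.text_vision = CrossModalAttention(dim)
        self.fuse = nn.Linear(3 * dim, dim)

    def forward(self, text, audio, vision):
        t_a = self.text_audio(text, audio)    # text attends to audio
        t_v = self.text_vision(text, vision)  # text attends to vision
        return self.fuse(torch.cat([text, t_a, t_v], dim=-1))


class CognitiveModule(nn.Module):
    """Hypothetical stand-in for the cognitive module: injects a learned
    speaker embedding and models conversational context with a GRU."""

    def __init__(self, dim: int = 256, num_speakers: int = 10, num_classes: int = 6):
        super().__init__()
        self.speaker_emb = nn.Embedding(num_speakers, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, fused, speaker_ids):
        # fused: (batch, num_utterances, dim); speaker_ids: (batch, num_utterances)
        x = fused + self.speaker_emb(speaker_ids)
        context, _ = self.gru(x)
        return self.classifier(context)  # per-utterance emotion logits


if __name__ == "__main__":
    batch, utterances, dim = 2, 10, 256
    text = torch.randn(batch, utterances, dim)
    audio = torch.randn(batch, utterances, dim)
    vision = torch.randn(batch, utterances, dim)
    speakers = torch.randint(0, 2, (batch, utterances))  # two-party dialogue

    fused = ModalityEncoder(dim)(text, audio, vision)
    logits = CognitiveModule(dim)(fused, speakers)
    print(logits.shape)  # torch.Size([2, 10, 6])

Running the script prints per-utterance emotion logits for each conversation in the batch; in the paper's setup, fused features and speaker information jointly drive this final emotion prediction.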