Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation

Cited: 0
Authors
Guo, Lili [1 ,2 ]
Song, Yikang [1 ]
Ding, Shifei [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Minist Educ Peoples Republ China, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Peoples R China
Keywords
Speaker-aware; Emotion recognition in conversation; Cross-modal attention
DOI
10.1016/j.knosys.2024.111969
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition in conversation (ERC) has gained considerable attention owing to its extensive applications in human-computer interaction. However, previous models have been limited in exploring the potential emotional relationships within a conversation because they cannot fully leverage speaker information. Additionally, information from modalities such as text, audio, and video can synergistically enhance and supplement the analysis of emotional context within a conversation; nonetheless, effectively fusing multimodal features to capture this detailed contextual information is challenging. This paper proposes a Speaker-Aware Cognitive network with Cross-Modal Attention (SACCMA) for multimodal ERC that effectively leverages both multimodal and speaker information. The proposed model consists primarily of a modality encoder and a cognitive module. The modality encoder fuses multimodal features from speech, text, and vision using a cross-modal attention mechanism. The fused features and the speaker information are then fed separately into the cognitive module to enhance the perception of emotions within the dialogue. Compared with seven common baseline methods, our model improves the accuracy score by 2.71% and 1.70% on the IEMOCAP and MELD datasets, respectively, and the F1 score by 2.92% and 0.70%. Additional experiments further demonstrate the effectiveness of the method.
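The abstract describes a modality encoder that fuses features from different modalities via cross-modal attention. The paper's implementation is not reproduced here; the following is a minimal sketch of the general cross-modal attention idea (scaled dot-product attention where one modality supplies the queries and another supplies the keys and values), with all function names, shapes, and the residual-style fusion being illustrative assumptions rather than the authors' code:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(query_mod, key_mod):
    """Attend from one modality (queries) over another (keys/values).

    query_mod: (T_q, d) features, e.g. text utterance embeddings.
    key_mod:   (T_k, d) features, e.g. audio frame embeddings.
    Returns a (T_q, d) representation of the query modality
    enriched with information from the key modality.
    """
    d = query_mod.shape[-1]
    scores = query_mod @ key_mod.T / np.sqrt(d)   # (T_q, T_k) similarity
    weights = softmax(scores, axis=-1)            # each query row sums to 1
    return weights @ key_mod                      # (T_q, d) attended features

# Illustrative fusion of text and audio features (hypothetical shapes):
rng = np.random.default_rng(0)
text = rng.standard_normal((5, 16))    # 5 utterances, 16-dim text features
audio = rng.standard_normal((9, 16))   # 9 frames, 16-dim audio features
fused = text + cross_modal_attention(text, audio)  # residual-style fusion
```

In practice such a block would use learned query/key/value projections and multiple heads; this sketch keeps only the attention arithmetic to make the fusion mechanism concrete.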
Pages: 8