Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation

Cited by: 0
Authors
Guo, Lili [1 ,2 ]
Song, Yikang [1 ]
Ding, Shifei [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Minist Educ Peoples Republ China, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Peoples R China
Keywords
Speaker-aware; Emotion recognition in conversation; Cross-modal attention; MODEL;
DOI
10.1016/j.knosys.2024.111969
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition in conversation (ERC) has attracted considerable attention owing to its extensive applications in human-computer interaction. However, previous models have been limited in exploring the potential emotional relationships within a conversation because they cannot fully leverage speaker information. Moreover, information from modalities such as text, audio, and video can synergistically enhance and supplement the analysis of emotional context in a conversation; nonetheless, effectively fusing multimodal features to capture detailed contextual information remains challenging. This paper proposes a Speaker-Aware Cognitive network with Cross-Modal Attention (SACCMA) for multimodal ERC that effectively leverages both multimodal and speaker information. The proposed model consists primarily of a modality encoder and a cognitive module. The modality encoder fuses multimodal feature information from speech, text, and vision using a cross-modal attention mechanism. The fused features and speaker information are then fed separately into the cognitive module to enhance the perception of emotions within the dialogue. Compared with seven common baseline methods, our model improves the Accuracy score by 2.71% and 1.70% on the IEMOCAP and MELD datasets, respectively, and the F1 score by 2.92% and 0.70%. Further experiments also demonstrate the effectiveness of our method.
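The cross-modal attention fusion described in the abstract can be illustrated with a minimal, framework-free sketch: features of one modality (e.g., text utterance embeddings) act as queries that attend over the features of another modality (e.g., audio), so each text feature is enriched with the most relevant audio information. This is a generic scaled dot-product attention sketch, not the paper's actual SACCMA implementation; the function name, toy dimensions, and single-head form are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    (e.g., text) and keys/values come from another (e.g., audio).

    queries: list of d-dim vectors (one per utterance, text modality)
    keys, values: lists of d-dim vectors (audio modality)
    Returns one fused d-dim vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this text feature to every audio feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted combination of audio values -> audio-aware text feature.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

# Toy usage: two text features attend over three audio features.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
fused = cross_modal_attention(text_feats, audio_feats, audio_feats)
```

In the full model this fusion would be applied symmetrically across modality pairs, with the fused representations (plus speaker information) passed on to the cognitive module.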
Pages: 8
Related papers
50 items total
  • [1] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [2] Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition
    Zhao, Huan
    Li, Bo
    Zhang, Zixing
    INTERSPEECH 2023, 2023, : 2718 - 2722
  • [3] SAPBERT: Speaker-Aware Pretrained BERT for Emotion Recognition in Conversation
    Lim, Seunguook
    Kim, Jihie
    ALGORITHMS, 2023, 16 (01)
  • [4] UCEMA: Uni-modal and cross-modal encoding network based on multi-head attention for emotion recognition in conversation
    Zhao, Hongkun
    Liu, Siyuan
    Chen, Yang
    Kong, Fanmin
    Zeng, Qingtian
    Li, Kang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [5] A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition
    Li, Chiqin
    Xie, Lun
    Shao, Xingmao
    Pan, Hang
    Wang, Zhiliang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [6] Interactive Multimodal Attention Network for Emotion Recognition in Conversation
    Ren, Minjie
    Huang, Xiangdong
    Shi, Xiaoqi
    Nie, Weizhi
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1046 - 1050
  • [7] A Contextual Attention Network for Multimodal Emotion Recognition in Conversation
    Wang, Tana
    Hou, Yaqing
    Zhou, Dongsheng
    Zhang, Qiang
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
    Hong, Soyeon
    Kang, Hyeoungguk
    Cho, Hyunsouk
    IEEE ACCESS, 2024, 12 : 14324 - 14333
  • [9] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao, Xiaopeng
    Zhang, Linying
    Chen, Qiuxian
    Ning, Hailong
    Dong, Yizhuo
    THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, 2024, 31 (06) : 16 - 25
  • [10] Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Liu, Zhilei
    Guan, Haotian
    MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 14 - 25