Speaker-aware cognitive network with cross-modal attention for multimodal emotion recognition in conversation

Cited by: 0
Authors
Guo, Lili [1 ,2 ]
Song, Yikang [1 ]
Ding, Shifei [1 ,2 ]
Affiliations
[1] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
[2] Minist Educ Peoples Republ China, Mine Digitizat Engn Res Ctr, Xuzhou 221116, Peoples R China
Keywords
Speaker-aware; Emotion recognition in conversation; Cross-modal attention; MODEL;
DOI
10.1016/j.knosys.2024.111969
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion recognition in conversation (ERC) has attracted considerable attention owing to its extensive applications in human-computer interaction. However, previous models have been limited in exploring the potential emotional relationships within a conversation because they cannot fully leverage speaker information. Moreover, information from modalities such as text, audio, and video can synergistically enhance and supplement the analysis of emotional context in a conversation; nonetheless, effectively fusing multimodal features to capture detailed contextual information remains challenging. This paper proposes a Speaker-Aware Cognitive network with Cross-Modal Attention (SACCMA) for multimodal ERC that effectively leverages both multimodal and speaker information. The proposed model consists primarily of a modality encoder and a cognitive module. The modality encoder fuses multimodal feature information from speech, text, and vision using a cross-modal attention mechanism. The fused features and speaker information are then fed separately into the cognitive module to enhance the perception of emotions within the dialogue. Compared with seven common baseline methods, our model improves the Accuracy score by 2.71% and 1.70% on the IEMOCAP and MELD datasets, respectively, and the F1 score by 2.92% and 0.70%. Further experiments also demonstrate the effectiveness of our method.
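The cross-modal attention fusion described in the abstract can be illustrated with a minimal, framework-free sketch: features of one modality (e.g., text utterance embeddings) act as queries that attend over the features of another modality (e.g., audio), so each text feature is enriched with the most relevant audio information. This is a generic scaled dot-product attention sketch, not the paper's actual SACCMA implementation; the function name, toy dimensions, and single-head form are illustrative assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_modal_attention(queries, keys, values):
    """Scaled dot-product attention where queries come from one modality
    (e.g., text) and keys/values come from another (e.g., audio).

    queries: list of d-dim vectors (one per utterance, text modality)
    keys, values: lists of d-dim vectors (audio modality)
    Returns one fused d-dim vector per query.
    """
    d = len(keys[0])
    fused = []
    for q in queries:
        # Similarity of this text feature to every audio feature.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted combination of audio values -> audio-aware text feature.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused

# Toy usage: two text features attend over three audio features.
text_feats = [[1.0, 0.0], [0.0, 1.0]]
audio_feats = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
fused = cross_modal_attention(text_feats, audio_feats, audio_feats)
```

In the full model this fusion would be applied symmetrically across modality pairs, with the fused representations (plus speaker information) passed on to the cognitive module.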
Pages: 8
Related papers
50 items total
  • [1] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [2] Speaker-aware Cross-modal Fusion Architecture for Conversational Emotion Recognition
    Zhao, Huan
    Li, Bo
    Zhang, Zixing
    INTERSPEECH 2023, 2023, : 2718 - 2722
  • [3] SAPBERT: Speaker-Aware Pretrained BERT for Emotion Recognition in Conversation
    Lim, Seunguook
    Kim, Jihie
    ALGORITHMS, 2023, 16 (01)
  • [4] UCEMA: Uni-modal and cross-modal encoding network based on multi-head attention for emotion recognition in conversation
    Zhao, Hongkun
    Liu, Siyuan
    Chen, Yang
    Kong, Fanmin
    Zeng, Qingtian
    Li, Kang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [5] A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition
    Li, Chiqin
    Xie, Lun
    Shao, Xingmao
    Pan, Hang
    Wang, Zhiliang
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [6] Interactive Multimodal Attention Network for Emotion Recognition in Conversation
    Ren, Minjie
    Huang, Xiangdong
    Shi, Xiaoqi
    Nie, Weizhi
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 1046 - 1050
  • [7] A Contextual Attention Network for Multimodal Emotion Recognition in Conversation
    Wang, Tana
    Hou, Yaqing
    Zhou, Dongsheng
    Zhang, Qiang
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] Cross-Modal Dynamic Transfer Learning for Multimodal Emotion Recognition
    Hong, Soyeon
    Kang, Hyeoungguk
    Cho, Hyunsouk
    IEEE ACCESS, 2024, 12 : 14324 - 14333
  • [9] A cross-modal fusion network based on graph feature learning for multimodal emotion recognition
    Cao, Xiaopeng
    Zhang, Linying
    Chen, Qiuxian
    Ning, Hailong
    Dong, Yizhuo
    THE JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOMMUNICATIONS, 2024, 31 (06) : 16 - 25
  • [10] Speaker-Aware Speech Emotion Recognition by Fusing Amplitude and Phase Information
    Guo, Lili
    Wang, Longbiao
    Dang, Jianwu
    Liu, Zhilei
    Guan, Haotian
    MULTIMEDIA MODELING (MMM 2020), PT I, 2020, 11961 : 14 - 25