Modality emotion semantic correlation analysis for multimodal emotion recognition

Citations: 0
Authors
Zhang, Yuqing [1]
Xie, Dongliang [1]
Luo, Dawei [1]
Sun, Baosheng [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
[2] Beijing Jinghang Res Inst Comp & Commun, Beijing, Peoples R China
Keywords
Emotion recognition; Multimodal fusion; Feature interaction; Canonical correlation analysis; Fusion
DOI
10.1016/j.compeleceng.2025.110467
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Affective computing is a fundamental technology and a crucial prerequisite for natural, human-like human-computer interaction. However, emotional expression is complex and multi-dimensional, and the heterogeneity gap among modalities poses significant challenges for multimodal emotion recognition. To address this, we propose modality emotion semantic correlation analysis (MESCA), which improves multimodal affective semantic consistency by using modality correlation learning to make the information in different modalities complement each other. Specifically, we first design a modal-pair correlation module that calculates emotion semantic consistency across text, audio and video information. By fusing complementary semantic information, this module supports a more comprehensive understanding of the emotional state and helps mitigate the redundancy of pairwise interaction methods. Next, we introduce structural re-parameterization, which transforms the multi-branch training structure into a single-branch inference structure to reduce the otherwise excessive computational cost of inference. We evaluate the proposed model on two public datasets, IEMOCAP and CMU-MOSEI. Compared to baseline methods, MESCA significantly improves efficiency while maintaining prediction accuracy on IEMOCAP, and outperforms them in both efficiency and accuracy on CMU-MOSEI.
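The abstract does not give the details of MESCA's re-parameterization, so the following is only a minimal sketch of the general idea, assuming every training branch is a plain linear layer applied to the same input. Under that assumption the branch outputs sum to a single linear map, so the weights and biases can be added once before inference. All names (`make_branch`, `merge_branches`, the sizes 4 and 8) are hypothetical and are not taken from the paper.

```python
import random


def linear(weight, bias, x):
    """Apply y = W x + b to one input vector using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def make_branch(rng, d_out, d_in):
    """Create one hypothetical linear branch with random parameters."""
    weight = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
    bias = [rng.uniform(-1, 1) for _ in range(d_out)]
    return weight, bias


def multi_branch_forward(branches, x):
    """Training-time structure: run every branch, then sum the outputs."""
    outputs = [linear(w, b, x) for w, b in branches]
    return [sum(vals) for vals in zip(*outputs)]


def merge_branches(branches):
    """Inference-time structure: fold all branches into one weight and bias."""
    d_out = len(branches[0][0])
    d_in = len(branches[0][0][0])
    weight = [[sum(w[i][j] for w, _ in branches) for j in range(d_in)]
              for i in range(d_out)]
    bias = [sum(b[i] for _, b in branches) for i in range(d_out)]
    return weight, bias


rng = random.Random(0)
branches = [make_branch(rng, 4, 8) for _ in range(3)]
merged_weight, merged_bias = merge_branches(branches)

x = [rng.uniform(-1, 1) for _ in range(8)]
y_train = multi_branch_forward(branches, x)      # three matrix products
y_infer = linear(merged_weight, merged_bias, x)  # one matrix product

# The two structures agree up to floating-point rounding.
max_gap = max(abs(a - b) for a, b in zip(y_train, y_infer))
```

The merge is exact only because the branches here are linear and share an input. Branches containing nonlinearities or normalization layers need extra steps (for example, folding normalization statistics into the weights first), which this sketch does not cover.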
Pages: 13