Modality emotion semantic correlation analysis for multimodal emotion recognition

Cited by: 0

Authors:

Zhang, Yuqing [1]

Xie, Dongliang [1]

Luo, Dawei [1]

Sun, Baosheng [2]

Affiliations:

[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China

[2] Beijing Jinghang Res Inst Comp & Commun, Beijing, Peoples R China

Source:

COMPUTERS & ELECTRICAL ENGINEERING | 2025 / Vol. 126

Keywords:

Emotion recognition; Multimodal fusion; Feature interaction; Canonical correlation analysis; CANONICAL CORRELATION-ANALYSIS; FUSION;

DOI:

10.1016/j.compeleceng.2025.110467

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Affective computing serves as the fundamental technology and a crucial prerequisite for attaining naturalized and anthropomorphic human-computer interaction. Nevertheless, the expression of emotion is complex and multi-dimensional, posing significant challenges for multimodal emotion recognition due to the heterogeneity gap among distinct modalities. To tackle this issue, we propose a novel approach named modality emotion semantic correlation analysis (MESCA), which enhances multimodal affective semantic consistency by leveraging modality correlation learning to achieve multimodal information complementation. Specifically, we first design a modal-pair correlation module that calculates emotion semantic consistency across text, audio and video information. This module contributes to a comprehensive understanding of the emotional state by fusing complementary semantic information and assists in mitigating redundancy in pairwise interaction methods. Next, we introduce structural re-parameterization technology that transforms the multi-branch training structure into a single-branch inference structure to solve the problem of excessive computational expense, thereby facilitating a more efficient and effective recognition process. Additionally, the proposed model is verified on two public datasets, IEMOCAP and CMU-MOSEI. Compared to baseline methods, MESCA significantly enhances efficiency while maintaining prediction accuracy on IEMOCAP, and outperforms on both efficiency and accuracy on CMU-MOSEI.
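The abstract does not give the details of the re-parameterization step, so the following is only a minimal sketch of the general idea, in the style of RepVGG [10]: parallel linear branches plus an identity shortcut can be folded into one equivalent linear layer for inference. The function names, the use of plain linear branches, and the dimensions are all illustrative assumptions, not the MESCA architecture.

```python
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def multi_branch(x, branches):
    """Training-time form: identity shortcut plus a sum of parallel linear branches."""
    out = list(x)  # identity branch
    for w, b in branches:
        y = matvec(w, x)
        out = [o + yi + bi for o, yi, bi in zip(out, y, b)]
    return out


def reparameterize(branches, dim):
    """Fold every branch and the identity shortcut into one (weight, bias) pair."""
    # Start from the identity matrix, which represents the shortcut branch.
    w = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    b = [0.0] * dim
    for wk, bk in branches:
        for i in range(dim):
            b[i] += bk[i]
            for j in range(dim):
                w[i][j] += wk[i][j]
    return w, b


def single_branch(x, w, b):
    """Inference-time form: a single linear layer."""
    return [yi + bi for yi, bi in zip(matvec(w, x), b)]


# Hypothetical setup: three random linear branches over 4-dimensional features.
random.seed(0)
dim = 4
branches = [
    (
        [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)],
        [random.uniform(-1, 1) for _ in range(dim)],
    )
    for _ in range(3)
]
x = [random.uniform(-1, 1) for _ in range(dim)]

w, b = reparameterize(branches, dim)
max_diff = max(
    abs(p - q)
    for p, q in zip(multi_branch(x, branches), single_branch(x, w, b))
)
print(max_diff)  # differs from zero only by floating-point rounding
```

The merge is exact only because every branch is linear and the branches are summed before any nonlinearity; the saving at inference time comes from evaluating one matrix product instead of one per branch.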


Pages: 13

