Semantic Alignment Network for Multi-Modal Emotion Recognition

Cited by: 10
Authors
Hou, Mixiao [1]
Zhang, Zheng [1,2]
Liu, Chang [3]
Lu, Guangming [1,4]
Affiliations
[1] Harbin Inst Technol Shenzhen, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
[3] Ind & Commercial Bank China, Software Dev Ctr, Beijing 519000, Peoples R China
[4] Guangdong Prov Key Lab Novel Secur Intelligence Te, Shenzhen 518055, Peoples R China
Keywords
Semantic alignment; multi-spatial learning; self-modal interaction; emotion recognition
DOI
10.1109/TCSVT.2023.3247822
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
Modality alignment maintains semantic consistency in multi-modal emotion recognition tasks, ensuring that features from different modalities accurately represent emotion-related information in a shared encoding space. However, current alignment models either focus only on the local fusion of different modal representations or lack a process for mining modality-specific information. We design a Semantic Alignment network based on Multi-Spatial learning (SAMS) for multi-modal emotion recognition, which achieves both local and global alignment between modalities by using high-level emotion representations of the different modalities as supervisory signals. SAMS builds a multi-spatial learning framework for each modality and, within this framework, constructs a self-modal interaction module based on cross-modal semantic learning. Concretely, SAMS provides two learning spaces for each modality: one detects modality-specific affective information, and the other learns semantic knowledge from the other modalities. The features of these two spaces are then aligned at the temporal and utterance levels through homologous encoding under different target constraints. Building on the alignment properties of the two spaces, the self-modal interaction module investigates the fusion representation by exploring the global correlation between the aligned features of unimodal multi-spatial learning. In experiments, the proposed model yields consistent improvements on two standard multi-modal benchmarks and outperforms state-of-the-art approaches. The code of SAMS is available at: https://github.com/xiaomi1024/code_SAMS.
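To make the two-space design described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: each modality gets a modality-specific space and a cross-modal semantic space produced by homologous (same-architecture) encoders, and the two spaces are aligned at both the temporal and utterance levels. This is not the authors' released implementation (see the GitHub link above); the GRU encoders, mean pooling, and MSE alignment surrogate are all illustrative assumptions.

```python
# Hypothetical sketch of the multi-spatial two-space idea; the real SAMS
# code is at https://github.com/xiaomi1024/code_SAMS.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSpaceEncoder(nn.Module):
    """Two learning spaces for one modality: a modality-specific space and a
    cross-modal semantic space, built from homologous (same-type) encoders."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.specific = nn.GRU(in_dim, hid_dim, batch_first=True)  # detects modality-specific affect
        self.semantic = nn.GRU(in_dim, hid_dim, batch_first=True)  # learns semantics from other modalities

    def forward(self, x):                    # x: (batch, time, in_dim)
        h_spec, _ = self.specific(x)         # temporal-level features, (batch, time, hid_dim)
        h_sem, _ = self.semantic(x)
        u_spec = h_spec.mean(dim=1)          # utterance-level features via mean pooling (an assumption)
        u_sem = h_sem.mean(dim=1)
        return (h_spec, u_spec), (h_sem, u_sem)

def alignment_loss(a, b):
    """L2 surrogate for aligning the two spaces at a given level; the paper's
    actual 'different target constraints' are richer than this."""
    return F.mse_loss(a, b)

if __name__ == "__main__":
    # One encoder per modality (text / audio / video); here, a single modality.
    enc = MultiSpaceEncoder(in_dim=74, hid_dim=64)   # e.g. 74-dim acoustic features
    x = torch.randn(8, 50, 74)                       # (batch, frames, feature dim)
    (h_spec, u_spec), (h_sem, u_sem) = enc(x)
    # Align the two spaces at the temporal and utterance levels.
    loss = alignment_loss(h_spec, h_sem) + alignment_loss(u_spec, u_sem)
    print(loss.item())
```

In a full model, the aligned utterance-level features of all modalities would then be fused by the self-modal interaction module for emotion classification; that step is omitted here.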
Pages: 5318-5329
Page count: 12
Related papers
50 records in total
  • [1] Semantic Enhancement Network Integrating Label Knowledge for Multi-modal Emotion Recognition
    Zheng, HongFeng
    Miao, ShengFa
    Yu, Qian
    Mu, YongKang
    Jin, Xin
    Yan, KeShan
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14879 : 473 - 484
  • [2] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [3] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 619 : 679 - 694
  • [4] Dense Attention Memory Network for Multi-modal emotion recognition
    Ma, Gailing
    Guo, Xiao
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 48 - 53
  • [5] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [6] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 364 - 368
  • [7] Towards Efficient Multi-Modal Emotion Recognition
    Dobrisek, Simon
    Gajsek, Rok
    Mihelic, France
    Pavesic, Nikola
    Struc, Vitomir
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2013, 10
  • [8] Multi-modal Emotion Recognition Based on Hypergraph
    Zong L.-L.
    Zhou J.-H.
    Xie Q.-J.
    Zhang X.-C.
    Xu B.
    Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (12): : 2520 - 2534
  • [9] Evaluation and Discussion of Multi-modal Emotion Recognition
    Rabie, Ahmad
    Wrede, Britta
    Vogt, Thurid
    Hanheide, Marc
SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, VOL 1, PROCEEDINGS, 2009: 598+
  • [10] Emotion Recognition from Multi-Modal Information
    Wu, Chung-Hsien
    Lin, Jen-Chun
    Wei, Wen-Li
    Cheng, Kuan-Chun
2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013