Semantic Alignment Network for Multi-Modal Emotion Recognition

Cited by: 10
Authors
Hou, Mixiao [1]
Zhang, Zheng [1,2]
Liu, Chang [3]
Lu, Guangming [1,4]
Affiliations
[1] Harbin Inst Technol Shenzhen, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
[3] Ind & Commercial Bank China, Software Dev Ctr, Beijing 519000, Peoples R China
[4] Guangdong Prov Key Lab Novel Secur Intelligence Te, Shenzhen 518055, Peoples R China
Keywords
Semantic alignment; multi-spatial learning; self-modal interaction; emotion recognition
DOI
10.1109/TCSVT.2023.3247822
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
Modality alignment maintains semantic consistency in multi-modal emotion recognition tasks, ensuring that features from different modalities accurately represent emotion-related information in a shared encoding space. However, current alignment models either focus only on the local fusion of different modal representations or lack a process for mining modality-specific information. We design a Semantic Alignment network based on Multi-Spatial learning (SAMS) for multi-modal emotion recognition, which achieves both local and global alignment between modalities by using high-level emotion representations of the different modalities as supervisory signals. SAMS builds a multi-spatial learning framework for each modality and, within this framework, constructs a self-modal interaction module based on cross-modal semantic learning. Concretely, SAMS provides two learning spaces for each modality: one detects modality-specific affective information, and the other learns semantic knowledge from the other modalities. The features of these two spaces are then aligned at the temporal and utterance levels through homologous encoding under different target constraints. Building on the alignment properties of the two spaces, the self-modal interaction module investigates the fusion representation by exploring the global correlation between the aligned features of unimodal multi-spatial learning. In experiments, the proposed model yields consistent improvements on two standard multi-modal benchmarks and outperforms state-of-the-art approaches. The code of SAMS is available at: https://github.com/xiaomi1024/code_SAMS.
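To make the two-space design described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch: each modality gets a modality-specific space and a cross-modal semantic space produced by homologous (same-architecture) encoders, and the two spaces are aligned at both the temporal and utterance levels. This is not the authors' released implementation (see the GitHub link above); the GRU encoders, mean pooling, and MSE alignment surrogate are all illustrative assumptions.

```python
# Hypothetical sketch of the multi-spatial two-space idea; the real SAMS
# code is at https://github.com/xiaomi1024/code_SAMS.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSpaceEncoder(nn.Module):
    """Two learning spaces for one modality: a modality-specific space and a
    cross-modal semantic space, built from homologous (same-type) encoders."""
    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.specific = nn.GRU(in_dim, hid_dim, batch_first=True)  # detects modality-specific affect
        self.semantic = nn.GRU(in_dim, hid_dim, batch_first=True)  # learns semantics from other modalities

    def forward(self, x):                    # x: (batch, time, in_dim)
        h_spec, _ = self.specific(x)         # temporal-level features, (batch, time, hid_dim)
        h_sem, _ = self.semantic(x)
        u_spec = h_spec.mean(dim=1)          # utterance-level features via mean pooling (an assumption)
        u_sem = h_sem.mean(dim=1)
        return (h_spec, u_spec), (h_sem, u_sem)

def alignment_loss(a, b):
    """L2 surrogate for aligning the two spaces at a given level; the paper's
    actual 'different target constraints' are richer than this."""
    return F.mse_loss(a, b)

if __name__ == "__main__":
    # One encoder per modality (text / audio / video); here, a single modality.
    enc = MultiSpaceEncoder(in_dim=74, hid_dim=64)   # e.g. 74-dim acoustic features
    x = torch.randn(8, 50, 74)                       # (batch, frames, feature dim)
    (h_spec, u_spec), (h_sem, u_sem) = enc(x)
    # Align the two spaces at the temporal and utterance levels.
    loss = alignment_loss(h_spec, h_sem) + alignment_loss(u_spec, u_sem)
    print(loss.item())
```

In a full model, the aligned utterance-level features of all modalities would then be fused by the self-modal interaction module for emotion classification; that step is omitted here.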
Pages: 5318-5329
Page count: 12
Related papers
50 records in total
  • [1] Semantic Enhancement Network Integrating Label Knowledge for Multi-modal Emotion Recognition
    Zheng, HongFeng
    Miao, ShengFa
    Yu, Qian
    Mu, YongKang
    Jin, Xin
    Yan, KeShan
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14879 : 473 - 484
  • [2] Multi-modal Correlated Network for emotion recognition in speech
    Ren, Minjie
    Nie, Weizhi
    Liu, Anan
    Su, Yuting
    VISUAL INFORMATICS, 2019, 3 (03) : 150 - 155
  • [3] Multi-modal fusion network with complementarity and importance for emotion recognition
    Liu, Shuai
    Gao, Peng
    Li, Yating
    Fu, Weina
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 619 : 679 - 694
  • [4] Dense Attention Memory Network for Multi-modal emotion recognition
    Ma, Gailing
    Guo, Xiao
    2022 5TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING, MLNLP 2022, 2022, : 48 - 53
  • [5] A novel signal channel attention network for multi-modal emotion recognition
    Du, Ziang
    Ye, Xia
    Zhao, Pujie
    FRONTIERS IN NEUROROBOTICS, 2024, 18
  • [6] Multi-modal Attention for Speech Emotion Recognition
    Pan, Zexu
    Luo, Zhaojie
    Yang, Jichen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 364 - 368
  • [7] Towards Efficient Multi-Modal Emotion Recognition
    Dobrisek, Simon
    Gajsek, Rok
    Mihelic, France
    Pavesic, Nikola
    Struc, Vitomir
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2013, 10
  • [8] Multi-modal Emotion Recognition Based on Hypergraph
    Zong L.-L.
    Zhou J.-H.
    Xie Q.-J.
    Zhang X.-C.
    Xu B.
    Jisuanji Xuebao/Chinese Journal of Computers, 2023, 46 (12): : 2520 - 2534
  • [9] Evaluation and Discussion of Multi-modal Emotion Recognition
    Rabie, Ahmad
    Wrede, Britta
    Vogt, Thurid
    Hanheide, Marc
SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, VOL 1, PROCEEDINGS, 2009: 598+
  • [10] Emotion Recognition from Multi-Modal Information
    Wu, Chung-Hsien
    Lin, Jen-Chun
    Wei, Wen-Li
    Cheng, Kuan-Chun
2013 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2013