A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1
Authors
Qi, Fan [1 ]
Zhang, Huaiwen [2 ,3 ]
Yang, Xiaoshan [4 ,5 ,6 ]
Xu, Changsheng [4 ,5 ,6 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;
DOI
10.1109/TCSVT.2024.3362270
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
Multi-modal Emotion Recognition (MER) aims to identify human emotions from heterogeneous modalities. As emotion theories evolve, increasingly novel and fine-grained concepts are introduced to describe human emotional states, so real-world recognition systems often encounter emotion labels unseen during training. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings capturing inter-label relationships. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning; these components are then hierarchically fused with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations guided by emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of the proposed framework over state-of-the-art methods.
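The abstract's pipeline — fuse heterogeneous modality features, then score the fused representation against emotion-label embeddings so that unseen labels can be predicted — can be illustrated with a minimal sketch. This is not the paper's actual architecture (the affective graph space, adversarial disentanglement, and emotion-guided decoder are omitted); all function names, dimensions, and data here are hypothetical, using plain dot-product co-attention and cosine scoring as stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_fuse(a, b):
    """Each modality attends over the other via a shared affinity matrix,
    then the attended contexts are mean-pooled and concatenated."""
    affinity = a @ b.T                       # (Ta, Tb) cross-modal similarity
    a_ctx = softmax(affinity, axis=1) @ b    # modality A attends to B -> (Ta, d)
    b_ctx = softmax(affinity.T, axis=1) @ a  # modality B attends to A -> (Tb, d)
    return np.concatenate([a_ctx.mean(axis=0), b_ctx.mean(axis=0)])  # (2d,)

def zero_shot_scores(fused, label_embeddings):
    """Cosine similarity against every emotion-label embedding; labels
    unseen during training are scored exactly like seen ones."""
    f = fused / np.linalg.norm(fused)
    L = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    return L @ f

rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 16))    # e.g. 8 video frames, 16-dim features
acoustic = rng.normal(size=(5, 16))  # e.g. 5 audio windows, 16-dim features
labels = rng.normal(size=(6, 32))    # 6 emotion-label embeddings (seen + unseen)

fused = co_attention_fuse(visual, acoustic)  # (32,) multimodal representation
scores = zero_shot_scores(fused, labels)     # (6,) one score per emotion label
predicted = int(np.argmax(scores))
```

Because classification reduces to nearest-label-embedding search, adding a new emotion only requires an embedding for its label, not retraining — which is the essence of the zero-shot setting the paper targets.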
Pages: 5728-5741 (14 pages)
Related Papers (50 records in total)
  • [31] Wang, Xuesong; Chen, Chen; Cheng, Yuhu; Chen, Xun; Liu, Yu. Zero-Shot Learning Based on Deep Weighted Attribute Prediction. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(8): 2948-2957.
  • [32] Cheng, De; Wang, Gerong; Wang, Bo; Zhang, Qiang; Han, Jungong; Zhang, Dingwen. Hybrid routing transformer for zero-shot learning. Pattern Recognition, 2023, 137.
  • [33] Sarma, Sandipan. Zero-Shot Learning for Computer Vision Applications. Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023: 9360-9364.
  • [34] Farahani, Ali Mazraeh; Adibi, Peyman; Ehsani, Mohammad Saeed; Hutter, Hans-Peter; Darvishy, Alireza. Chart question answering with multimodal graph representation learning and zero-shot classification. Expert Systems with Applications, 2025, 270.
  • [35] Shi, Bin; Wang, Luyang; Yu, Zefang; Xiang, Suncheng; Liu, Ting; Fu, Yuzhuo. Zero-Shot Learning for Skeleton-based Classroom Action Recognition. 2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC 2021), 2021: 82-86.
  • [36] Guo, Ting; Liang, Jiye; Xie, Guo-Sen. Group-wise interactive region learning for zero-shot recognition. Information Sciences, 2023, 642.
  • [37] Dong, Yihong; Jiang, Xiaohan; Zhou, Huaji; Lin, Yun; Shi, Qingjiang. SR2CNN: Zero-Shot Learning for Signal Recognition. IEEE Transactions on Signal Processing, 2021, 69: 2316-2329.
  • [38] Abderrahmane, Zineb; Ganesh, Gowrishankar; Crosnier, Andre; Cherubini, Andrea. Haptic Zero-Shot Learning: Recognition of objects never touched before. Robotics and Autonomous Systems, 2018, 105: 11-25.
  • [39] Jiang, Huajie; Wang, Ruiping; Shan, Shiguang; Chen, Xilin. Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition. Computer Vision - ECCV 2018, Part X, 2018, 11214: 121-138.
  • [40] Maraghi, Vali Ollah; Faez, Karim. Zero-Shot Learning on Human-Object Interaction Recognition in video. 2019 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS 2019), 2019.