A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1
Authors
Qi, Fan [1 ]
Zhang, Huaiwen [2 ,3 ]
Yang, Xiaoshan [4 ,5 ,6 ]
Xu, Changsheng [4 ,5 ,6 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;
DOI
10.1109/TCSVT.2024.3362270
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Multi-modal Emotion Recognition (MER) aims to identify various human emotions from heterogeneous modalities. As emotional theories develop, increasingly novel and fine-grained concepts are used to describe human emotional states, so real-world recognition systems often encounter unseen emotion labels. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. Specifically, we integrate prior knowledge into a novel affective graph space that generates tailored label embeddings. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning, and then hierarchically fuse these components with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations conditioned on the emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of the proposed framework over state-of-the-art methods.
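To make the zero-shot matching idea sketched in the abstract more concrete, the minimal PyTorch snippet below shows one plausible way to disentangle each modality into shared ("altruistic") and private ("egocentric") parts, fuse them, and score the result against emotion label embeddings so that unseen labels can be ranked. This is an illustrative sketch, not the authors' implementation: module names such as ModalityDisentangler and ZeroShotEmotionScorer, the dimension sizes, and the plain linear fusion standing in for the adversarial and hybrid co-attention components are all assumptions.

# Illustrative sketch only (not the paper's code): disentangle each modality into
# shared and private parts, fuse them, and score against emotion label embeddings
# so that labels unseen during training can still be ranked at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityDisentangler(nn.Module):
    """Splits one modality's features into shared and private components."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.private = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.shared(x), self.private(x)


class ZeroShotEmotionScorer(nn.Module):
    """Fuses two modalities and scores them against emotion label embeddings."""

    def __init__(self, vis_dim: int, aud_dim: int, hid_dim: int, label_dim: int):
        super().__init__()
        self.vis = ModalityDisentangler(vis_dim, hid_dim)
        self.aud = ModalityDisentangler(aud_dim, hid_dim)
        self.fuse = nn.Linear(4 * hid_dim, label_dim)  # simple stand-in for co-attention fusion

    def forward(self, vis_feat, aud_feat, label_emb):
        v_sh, v_pr = self.vis(vis_feat)
        a_sh, a_pr = self.aud(aud_feat)
        fused = self.fuse(torch.cat([v_sh, v_pr, a_sh, a_pr], dim=-1))
        # Cosine compatibility between the fused representation and every label
        # embedding; an unseen label only needs an embedding to receive a score.
        scores = F.normalize(fused, dim=-1) @ F.normalize(label_emb, dim=-1).T
        return scores


# Toy usage: 2 clips, 5 candidate emotion labels (some possibly unseen in training).
model = ZeroShotEmotionScorer(vis_dim=512, aud_dim=128, hid_dim=256, label_dim=300)
vis = torch.randn(2, 512)
aud = torch.randn(2, 128)
labels = torch.randn(5, 300)           # e.g. graph-refined label embeddings
print(model(vis, aud, labels).shape)   # -> torch.Size([2, 5])

In the actual framework, the label embeddings would come from the affective graph space, and the fusion would be performed by the adversarially disentangled components, the hierarchical co-attention, and the emotion-guided decoder described in the abstract.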
Pages: 5728-5741
Number of pages: 14
Related Papers (50 in total)
  • [1] Multimodal zero-shot learning for tactile texture recognition
    Cao, Guanqun
    Jiang, Jiaqi
    Bollegala, Danushka
    Li, Min
    Luo, Shan
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 176
  • [2] Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition
    Xu, Xinzhou
    Deng, Jun
    Cummins, Nicholas
    Zhang, Zixing
    Zhao, Li
    Schuller, Bjorn W.
    INTERSPEECH 2019, 2019, : 949 - 953
  • [3] A review on multimodal zero-shot learning
    Cao, Weipeng
    Wu, Yuhao
    Sun, Yixuan
    Zhang, Haigang
    Ren, Jin
    Gu, Dujuan
    Wang, Xingkai
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (02)
  • [4] Zero-Shot Visual Emotion Recognition by Exploiting BERT
    Kang, Hyunwook
    Hazarika, Devamanyu
    Kim, Dongho
    Kim, Jihie
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 485 - 494
  • [5] An Adversarial Learning Framework for Zero-shot Fault Recognition of Mechanical Systems
    Chen, Jinglong
    Pan, Tongyang
    Zhou, Zitong
    He, Shuilong
    2019 IEEE 17TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2019, : 1275 - 1278
  • [6] ZeroEVNet: A multimodal zero-shot learning framework for scalable emergency vehicle detection
    Ravi, Reeta
    Kanniappan, Jayashree
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [7] Zero-shot Video Emotion Recognition via Multimodal Protagonist-aware Transformer Network
    Qi, Fan
    Yang, Xiaoshan
    Xu, Changsheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1074 - 1083
  • [8] Integrative zero-shot learning for fruit recognition
    Tran-Anh, Dat
    Huu, Quynh Nguyen
    Bui-Quoc, Bao
    Hoang, Ngan Dao
    Quoc, Tao Ngo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 73191 - 73213
  • [9] Kernelized distance learning for zero-shot recognition
    Zarei, Mohammad Reza
    Taheri, Mohammad
    Long, Yang
    INFORMATION SCIENCES, 2021, 580 : 801 - 818
  • [10] An Attribute Learning Method for Zero-Shot Recognition
    Yazdanian, Ramtin
    Shojaee, Seyed Mohsen
    Baghshah, Mahdieh Soleymani
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 2235 - 2240