A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1
Authors
Qi, Fan [1]
Zhang, Huaiwen [2, 3]
Yang, Xiaoshan [4, 5, 6]
Xu, Changsheng [4, 5, 6]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;
DOI
10.1109/TCSVT.2024.3362270
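Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract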
Multi-modal Emotion Recognition (MER) aims to identify various human emotions from heterogeneous modalities. With the development of emotional theories, there are more and more novel and fine-grained concepts to describe human emotional feelings. Real-world recognition systems often encounter unseen emotion labels. To address this challenge, we propose a versatile zero-shot MER framework to refine emotion label embeddings for capturing inter-label relationships and improving discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings capturing inter-label relationships. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning. These components are then hierarchically fused using a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations guided by emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. Results demonstrate the superiority of our proposed framework over state-of-the-art methods.
Pages: 5728-5741
Page count: 14
Related papers
50 in total
  • [21] Extreme Reverse Projection Learning for Zero-Shot Recognition
    Guan, Jiechao
    Zhao, An
    Lu, Zhiwu
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 125 - 141
  • [22] Label-activating framework for zero-shot learning
    Liu, Yang
    Gao, Xinbo
    Gao, Quanxue
    Han, Jungong
    Shao, Ling
    NEURAL NETWORKS, 2020, 121 : 1 - 9
  • [23] Research progress of zero-shot learning
    Sun, Xiaohong
    Gu, Jinan
    Sun, Hongying
    APPLIED INTELLIGENCE, 2021, 51 (06) : 3600 - 3614
  • [24] Learning Using Privileged Information for Zero-Shot Action Recognition
    Gao, Zhiyi
    Hou, Yonghong
    Li, Wanqing
    Guo, Zihui
    Yu, Bin
    COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 347 - 362
  • [25] Grouping attributes zero-shot learning for tongue constitution recognition
    Wen, Guihua
    Ma, Jiajiong
    Hu, Yang
    Li, Huihui
    Jiang, Lijun
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2020, 109
  • [26] Learning discriminative visual semantic embedding for zero-shot recognition
    Xie, Yurui
    Song, Tiecheng
    Yuan, Jianying
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2023, 115
  • [27] A Biologically Inspired Feature Enhancement Framework for Zero-Shot Learning
    Xie, Zhongwu
    Cao, Weipeng
    Wang, Xizhao
    Ming, Zhong
    Zhang, Jingjing
    Zhang, Jiyong
    2020 7TH IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND CLOUD COMPUTING (CSCLOUD 2020)/2020 6TH IEEE INTERNATIONAL CONFERENCE ON EDGE COMPUTING AND SCALABLE CLOUD (EDGECOM 2020), 2020, : 120 - 125
  • [28] Rebalanced Zero-Shot Learning
    Ye, Zihan
    Yang, Guanyu
    Jin, Xiaobo
    Liu, Youfa
    Huang, Kaizhu
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4185 - 4198
  • [29] Spherical Zero-Shot Learning
    Shen, Jiayi
    Xiao, Zehao
    Zhen, Xiantong
    Zhang, Lei
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 634 - 645
  • [30] A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning
    Rahman, Shafin
    Khan, Salman
    Porikli, Fatih
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (11) : 5652 - 5667