A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1

Authors:

Qi, Fan ^{[1]}

Zhang, Huaiwen ^{[2,3]}

Yang, Xiaoshan ^{[4,5,6]}

Xu, Changsheng ^{[4,5,6]}

Affiliations:

[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China

[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China

[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China

[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China

[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China

[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;

DOI:

10.1109/TCSVT.2024.3362270

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Multi-modal Emotion Recognition (MER) aims to identify various human emotions from heterogeneous modalities. With the development of emotional theories, there are more and more novel and fine-grained concepts to describe human emotional feelings. Real-world recognition systems often encounter unseen emotion labels. To address this challenge, we propose a versatile zero-shot MER framework to refine emotion label embeddings for capturing inter-label relationships and improving discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings capturing inter-label relationships. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning. These components are then hierarchically fused using a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations guided by emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. Results demonstrate the superiority of our proposed framework over state-of-the-art methods.
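As a rough illustration of the zero-shot setting the abstract describes, the sketch below scores a fused multimodal representation against embeddings of unseen emotion labels and picks the closest one. It is not the paper's method: the affective graph space, adversarial disentanglement, co-attention fusion, and emotion-guided decoder are not reproduced here. The function names, the cosine-similarity scoring rule, the label names, and all vectors are hypothetical placeholders chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(sample, label_embeddings):
    """Return the label whose embedding is most similar to the sample.

    `sample` stands in for a fused multimodal representation and
    `label_embeddings` maps each unseen label to its embedding.
    Both are assumptions of this sketch, not outputs of the paper's model.
    """
    scores = {label: cosine(sample, emb) for label, emb in label_embeddings.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical embeddings for two emotion labels never seen during training.
unseen_labels = {
    "awe": [0.9, 0.1, 0.0],
    "nostalgia": [0.1, 0.8, 0.3],
}

# Hypothetical fused representation of one test sample.
sample = [0.2, 0.7, 0.4]

predicted, scores = zero_shot_predict(sample, unseen_labels)
print(predicted)
```

In a multi-label variant of this sketch, one could instead keep every label whose score exceeds a chosen threshold rather than taking the single best match.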

Pages: 5728-5741

Number of pages: 14
