A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1
Authors
Qi, Fan [1 ]
Zhang, Huaiwen [2 ,3 ]
Yang, Xiaoshan [4 ,5 ,6 ]
Xu, Changsheng [4 ,5 ,6 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;
DOI
10.1109/TCSVT.2024.3362270
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Codes
0808; 0809;
Abstract
Multi-modal Emotion Recognition (MER) aims to identify human emotions from heterogeneous modalities. As emotion theories evolve, increasingly novel and fine-grained concepts are introduced to describe human emotional states, so real-world recognition systems often encounter emotion labels unseen during training. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. We integrate prior knowledge into a novel affective graph space that generates tailored label embeddings capturing inter-label relationships. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning; these components are then hierarchically fused with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations guided by emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of the proposed framework over state-of-the-art methods.
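The abstract's pipeline — fuse heterogeneous modality features, then score the fused representation against emotion-label embeddings so that unseen labels can be predicted — can be illustrated with a minimal sketch. This is not the paper's actual architecture (the affective graph space, adversarial disentanglement, and emotion-guided decoder are omitted); all function names, dimensions, and data here are hypothetical, using plain dot-product co-attention and cosine scoring as stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_fuse(a, b):
    """Each modality attends over the other via a shared affinity matrix,
    then the attended contexts are mean-pooled and concatenated."""
    affinity = a @ b.T                       # (Ta, Tb) cross-modal similarity
    a_ctx = softmax(affinity, axis=1) @ b    # modality A attends to B -> (Ta, d)
    b_ctx = softmax(affinity.T, axis=1) @ a  # modality B attends to A -> (Tb, d)
    return np.concatenate([a_ctx.mean(axis=0), b_ctx.mean(axis=0)])  # (2d,)

def zero_shot_scores(fused, label_embeddings):
    """Cosine similarity against every emotion-label embedding; labels
    unseen during training are scored exactly like seen ones."""
    f = fused / np.linalg.norm(fused)
    L = label_embeddings / np.linalg.norm(label_embeddings, axis=1, keepdims=True)
    return L @ f

rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 16))    # e.g. 8 video frames, 16-dim features
acoustic = rng.normal(size=(5, 16))  # e.g. 5 audio windows, 16-dim features
labels = rng.normal(size=(6, 32))    # 6 emotion-label embeddings (seen + unseen)

fused = co_attention_fuse(visual, acoustic)  # (32,) multimodal representation
scores = zero_shot_scores(fused, labels)     # (6,) one score per emotion label
predicted = int(np.argmax(scores))
```

Because classification reduces to nearest-label-embedding search, adding a new emotion only requires an embedding for its label, not retraining — which is the essence of the zero-shot setting the paper targets.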
Pages: 5728-5741 (14 pages)
Related Papers (50 records in total)
  • [31] Wang, Xuesong; Chen, Chen; Cheng, Yuhu; Chen, Xun; Liu, Yu. Zero-Shot Learning Based on Deep Weighted Attribute Prediction. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(8): 2948-2957.
  • [32] Cheng, De; Wang, Gerong; Wang, Bo; Zhang, Qiang; Han, Jungong; Zhang, Dingwen. Hybrid routing transformer for zero-shot learning. Pattern Recognition, 2023, 137.
  • [33] Sarma, Sandipan. Zero-Shot Learning for Computer Vision Applications. Proceedings of the 31st ACM International Conference on Multimedia (MM 2023), 2023: 9360-9364.
  • [34] Farahani, Ali Mazraeh; Adibi, Peyman; Ehsani, Mohammad Saeed; Hutter, Hans-Peter; Darvishy, Alireza. Chart question answering with multimodal graph representation learning and zero-shot classification. Expert Systems with Applications, 2025, 270.
  • [35] Shi, Bin; Wang, Luyang; Yu, Zefang; Xiang, Suncheng; Liu, Ting; Fu, Yuzhuo. Zero-Shot Learning for Skeleton-based Classroom Action Recognition. 2021 International Symposium on Computer Science and Intelligent Controls (ISCSIC 2021), 2021: 82-86.
  • [36] Guo, Ting; Liang, Jiye; Xie, Guo-Sen. Group-wise interactive region learning for zero-shot recognition. Information Sciences, 2023, 642.
  • [37] Dong, Yihong; Jiang, Xiaohan; Zhou, Huaji; Lin, Yun; Shi, Qingjiang. SR2CNN: Zero-Shot Learning for Signal Recognition. IEEE Transactions on Signal Processing, 2021, 69: 2316-2329.
  • [38] Abderrahmane, Zineb; Ganesh, Gowrishankar; Crosnier, Andre; Cherubini, Andrea. Haptic Zero-Shot Learning: Recognition of objects never touched before. Robotics and Autonomous Systems, 2018, 105: 11-25.
  • [39] Jiang, Huajie; Wang, Ruiping; Shan, Shiguang; Chen, Xilin. Learning Class Prototypes via Structure Alignment for Zero-Shot Recognition. Computer Vision - ECCV 2018, Part X, 2018, 11214: 121-138.
  • [40] Maraghi, Vali Ollah; Faez, Karim. Zero-Shot Learning on Human-Object Interaction Recognition in video. 2019 5th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS 2019), 2019.