A Versatile Multimodal Learning Framework for Zero-Shot Emotion Recognition

Cited by: 1
Authors
Qi, Fan [1 ]
Zhang, Huaiwen [2 ,3 ]
Yang, Xiaoshan [4 ,5 ,6 ]
Xu, Changsheng [4 ,5 ,6 ]
Affiliations
[1] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
[2] Inner Mongolia Univ, Coll Comp Sci, Hohhot 010021, Peoples R China
[3] Natl & Local Joint Engn Res Ctr Intelligent Infor, Hohhot 010021, Peoples R China
[4] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[5] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China
[6] Peng Cheng Lab, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal emotion recognition; zero-shot learning; transformer; NETWORKS; MODEL;
DOI
10.1109/TCSVT.2024.3362270
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Multi-modal Emotion Recognition (MER) aims to identify various human emotions from heterogeneous modalities. As emotional theories develop, increasingly novel and fine-grained concepts are used to describe human emotional states, so real-world recognition systems often encounter unseen emotion labels. To address this challenge, we propose a versatile zero-shot MER framework that refines emotion label embeddings to capture inter-label relationships and improve discrimination between labels. Specifically, we integrate prior knowledge into a novel affective graph space that generates tailored label embeddings. To obtain multimodal representations, we disentangle the features of each modality into egocentric and altruistic components using adversarial learning, and then hierarchically fuse these components with a hybrid co-attention mechanism. Furthermore, an emotion-guided decoder exploits label-modal dependencies to generate adaptive multimodal representations conditioned on the emotion embeddings. We conduct extensive experiments with different multimodal combinations, including visual-acoustic and visual-textual inputs, on four datasets in both single-label and multi-label zero-shot settings. The results demonstrate the superiority of the proposed framework over state-of-the-art methods.
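To make the zero-shot matching idea sketched in the abstract more concrete, the minimal PyTorch snippet below shows one plausible way to disentangle each modality into shared ("altruistic") and private ("egocentric") parts, fuse them, and score the result against emotion label embeddings so that unseen labels can be ranked. This is an illustrative sketch, not the authors' implementation: module names such as ModalityDisentangler and ZeroShotEmotionScorer, the dimension sizes, and the plain linear fusion standing in for the adversarial and hybrid co-attention components are all assumptions.

# Illustrative sketch only (not the paper's code): disentangle each modality into
# shared and private parts, fuse them, and score against emotion label embeddings
# so that labels unseen during training can still be ranked at test time.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityDisentangler(nn.Module):
    """Splits one modality's features into shared and private components."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.private = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())

    def forward(self, x):
        return self.shared(x), self.private(x)


class ZeroShotEmotionScorer(nn.Module):
    """Fuses two modalities and scores them against emotion label embeddings."""

    def __init__(self, vis_dim: int, aud_dim: int, hid_dim: int, label_dim: int):
        super().__init__()
        self.vis = ModalityDisentangler(vis_dim, hid_dim)
        self.aud = ModalityDisentangler(aud_dim, hid_dim)
        self.fuse = nn.Linear(4 * hid_dim, label_dim)  # simple stand-in for co-attention fusion

    def forward(self, vis_feat, aud_feat, label_emb):
        v_sh, v_pr = self.vis(vis_feat)
        a_sh, a_pr = self.aud(aud_feat)
        fused = self.fuse(torch.cat([v_sh, v_pr, a_sh, a_pr], dim=-1))
        # Cosine compatibility between the fused representation and every label
        # embedding; an unseen label only needs an embedding to receive a score.
        scores = F.normalize(fused, dim=-1) @ F.normalize(label_emb, dim=-1).T
        return scores


# Toy usage: 2 clips, 5 candidate emotion labels (some possibly unseen in training).
model = ZeroShotEmotionScorer(vis_dim=512, aud_dim=128, hid_dim=256, label_dim=300)
vis = torch.randn(2, 512)
aud = torch.randn(2, 128)
labels = torch.randn(5, 300)           # e.g. graph-refined label embeddings
print(model(vis, aud, labels).shape)   # -> torch.Size([2, 5])

In the actual framework, the label embeddings would come from the affective graph space, and the fusion would be performed by the adversarially disentangled components, the hierarchical co-attention, and the emotion-guided decoder described in the abstract.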
Pages: 5728-5741
Number of pages: 14
Related Papers (50 in total)
  • [1] Multimodal zero-shot learning for tactile texture recognition
    Cao, Guanqun
    Jiang, Jiaqi
    Bollegala, Danushka
    Li, Min
    Luo, Shan
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2024, 176
  • [2] Autonomous Emotion Learning in Speech: A View of Zero-Shot Speech Emotion Recognition
    Xu, Xinzhou
    Deng, Jun
    Cummins, Nicholas
    Zhang, Zixing
    Zhao, Li
    Schuller, Bjorn W.
    INTERSPEECH 2019, 2019, : 949 - 953
  • [3] A review on multimodal zero-shot learning
    Cao, Weipeng
    Wu, Yuhao
    Sun, Yixuan
    Zhang, Haigang
    Ren, Jin
    Gu, Dujuan
    Wang, Xingkai
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (02)
  • [4] Zero-Shot Visual Emotion Recognition by Exploiting BERT
    Kang, Hyunwook
    Hazarika, Devamanyu
    Kim, Dongho
    Kim, Jihie
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 2, 2023, 543 : 485 - 494
  • [5] An Adversarial Learning Framework for Zero-shot Fault Recognition of Mechanical Systems
    Chen, Jinglong
    Pan, Tongyang
    Zhou, Zitong
    He, Shuilong
    2019 IEEE 17TH INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS (INDIN), 2019, : 1275 - 1278
  • [6] ZeroEVNet: A multimodal zero-shot learning framework for scalable emergency vehicle detection
    Ravi, Reeta
    Kanniappan, Jayashree
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [7] Zero-shot Video Emotion Recognition via Multimodal Protagonist-aware Transformer Network
    Qi, Fan
    Yang, Xiaoshan
    Xu, Changsheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 1074 - 1083
  • [8] Integrative zero-shot learning for fruit recognition
    Tran-Anh, Dat
    Huu, Quynh Nguyen
    Bui-Quoc, Bao
    Hoang, Ngan Dao
    Quoc, Tao Ngo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 73191 - 73213
  • [9] Kernelized distance learning for zero-shot recognition
    Zarei, Mohammad Reza
    Taheri, Mohammad
    Long, Yang
    INFORMATION SCIENCES, 2021, 580 : 801 - 818
  • [10] An Attribute Learning Method for Zero-Shot Recognition
    Yazdanian, Ramtin
    Shojaee, Seyed Mohsen
    Baghshah, Mahdieh Soleymani
    2017 25TH IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2017, : 2235 - 2240