Cross-modal dynamic convolution for multi-modal emotion recognition

Cited: 11
Authors
Wen, Huanglu [1 ]
You, Shaodi [2 ]
Fu, Ying [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing Lab Intelligent Informat Technol, Beijing, Peoples R China
[2] Univ Amsterdam, Inst Informat, Comp Vis Res Grp, Amsterdam, Netherlands
Funding
National Natural Science Foundation of China
Keywords
Artificial neural networks; Pattern recognition; Affective behavior; Multi-modal temporal sequences; BODY; REPRESENTATIONS; NETWORK; FUSION; FACE
DOI
10.1016/j.jvcir.2021.103178
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Understanding human emotions requires information from different modalities, such as the vocal, visual, and verbal channels. Since human emotion is time-varying, this information is usually represented as temporal sequences, and we need to identify both the emotion-related clues and the cross-modal interactions among them. However, emotion-related clues are sparse and misaligned in temporally unaligned sequences, making it hard for previous multi-modal emotion recognition methods to capture helpful cross-modal interactions. To this end, we present cross-modal dynamic convolution. To deal with sparsity, cross-modal dynamic convolution models the temporal dimension locally, avoiding being overwhelmed by unrelated information. Cross-modal dynamic convolution is also easy to stack, which enables it to model long-range cross-modal temporal interactions. Besides, models with cross-modal dynamic convolution are more stable during training than those with cross-modal attention, opening up more possibilities in multi-modal sequential model design. Extensive experiments show that our method achieves competitive performance compared to previous works while being more efficient.
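To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of the idea as we read it: a depthwise temporal convolution over one modality whose kernel weights are predicted, per time step, from another modality, so each output step sees only a small local window. The class name CrossModalDynamicConv, the linear kernel-prediction head, and all shapes and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch, assuming PyTorch; names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F
from torch import nn


class CrossModalDynamicConv(nn.Module):
    """Sketch of a cross-modal dynamic convolution layer.

    A lightweight depthwise temporal convolution over one modality (the
    "query") whose kernel weights are predicted, per time step, from
    another modality (the "context"). The local window keeps the layer
    from being overwhelmed by emotion-unrelated time steps; stacking
    layers grows the cross-modal receptive field.
    """

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        assert kernel_size % 2 == 1, "use an odd kernel for symmetric padding"
        self.kernel_size = kernel_size
        # Predicts one k-tap kernel per channel and per time step.
        self.kernel_gen = nn.Linear(dim, dim * kernel_size)

    def forward(self, query: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # query, context: (batch, time, dim), assumed equal in length
        # (e.g. after resampling), though not aligned in content.
        b, t, d = query.shape
        k = self.kernel_size
        # Dynamic kernels from the context modality: (batch, time, dim, k).
        kernels = self.kernel_gen(context).view(b, t, d, k)
        kernels = torch.softmax(kernels, dim=-1)  # normalize over the window
        # Extract k-step local windows of the query: (batch, dim, time, k).
        pad = k // 2
        windows = F.pad(query.transpose(1, 2), (pad, pad)).unfold(-1, k, 1)
        windows = windows.permute(0, 2, 1, 3)  # -> (batch, time, dim, k)
        # Each output step is a context-conditioned weighted sum of its
        # local neighborhood in the query modality.
        return (windows * kernels).sum(dim=-1)


# Illustrative usage: condition acoustic features on word embeddings.
layer = CrossModalDynamicConv(dim=64, kernel_size=3)
audio = torch.randn(2, 50, 64)
text = torch.randn(2, 50, 64)
fused = layer(query=audio, context=text)  # (2, 50, 64)
```

Unlike cross-modal attention, which lets every query step attend to all context steps, this locality restriction is what the abstract credits for robustness to sparse, unrelated time steps; long-range interactions are recovered by stacking several such layers.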
Pages: 10
Related Papers
50 records in total
  • [1] Contextual and Cross-Modal Interaction for Multi-Modal Speech Emotion Recognition
    Yang, Dingkang
    Huang, Shuai
    Liu, Yang
    Zhang, Lihua
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 2093 - 2097
  • [2] Semi-supervised Multi-modal Emotion Recognition with Cross-Modal Distribution Matching
    Liang, Jingjun
    Li, Ruichen
    Jin, Qin
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2852 - 2861
  • [3] Cross-modal context-gated convolution for multi-modal sentiment analysis
    Wen, Huanglu
    You, Shaodi
    Fu, Ying
    PATTERN RECOGNITION LETTERS, 2021, 146 : 252 - 259
  • [4] Multi-modal Subspace Learning with Dropout regularization for Cross-modal Recognition and Retrieval
    Cao, Guanqun
    Waris, Muhammad Adeel
    Iosifidis, Alexandros
    Gabbouj, Moncef
    2016 SIXTH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2016,
  • [5] Cross-Modal Diversity-Based Active Learning for Multi-Modal Emotion Estimation
    Xu, Yifan
    Meng, Lubin
    Peng, Ruimin
    Yin, Yingjie
    Ding, Jingting
    Li, Liang
    Wu, Dongrui
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [6] Cross-modal attention for multi-modal image registration
    Song, Xinrui
    Chao, Hanqing
    Xu, Xuanang
    Guo, Hengtao
    Xu, Sheng
    Turkbey, Baris
    Wood, Bradford J.
    Sanford, Thomas
    Wang, Ge
    Yan, Pingkun
    MEDICAL IMAGE ANALYSIS, 2022, 82
  • [7] Multi-modal and cross-modal for lecture videos retrieval
    Nhu Van Nguyen
    Coustaty, Mickaël
    Ogier, Jean-Marc
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2667 - 2672
  • [8] Unsupervised Multi-modal Hashing for Cross-Modal Retrieval
    Yu, Jun
    Wu, Xiao-Jun
    Zhang, Donglin
    COGNITIVE COMPUTATION, 2022, 14 (03) : 1159 - 1171
  • [9] Cross-Modal Retrieval Augmentation for Multi-Modal Classification
    Gur, Shir
    Neverova, Natalia
    Stauffer, Chris
    Lim, Ser-Nam
    Kiela, Douwe
    Reiter, Austin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 111 - 123