Robust Multimodal Sentiment Analysis via Tag Encoding of Uncertain Missing Modalities

Cited by: 15
Authors
Zeng, Jiandian [1, 2]
Zhou, Jiantao [1, 2]
Liu, Tianyi [3]
Affiliations
[1] Univ Macau, State Key Lab Internet Things Smart City, Macau 999078, Peoples R China
[2] Univ Macau, Dept Comp & Informat Sci, Macau 999078, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
Keywords
Sentiment analysis; Feature extraction; Transformers; Visualization; Acoustics; Encoding; Training; Multimodal sentiment analysis; Missing modality; Joint representation
DOI
10.1109/TMM.2022.3207572
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multimodal sentiment analysis aims to extract emotions from multiple data sources, usually under the assumption that all modalities are available. In practice, such a strong assumption does not always hold, and most multimodal sentiment analysis methods may fail when some modalities are missing. Some existing works have started to address the missing modality problem, but they consider only the single-modality-missing case and ignore the practically more general case of multiple missing modalities. To this end, we propose a Tag-Assisted Transformer Encoder (TATE) network to handle the problem of uncertain missing modalities. Specifically, we design a tag encoding module to cover both the single-modality and multiple-modality missing cases, so as to guide the network's attention to the missing modalities. In addition, a new space projection pattern is adopted to align common vectors, taking into account the different importance of each modality. Afterwards, a Transformer encoder-decoder network is utilized to learn the missing modality features, and the outputs of the Transformer encoder are extracted for the final sentiment classification. Extensive experiments and analyses on the CMU-MOSI, IEMOCAP, and MELD datasets show that the proposed method achieves significant improvements over several baselines.
Pages: 6301-6314
Page count: 14