Learning speaker-independent multimodal representation for sentiment analysis

Cited by: 21
Authors
Wang, Jianwen [1 ,2 ]
Wang, Shiping [1 ,4 ]
Lin, Mingwei [2 ,5 ]
Xu, Zeshui [3 ]
Guo, Wenzhong [1 ,4 ]
Affiliations
[1] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350116, Peoples R China
[2] Fujian Normal Univ, Coll Comp & Cyber Secur, Fuzhou 350117, Peoples R China
[3] Sichuan Univ, Business Sch, Chengdu 610064, Sichuan, Peoples R China
[4] Fuzhou Univ, Key Lab Network Comp & Intelligent Informat Proc, Fuzhou 350116, Peoples R China
[5] Fujian Normal Univ, Digital Fujian Inst Big Data Secur Technol, Fuzhou 350117, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal fusion; Multimodal sentiment analysis; Multi-view learning; Multimodal representation learning;
DOI
10.1016/j.ins.2023.01.116
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Multimodal sentiment analysis is an actively growing research area that utilizes language, acoustic, and visual signals to predict sentiment inclination. Compared with language, acoustic and visual features carry a more pronounced personal style, which can degrade a model's generalization capability. The issue is exacerbated in a speaker-independent setting, where the model encounters samples from unseen speakers at test time. To mitigate the impact of personal style, we propose SIMR, a framework for learning speaker-independent multimodal representation. The framework separates the nonverbal inputs into a style encoding and a content representation with the aid of informative cross-modal correlations. Moreover, when integrating complementary cross-modal information, classical transformer-based approaches are inherently inclined to capture compatible cross-modal interactions while overlooking incompatible ones. In contrast, we locate both simultaneously through an enhanced cross-modal transformer module. Experimental results show that the proposed model achieves state-of-the-art performance on several datasets.
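Purely as illustration, the sketch below shows one plausible way the two ideas described in the abstract could be realized in PyTorch: splitting a nonverbal (acoustic or visual) feature sequence into a speaker-style code and a content representation, and a cross-modal attention block that keeps both compatible and incompatible interactions by additionally attending over negated keys. This is not the authors' SIMR implementation; all module names, dimensions, and the negated-key mechanism are assumptions made for the example.

```python
# Hypothetical sketch (not the paper's released code) of style/content
# disentanglement plus compatible/incompatible cross-modal attention.
import torch
import torch.nn as nn


class StyleContentEncoder(nn.Module):
    """Split a nonverbal feature sequence into a style code and content features."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.style_enc = nn.Linear(in_dim, hid_dim)    # speaker-specific factors
        self.content_enc = nn.Linear(in_dim, hid_dim)  # sentiment-bearing content

    def forward(self, x):                              # x: (batch, seq_len, in_dim)
        style = self.style_enc(x).mean(dim=1)          # one pooled style code per utterance
        content = self.content_enc(x)                  # frame-level content representation
        return style, content


class CrossModalBlock(nn.Module):
    """Cross-modal attention retaining compatible (similar) and incompatible
    (dissimilar) interactions; the latter via attention over negated keys."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.pos_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.neg_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, query, key_value):
        # Compatible interactions: standard attention from one modality to another.
        pos, _ = self.pos_attn(query, key_value, key_value)
        # Incompatible interactions: negated keys give high weight to dissimilar positions.
        neg, _ = self.neg_attn(query, -key_value, key_value)
        return self.fuse(torch.cat([pos, neg], dim=-1))


if __name__ == "__main__":
    acoustic = torch.randn(2, 50, 74)     # toy acoustic sequence (batch, frames, features)
    language = torch.randn(2, 50, 128)    # toy language sequence, already 128-dimensional
    encoder = StyleContentEncoder(74, 128)
    style, content = encoder(acoustic)
    fused = CrossModalBlock(128)(content, language)   # acoustic content attends to language
    print(style.shape, content.shape, fused.shape)    # (2, 128) (2, 50, 128) (2, 50, 128)
```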
Pages: 208-225
Page count: 18