Learning speaker-independent multimodal representation for sentiment analysis

Cited by: 21
Authors
Wang, Jianwen [1 ,2 ]
Wang, Shiping [1 ,4 ]
Lin, Mingwei [2 ,5 ]
Xu, Zeshui [3 ]
Guo, Wenzhong [1 ,4 ]
Affiliations
[1] Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350116, Peoples R China
[2] Fujian Normal Univ, Coll Comp & Cyber Secur, Fuzhou 350117, Peoples R China
[3] Sichuan Univ, Business Sch, Chengdu 610064, Sichuan, Peoples R China
[4] Fuzhou Univ, Key Lab Network Comp & Intelligent Informat Proc, Fuzhou 350116, Peoples R China
[5] Fujian Normal Univ, Digital Fujian Inst Big Data Secur Technol, Fuzhou 350117, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal fusion; Multimodal sentiment analysis; Multi-view learning; Multimodal representation learning;
DOI
10.1016/j.ins.2023.01.116
Chinese Library Classification (CLC)
TP [Automation technology, computer technology];
Discipline Classification Code
0812;
Abstract
Multimodal sentiment analysis is an actively growing research area that utilizes language, acoustic, and visual signals to predict sentiment inclination. Compared with language, acoustic and visual features carry a more pronounced personal style, which can degrade a model's generalization capability. The issue is exacerbated in a speaker-independent setting, where the model encounters samples from unseen speakers at test time. To mitigate the impact of personal style, we propose SIMR, a framework for learning speaker-independent multimodal representation. The framework separates the nonverbal inputs into a style encoding and a content representation with the aid of informative cross-modal correlations. Moreover, when integrating complementary cross-modal information, classical transformer-based approaches are inherently inclined to capture compatible cross-modal interactions while overlooking incompatible ones. In contrast, we locate both simultaneously through an enhanced cross-modal transformer module. Experimental results show that the proposed model achieves state-of-the-art performance on several datasets.
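Purely as illustration, the sketch below shows one plausible way the two ideas described in the abstract could be realized in PyTorch: splitting a nonverbal (acoustic or visual) feature sequence into a speaker-style code and a content representation, and a cross-modal attention block that keeps both compatible and incompatible interactions by additionally attending over negated keys. This is not the authors' SIMR implementation; all module names, dimensions, and the negated-key mechanism are assumptions made for the example.

```python
# Hypothetical sketch (not the paper's released code) of style/content
# disentanglement plus compatible/incompatible cross-modal attention.
import torch
import torch.nn as nn


class StyleContentEncoder(nn.Module):
    """Split a nonverbal feature sequence into a style code and content features."""

    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.style_enc = nn.Linear(in_dim, hid_dim)    # speaker-specific factors
        self.content_enc = nn.Linear(in_dim, hid_dim)  # sentiment-bearing content

    def forward(self, x):                              # x: (batch, seq_len, in_dim)
        style = self.style_enc(x).mean(dim=1)          # one pooled style code per utterance
        content = self.content_enc(x)                  # frame-level content representation
        return style, content


class CrossModalBlock(nn.Module):
    """Cross-modal attention retaining compatible (similar) and incompatible
    (dissimilar) interactions; the latter via attention over negated keys."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.pos_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.neg_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, query, key_value):
        # Compatible interactions: standard attention from one modality to another.
        pos, _ = self.pos_attn(query, key_value, key_value)
        # Incompatible interactions: negated keys give high weight to dissimilar positions.
        neg, _ = self.neg_attn(query, -key_value, key_value)
        return self.fuse(torch.cat([pos, neg], dim=-1))


if __name__ == "__main__":
    acoustic = torch.randn(2, 50, 74)     # toy acoustic sequence (batch, frames, features)
    language = torch.randn(2, 50, 128)    # toy language sequence, already 128-dimensional
    encoder = StyleContentEncoder(74, 128)
    style, content = encoder(acoustic)
    fused = CrossModalBlock(128)(content, language)   # acoustic content attends to language
    print(style.shape, content.shape, fused.shape)    # (2, 128) (2, 50, 128) (2, 50, 128)
```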
Pages: 208-225
Page count: 18