Multimodal sentiment analysis based on fusion methods: A survey

Cited by: 141
Authors
Zhu, Linan [1 ]
Zhu, Zhechao [1 ]
Zhang, Chenwei [2 ]
Xu, Yifei [1 ]
Kong, Xiangjie [1 ]
Affiliations
[1] Zhejiang Univ Technol, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[2] Univ Hong Kong, Fac Educ, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multimodal data; Sentiment analysis; Feature extraction; Fusion methods; NETWORK;
DOI
10.1016/j.inffus.2023.02.028
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Sentiment analysis is an emerging technology that aims to explore people's attitudes toward an entity. It can be applied in a variety of fields and scenarios, such as product review analysis, public opinion analysis, psychological disease analysis, and risk assessment. Traditional sentiment analysis considers only the text modality and extracts sentiment information by inferring the semantic relationships within sentences. However, some special forms of expression, such as irony and exaggeration, are difficult to detect from text alone. Multimodal sentiment analysis incorporates rich visual and acoustic information in addition to text, and fuses these modalities to infer the implied sentiment polarity (positive, neutral, or negative) more accurately. The main challenge in multimodal sentiment analysis is the integration of cross-modal sentiment information, so we focus on introducing the frameworks and characteristics of different fusion methods. In addition, this article discusses the development status of multimodal sentiment analysis, popular datasets, feature extraction algorithms, application areas, and open challenges. We hope this work helps researchers understand the current state of research in multimodal sentiment analysis, and that the insights provided here inspire the development of effective models.
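Since the survey's central theme is fusing text, visual, and acoustic signals into a single polarity prediction, a concrete illustration may help. The following is a minimal sketch of feature-level (early) fusion in PyTorch; the module name, feature dimensions, and classifier head are illustrative assumptions, not the survey's own model. Pre-extracted per-modality feature vectors are concatenated and fed to a small classifier over the three polarities.

# Minimal early (feature-level) fusion sketch in PyTorch. All names and
# dimensions below are illustrative assumptions: pre-extracted text, audio,
# and visual vectors are projected, concatenated, and classified into one of
# three sentiment polarities (positive / neutral / negative).
import torch
import torch.nn as nn

class EarlyFusionClassifier(nn.Module):
    def __init__(self, text_dim=768, audio_dim=74, visual_dim=35, hidden=128):
        super().__init__()
        # One linear projection per modality, then a joint classifier head.
        self.text_proj = nn.Linear(text_dim, hidden)
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.visual_proj = nn.Linear(visual_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden, 3),  # logits for the three polarities
        )

    def forward(self, text, audio, visual):
        # Feature-level fusion: concatenate the projected modality vectors.
        fused = torch.cat(
            [self.text_proj(text), self.audio_proj(audio), self.visual_proj(visual)],
            dim=-1,
        )
        return self.classifier(fused)

# Example: a batch of 4 utterances with (hypothetical) pre-extracted features.
model = EarlyFusionClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 74), torch.randn(4, 35))
print(logits.shape)  # torch.Size([4, 3])

Late (decision-level) fusion would instead train one classifier per modality and combine their predictions; the surveyed methods differ mainly in where along this pipeline the modalities interact.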
Pages: 306-325
Page count: 20