Exploring Semantic Relations for Social Media Sentiment Analysis

Cited by: 7
Authors
Zeng, Jiandian [1 ,2 ]
Zhou, Jiantao [3 ,4 ]
Huang, Caishi [3 ,4 ]
Affiliations
[1] Beijing Normal Univ, Inst Artificial Intelligence & Future Networks, Zhuhai, Peoples R China
[2] Univ Macau, Macau 999078, Peoples R China
[3] Univ Macau, State Key Lab Internet Things Smart City, Macau 999078, Peoples R China
[4] Univ Macau, Dept Comp & Informat Sci, Macau 999078, Peoples R China
Keywords
Multimodal fusion; sentiment analysis; social media; MULTIMODAL SENTIMENT; FUSION; LSTM;
DOI
10.1109/TASLP.2023.3285238
CLC number
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
With massive social media data available online, conventional single-modality emotion classification has developed into more complex models for multimodal sentiment analysis. Most existing works extract image features only at a coarse level, missing fine-grained visual details. Moreover, social media posts usually contain multiple images, whereas existing works consider the single-image case and use only one image to represent visual features. In fact, extending the single-image case to multiple images is nontrivial because of the complex relations among the images. To address these issues, in this article we propose a Gated Fusion Semantic Relation (GFSR) network that explores semantic relations for social media sentiment analysis. In addition to inter-relations between the visual and textual modalities, we also exploit intra-relations among multiple images, potentially improving sentiment analysis performance. Specifically, we design a gated fusion network to fuse global image embeddings with the corresponding local Adjective Noun Pair (ANP) embeddings. Then, apart from textual relations and cross-modal relations, we employ a multi-head cross-attention mechanism between images and ANPs to capture similar semantic content. Finally, the updated textual and visual representations are concatenated for sentiment prediction. Extensive experiments on the real-world Yelp and Flickr30k datasets show that GFSR improves accuracy by about 0.10% to 3.66% on the Yelp dataset with multiple images, and achieves the best two-class accuracy and the best three-class macro F1 on the Flickr30k dataset with a single image.
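The gated fusion the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact architecture: the specific gating formulation (a sigmoid gate that interpolates element-wise between the global image embedding and the local ANP embedding), the parameter names `W` and `b`, and the embedding dimension are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(global_emb, anp_emb, W, b):
    """Fuse a global image embedding with a local ANP embedding.

    The gate decides, per dimension, how much of the global image
    embedding vs. the ANP embedding to keep (a common gating pattern;
    the paper's exact formulation may differ).
    """
    gate = sigmoid(np.concatenate([global_emb, anp_emb]) @ W + b)
    return gate * global_emb + (1.0 - gate) * anp_emb

d = 8                                   # illustrative embedding size
W = rng.normal(size=(2 * d, d))         # gate projection weights
b = np.zeros(d)                         # gate bias
g_emb = rng.normal(size=d)              # global image embedding
a_emb = rng.normal(size=d)              # local ANP embedding

fused = gated_fusion(g_emb, a_emb, W, b)
print(fused.shape)  # (8,)
```

Because the sigmoid gate lies in (0, 1), each fused dimension is a convex combination of the two inputs, so the fused vector stays element-wise between the global and ANP embeddings.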
Pages: 2382-2394
Page count: 13