Weakly supervised topic sentiment joint model with word embeddings

Cited by: 31

Authors:

Fu, Xianghua [1]

Sun, Xudong [1]

Wu, Haiying [1]

Cui, Laizhong [1]

Huang, Joshua Zhexue [1]

Affiliations:

[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China

Source:

KNOWLEDGE-BASED SYSTEMS | 2018 / Vol. 147

Keywords:

Sentiment analysis; Topic model; Topic sentiment joint model; Word embeddings;

DOI:

10.1016/j.knosys.2018.02.012

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A topic sentiment joint model aims to identify topics and sentiments simultaneously from online reviews. Most existing topic sentiment modeling algorithms are based on state-of-the-art latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA), which infer sentiment and topic distributions from word co-occurrence. These methods have been used successfully for topic and sentiment analysis. However, when the training corpus is small or the documents are short, the textual features become sparse, and the inferred sentiment and topic distributions may be unsatisfactory. In this paper, we propose a novel topic sentiment joint model called the weakly supervised topic sentiment joint model with word embeddings (WS-TSWE), which incorporates word embeddings and the HowNet lexicon simultaneously to improve topic identification and sentiment recognition. The main contributions of WS-TSWE are twofold. (1) Existing models generate words only from the sentiment-topic-to-word Dirichlet multinomial component, whereas the WS-TSWE model replaces it with a mixture of two components: a Dirichlet multinomial component and a word embeddings component. Because the word embeddings are trained on very large corpora and can extend the semantic information of words, they help mitigate the problem of textual sparsity. (2) Most previous models incorporate sentiment knowledge in the beta priors, which are usually set from a dictionary and rely entirely on prior domain knowledge to identify positive and negative words. In contrast, the WS-TSWE model calculates the sentiment orientation of each word with the HowNet lexicon and automatically infers sentiment-based beta priors for sentiment analysis and opinion mining. Furthermore, we implement WS-TSWE with Gibbs sampling algorithms. The experimental results on Chinese and English data sets show that WS-TSWE achieves significant performance in detecting sentiment and topics simultaneously. (c) 2018 Elsevier B.V. All rights reserved.
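
The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the mixing weight `lam`, the softmax form of the embedding component, the `base` and `floor` constants, and the toy orientation scores (standing in for HowNet-derived values) are all assumptions made for illustration only.

```python
import math

def dirichlet_multinomial_prob(word, counts, beta):
    """Smoothed count estimate (n_w + beta_w) / sum over vocab of (n_v + beta_v)."""
    total = sum(counts.get(w, 0) + b for w, b in beta.items())
    return (counts.get(word, 0) + beta[word]) / total

def embedding_prob(word, topic_vec, word_vecs):
    """Softmax over dot products of a topic vector with every word vector (assumed form)."""
    scores = {w: sum(a * b for a, b in zip(topic_vec, vec))
              for w, vec in word_vecs.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    norm = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[word] - top) / norm

def mixture_prob(word, counts, beta, topic_vec, word_vecs, lam=0.5):
    """Mix the count-based and embedding-based components; lam is a hypothetical weight."""
    dm = dirichlet_multinomial_prob(word, counts, beta)
    emb = embedding_prob(word, topic_vec, word_vecs)
    return (1.0 - lam) * dm + lam * emb

def sentiment_beta(orientation, base=0.01, floor=0.05):
    """Turn per-word orientation scores in [-1, 1] into per-sentiment beta priors."""
    priors = {"pos": {}, "neg": {}}
    for word, score in orientation.items():
        priors["pos"][word] = base * max(floor, (1.0 + score) / 2.0)
        priors["neg"][word] = base * max(floor, (1.0 - score) / 2.0)
    return priors

# Toy data: the scores below are placeholders, not values from any lexicon.
orientation = {"good": 0.9, "bad": -0.8, "screen": 0.0}
word_vecs = {"good": [1.0, 0.2], "bad": [-1.0, 0.1], "screen": [0.0, 1.0]}
beta = sentiment_beta(orientation)
counts = {"good": 3, "screen": 1}  # word counts under one positive sentiment-topic pair
probs = {w: mixture_prob(w, counts, beta["pos"], [0.8, 0.3], word_vecs)
         for w in word_vecs}
```

Because each component is a proper distribution over the vocabulary, the mixture is one as well. The embedding component also gives nonzero mass to words that never co-occurred with the topic in a small corpus, which is the intuition behind using embeddings against sparsity.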

Pages: 43-54

Page count: 12

50 items in total

[31] A Semi-Supervised Topic Model Incorporating Sentiment and Dynamic Characteristic
Zhang, Lanshan
Ding, Xi
Tian, Ye
Gong, Xiangyang
Wang, Wendong
CHINA COMMUNICATIONS, 2016, 13 (12) : 162 - 175
[32] Refining Word Embeddings Using Intensity Scores for Sentiment Analysis
Yu, Liang-Chih
Wang, Jin
Lai, K. Robert
Zhang, Xuejie
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (03) : 671 - 681
[33] Fine-Tuning of Word Embeddings for Semantic Sentiment Analysis
Atzeni, Mattia
Recupero, Diego Reforgiato
SEMANTIC WEB CHALLENGES, SEMWEBEVAL 2018, 2018, 927 : 140 - 150
[34] Contextual Word Embeddings and Topic Modeling in Healthy Dieting and Obesity
Yeruva, Vijaya Kumari
Junaid, Sidrah
Lee, Yugyung
JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2019, 3 (02) : 159 - 183
[35] Enhancing Topic Modeling for Short Texts with Auxiliary Word Embeddings
Li, Chenliang
Duan, Yu
Wang, Haoran
Zhang, Zhiqian
Sun, Aixin
Ma, Zongyang
ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2017, 36 (02)
[36] Evaluating Quality of Word Embeddings with Sentiment Polarity Identification Task
Indurthi, Vijayasaradhi
Oota, Subba Reddy
SEMANTIC WEB CHALLENGES, SEMWEBEVAL 2018, 2018, 927 : 232 - 237
[37] Contextual Word Embeddings and Topic Modeling in Healthy Dieting and Obesity
Vijaya Kumari Yeruva
Sidrah Junaid
Yugyung Lee
Journal of Healthcare Informatics Research, 2019, 3 : 159 - 183
[38] A Study on Stochastic Variational Inference for Topic Modeling with Word Embeddings
Ozaki, Kana
Kobayashie, Ichiro
COMPUTACION Y SISTEMAS, 2022, 26 (03) : 1225 - 1232
[39] Learning emotional word embeddings for sentiment analysis
Zeng, Qingtian
Zhao, Xishi
Hu, Xiaohui
Duan, Hua
Zhao, Zhongying
Li, Chao
JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (05) : 9515 - 9527
[40] Quality of Word Embeddings on Sentiment Analysis Tasks
Cano, Erion
Morisio, Maurizio
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017, 2017, 10260 : 332 - 338
