Sentimental feature selection for sentiment analysis of Chinese online reviews

Cited by: 0
Authors
Lijuan Zheng
Hongwei Wang
Song Gao
Affiliations
[1] Liaocheng University,School of Business
[2] Tongji University,School of Economics and Management
Source
International Journal of Machine Learning and Cybernetics | 2018 / Vol. 9
Keywords
Online reviews; Sentiment; Feature selection; Statistical machine learning
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
With the growing availability and popularity of online reviews, sentiment analysis has emerged in response to the need to organize useful information quickly. Feature selection directly affects how online reviews are represented and poses many challenges for sentiment analysis. However, little attention has been paid so far to feature selection for Chinese online reviews. We are therefore motivated to explore the effects of feature selection on sentiment analysis of Chinese online reviews. First, N-char-grams and N-POS-grams are selected as the potential sentimental features. Then, the improved Document Frequency method is used to select feature subsets, and the Boolean Weighting method is adopted to calculate feature weights. Finally, experiments are conducted on online reviews of mobile phones, and a Chi-square test is carried out to test the significance of the experimental results. The results suggest that sentiment analysis of Chinese online reviews obtains higher accuracy when 4-POS-grams are taken as features. In addition, when N-char-grams are taken as features, low-order N-char-grams perform better than high-order N-char-grams. Furthermore, the improved Document Frequency method achieves a significant improvement in sentiment analysis of Chinese online reviews.
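The feature pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the improved Document Frequency method works, so plain document frequency with a threshold stands in for it here. The function names, the `min_df` parameter, and the toy reviews are all hypothetical.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return the list of contiguous character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def select_by_document_frequency(docs, n, min_df):
    """Keep n-grams that occur in at least min_df documents.

    Assumption: this is plain document frequency, not the paper's
    improved variant, whose details are not given in the abstract.
    """
    df = Counter()
    for doc in docs:
        # A set ensures each n-gram is counted once per document.
        df.update(set(char_ngrams(doc, n)))
    return sorted(g for g, count in df.items() if count >= min_df)

def boolean_vector(doc, features, n):
    """Boolean weighting: 1 if the feature occurs in doc, otherwise 0."""
    present = set(char_ngrams(doc, n))
    return [1 if f in present else 0 for f in features]

# Toy, made-up reviews used only for illustration.
reviews = ["手机很好", "手机不好", "屏幕很好"]
features = select_by_document_frequency(reviews, n=2, min_df=2)
vectors = [boolean_vector(r, features, n=2) for r in reviews]
```

With these toy reviews, only the 2-char-grams that appear in at least two reviews survive selection, and each review becomes a 0/1 vector over the surviving features. The resulting vectors could then be passed to any statistical classifier.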
Pages: 75-84
Page count: 9