Sentimental feature selection for sentiment analysis of Chinese online reviews

Cited by: 79
Authors
Zheng, Lijuan [1, 2]
Wang, Hongwei [2 ]
Gao, Song [2 ]
Affiliations
[1] Liaocheng Univ, Sch Business, Liaocheng 252000, Peoples R China
[2] Tongji Univ, Sch Econ & Management, Shanghai 200092, Peoples R China
Keywords
Online reviews; Sentiment; Feature selection; Statistical machine learning; Classification
DOI
10.1007/s13042-015-0347-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the growing availability and popularity of online reviews, sentiment analysis has arisen in response to the need to organize useful information quickly. Feature selection directly affects how online reviews are represented and poses many challenges for sentiment analysis. However, little attention has so far been paid to feature selection for Chinese online reviews. We therefore explore the effects of feature selection on sentiment analysis of Chinese online reviews. First, N-char-grams and N-POS-grams are selected as candidate sentimental features. Then, an improved Document Frequency method is used to select feature subsets, and the Boolean Weighting method is adopted to compute feature weights. Finally, experiments are conducted on online reviews of mobile phones, and a Chi-square test is carried out to test the significance of the experimental results. The results suggest that sentiment analysis of Chinese online reviews achieves higher accuracy when 4-POS-grams are used as features. In addition, when N-char-grams are used as features, low-order N-char-grams perform better than high-order ones. Furthermore, the improved Document Frequency method yields a significant improvement in sentiment analysis of Chinese online reviews.
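The feature pipeline named in the abstract (character n-grams, document-frequency selection, Boolean weighting) can be illustrated with a minimal sketch. The abstract does not describe how the paper's improved Document Frequency method works, so the code below uses plain document-frequency thresholding as a stand-in. The function names, the `min_df` threshold, and the three toy reviews are all assumptions made for illustration, not taken from the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def select_by_document_frequency(docs, n, min_df):
    """Keep n-grams that occur in at least min_df documents.

    Assumption: this is plain document-frequency thresholding,
    not the improved variant proposed in the paper.
    """
    df = Counter()
    for doc in docs:
        # Count each n-gram at most once per document.
        df.update(set(char_ngrams(doc, n)))
    return sorted(gram for gram, count in df.items() if count >= min_df)


def boolean_vector(doc, features, n):
    """Boolean weighting: 1 if the feature occurs in the document, else 0."""
    present = set(char_ngrams(doc, n))
    return [1 if feature in present else 0 for feature in features]


# Hypothetical toy reviews, for illustration only.
reviews = ["屏幕很好", "屏幕不好", "电池很好"]
features = select_by_document_frequency(reviews, n=2, min_df=2)
vectors = [boolean_vector(review, features, 2) for review in reviews]
```

In this toy setting only the 2-char-grams shared by at least two reviews survive selection, and each review becomes a 0/1 vector over those features. Extending the sketch to N-POS-grams would require a part-of-speech tagger for Chinese, which is outside what the abstract specifies.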
Pages: 75-84
Number of pages: 10