Text feature selection for sentiment classification of Chinese online reviews

Cited by: 22
Authors
Wang, Hongwei [1 ]
Yin, Pei [1 ]
Yao, Jiani [1 ]
Liu, James N. K. [2 ]
Affiliations
[1] Tongji Univ, Sch Econ & Management, Shanghai 200092, Peoples R China
[2] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Hong Kong, Peoples R China
Keywords
feature selection method; text classification; sentiment classification; Chinese online reviews;
DOI
10.1080/0952813X.2012.721139
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
To meet the demand for customised services in online communities, sentiment classification has been applied to unstructured online reviews to identify users' opinions on particular products. This article selects features for sentiment classification of Chinese online reviews using techniques that perform well in traditional text classification. First, adjectives, adverbs and verbs are identified as the potential text features carrying sentiment information. Then, four statistical feature selection methods, namely document frequency (DF), information gain (IG), chi-squared statistic (CHI) and mutual information (MI), are used to select features. Next, Boolean weighting is applied to set feature weights and construct a vector space model. Finally, a support vector machine (SVM) classifier is used to predict the sentiment polarity of the reviews. Comparative experiments are conducted on Chinese-language online hotel reviews. The results indicate that the highest classification accuracy is achieved when adjectives, adverbs and verbs are used together as features. In addition, the feature selection methods differ in performance: DF performs best, followed by CHI and then IG, whereas MI is not suitable for sentiment classification of Chinese online reviews. These findings can help improve the accuracy of sentiment classification and inform further research.
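The four selection criteria and the Boolean weighting step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formulas, so the CHI and MI expressions below are the contingency-table estimates commonly used in the text categorization literature, and the toy documents, labels and all function names are hypothetical. Each document is assumed to be already word-segmented and filtered to adjectives, adverbs and verbs; English tokens stand in for Chinese words.

```python
import math

def contingency(docs, labels, term, target):
    """Count documents by term presence and class membership.
    a: in target class with term, b: outside class with term,
    c: in target class without term, d: outside class without term."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if label == target:
            if present:
                a += 1
            else:
                c += 1
        else:
            if present:
                b += 1
            else:
                d += 1
    return a, b, c, d

def document_frequency(a, b, c, d):
    # DF: number of documents containing the term.
    return a + b

def chi_square(a, b, c, d):
    # CHI: N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)); 0 if any margin is empty.
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def mutual_information(a, b, c, d):
    # Pointwise MI estimate: log(A*N / ((A+C)(A+B))); -inf if never co-occurring.
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))

def entropy(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k)

def information_gain(a, b, c, d):
    # IG: class entropy minus class entropy conditioned on term presence.
    n = a + b + c + d
    return (entropy([a + c, b + d])
            - (a + b) / n * entropy([a, b])
            - (c + d) / n * entropy([c, d]))

def select_features(docs, labels, target, score_fn, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    vocab = set().union(*docs)
    scores = {t: score_fn(*contingency(docs, labels, t, target)) for t in vocab}
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]

def boolean_vector(tokens, features):
    # Boolean weighting: 1 if the feature occurs in the document, else 0.
    return [1 if f in tokens else 0 for f in features]

# Hypothetical toy corpus of hotel reviews (token sets per document).
docs = [
    {"clean", "comfortable", "very"},
    {"clean", "recommend"},
    {"dirty", "noisy", "very"},
    {"dirty", "disappointed"},
]
labels = ["pos", "pos", "neg", "neg"]

features = select_features(docs, labels, "pos", chi_square, k=2)
vectors = [boolean_vector(doc, features) for doc in docs]
```

Swapping `chi_square` for `document_frequency`, `information_gain` or `mutual_information` in `select_features` changes only the ranking criterion. The resulting Boolean vectors are the kind of input an SVM classifier would be trained on; that final step is omitted here because it depends on an external library.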
Pages: 425-439
Page count: 15