POS-RS: A Random Subspace method for sentiment classification based on part-of-speech analysis

Cited by: 55
Authors
Wang, Gang [1, 2, 3]
Zhang, Zhu [1, 2, 4]
Sun, Jianshan [1, 2, 5]
Yang, Shanlin [1, 2]
Larson, Catherine A. [3]
Affiliations
[1] Hefei Univ Technol, Sch Management, Hefei 230009, Anhui, Peoples R China
[2] Minist Educ, Key Lab Proc Optimizat & Intelligent Decis Making, Hefei, Anhui, Peoples R China
[3] Univ Arizona, Dept Management Informat Syst, Tucson, AZ 85721 USA
[4] Iowa State Univ, Dept Supply Chain & Informat Syst, Ames, IA 50011 USA
[5] City Univ Hong Kong, Dept Informat Syst, Kowloon, Hong Kong, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Sentiment classification; Random Subspace; Part of speech; Ensemble learning; TEXT; CLASSIFIERS; STRENGTH
DOI
10.1016/j.ipm.2014.09.004
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rise of Web 2.0 platforms, personal opinions, such as reviews, ratings, recommendations, and other forms of user-generated content, have fueled interest in sentiment classification in both academia and industry. To enhance the performance of sentiment classification, ensemble methods have been investigated in previous research and shown to be effective both theoretically and empirically. We advance this line of research by proposing an enhanced Random Subspace method, POS-RS, for sentiment classification based on part-of-speech analysis. Unlike existing Random Subspace methods, which use a single subspace rate to control the diversity of base learners, POS-RS employs two parameters, i.e., the content lexicon subspace rate and the function lexicon subspace rate, to control the balance between the accuracy and diversity of base learners. Ten publicly available sentiment datasets were investigated to verify the effectiveness of the proposed method. Empirical results reveal that POS-RS achieves the best performance by reducing bias and variance simultaneously compared to the base learner, i.e., Support Vector Machine. These results illustrate that POS-RS can be used as a viable method for sentiment classification and has the potential to be applied successfully to other text classification problems. (C) 2014 Elsevier Ltd. All rights reserved.
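The two-rate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example word lists, the rounding rule, and the assumption that features have already been split into a content lexicon and a function lexicon by part-of-speech tagging are all hypothetical. It shows only how each base learner might receive a feature subspace drawn with a separate rate per lexicon; training a base learner (e.g. a Support Vector Machine) on each subspace and combining the outputs is left out.

```python
import random


def sample_pos_subspace(content_features, function_features,
                        content_rate, function_rate, rng):
    """Draw one feature subspace with a separate sampling rate per lexicon.

    Assumption: the subspace size for each lexicon is rate * lexicon size,
    rounded down. The abstract does not specify this detail.
    """
    n_content = int(content_rate * len(content_features))
    n_function = int(function_rate * len(function_features))
    # Sample without replacement from each lexicon independently.
    return (rng.sample(content_features, n_content)
            + rng.sample(function_features, n_function))


def build_subspaces(content_features, function_features,
                    content_rate, function_rate, n_learners, seed=0):
    """Return one feature subspace per base learner (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        sample_pos_subspace(content_features, function_features,
                            content_rate, function_rate, rng)
        for _ in range(n_learners)
    ]


# Illustrative lexicons; a real split would come from a POS tagger.
content_lexicon = ["good", "bad", "movie", "plot"]
function_lexicon = ["the", "of", "and", "but"]

subspaces = build_subspaces(content_lexicon, function_lexicon,
                            content_rate=0.5, function_rate=0.25,
                            n_learners=3, seed=1)
for features in subspaces:
    print(features)
```

A conventional Random Subspace method corresponds to using the same rate for both lexicons; keeping the two rates separate is what lets the accuracy and diversity of the base learners be tuned independently, as the abstract describes.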
Pages: 458-479
Number of pages: 22