Ensemble of feature sets and classification algorithms for sentiment classification

Cited by: 361
Authors
Xia, Rui [1]
Zong, Chengqing [1]
Li, Shoushan [2]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[2] Soochow Univ, Dept Comp Sci & Technol, Suzhou 215006, Peoples R China
Keywords
Sentiment classification; Text classification; Ensemble learning; Classifier combination; Comparative study; COMBINING CLASSIFIERS; TEXT; COMBINATIONS;
DOI
10.1016/j.ins.2010.11.023
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we make a comparative study of the effectiveness of the ensemble technique for sentiment classification. The ensemble framework is applied to sentiment classification tasks, with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure. First, two types of feature sets are designed for sentiment classification, namely the part-of-speech based feature sets and the word-relation based feature sets. Second, three well-known text classification algorithms, namely naive Bayes, maximum entropy and support vector machines, are employed as base classifiers for each of the feature sets. Third, three types of ensemble methods, namely the fixed combination, weighted combination and meta-classifier combination, are evaluated for three ensemble strategies. A wide range of comparative experiments is conducted on five widely used datasets in sentiment classification. Finally, some in-depth discussion is presented and conclusions are drawn about the effectiveness of the ensemble technique for sentiment classification. © 2010 Elsevier Inc. All rights reserved.
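The fixed and weighted combinations named in the abstract can be illustrated with a minimal sketch. It assumes each base classifier outputs per-class posterior probabilities, uses averaging as the fixed rule, and takes the weights as given. The abstract does not specify the paper's exact rules or how weights are learned, so the function names, probabilities, and weights below are all hypothetical. The meta-classifier combination, which would train a second-level classifier on the base outputs, is not shown.

```python
from typing import List, Sequence


def fixed_average_rule(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Fixed combination: average each class probability over all base classifiers."""
    n_classes = len(posteriors[0])
    return [sum(p[c] for p in posteriors) / len(posteriors) for c in range(n_classes)]


def weighted_rule(posteriors: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted combination: normalized weighted sum of the class probabilities."""
    total = sum(weights)
    n_classes = len(posteriors[0])
    return [sum(w * p[c] for w, p in zip(weights, posteriors)) / total
            for c in range(n_classes)]


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring class."""
    return max(range(len(scores)), key=lambda c: scores[c])


# Hypothetical outputs [P(negative), P(positive)] from three base classifiers
# (for instance naive Bayes, maximum entropy and SVM) on one review.
base_outputs = [[0.60, 0.40], [0.45, 0.55], [0.30, 0.70]]
# Hypothetical weights, e.g. reflecting each classifier's validation accuracy.
weights = [0.2, 0.3, 0.5]

fixed_scores = fixed_average_rule(base_outputs)
weighted_scores = weighted_rule(base_outputs, weights)
print(predict(fixed_scores), predict(weighted_scores))
```

In this toy case the first classifier votes negative on its own, but both combination rules assign the review to the positive class.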
Pages: 1138-1152
Page count: 15