Hierarchical classification in text mining for sentiment analysis of online news

Times cited: 39
Authors
Li, Jinyan [1]
Fong, Simon [1]
Zhuang, Yan [1]
Khoury, Richard [2]
Affiliations
[1] Department of Computer and Information Science, University of Macau, Taipa, Macau SAR, People's Republic of China
[2] Department of Software Engineering, Lakehead University, Thunder Bay, ON, Canada
Keywords
Sentiment analysis; Text mining; Classification
DOI
10.1007/s00500-015-1812-4
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis is a challenging task in text mining. Sentiment is subtly reflected in the tone and affective content of a writer's words. Conventional text mining techniques, which rely on keyword frequencies, often fall short of accurately detecting such subjective information implied in a text. In this paper, we evaluate several popular classification algorithms together with three filtering schemes. The filtering schemes progressively shrink the original dataset according to the contextual polarity and the frequent terms of each document. We call this approach "hierarchical classification". The effects of the approach under different combinations of classification algorithms and filtering schemes are discussed over three sets of controversial online news articles, to which both binary and multi-class classification are applied. We also test this hierarchical classification model with two methods and compare the two.
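The abstract does not specify the filtering schemes or the classifiers, so the following Python sketch only illustrates the general idea of "progressively shrink the dataset, then classify" under stated assumptions. The polarity lexicon `POLARITY_WORDS`, the two filters `filter_polar_sentences` and `filter_frequent_terms`, the cutoff `k`, and the Naive Bayes classifier are all hypothetical stand-ins chosen for illustration, not the authors' implementation; the sketch also chains only two filters, whereas the paper evaluates three schemes.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL toy polarity lexicon; the paper's actual resources are not given in the abstract.
POLARITY_WORDS = {"good", "great", "fair", "bad", "unfair", "corrupt", "praise", "blame"}


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def filter_polar_sentences(doc):
    """Assumed filter 1: keep only sentences containing at least one polarity word."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]
    return " ".join(s for s in sentences if POLARITY_WORDS & set(tokenize(s)))


def filter_frequent_terms(docs, k):
    """Assumed filter 2: keep only the k most frequent terms across the corpus."""
    counts = Counter(t for d in docs for t in tokenize(d))
    vocab = {t for t, _ in counts.most_common(k)}
    return [" ".join(t for t in tokenize(d) if t in vocab) for d in docs]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip terms unseen in training

        def log_score(c):
            return self.log_priors[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v)) for t in tokens
            )

        return max(self.classes, key=log_score)


def train_hierarchical(docs, labels, k):
    """Shrink the training set stage by stage, then fit the classifier on what remains."""
    stage1 = [filter_polar_sentences(d) for d in docs]
    stage2 = filter_frequent_terms(stage1, k)
    return NaiveBayes().fit(stage2, labels)


# Toy usage on made-up sentences (not the paper's news data).
train = [
    "The reform is great. Weather was mild.",
    "Officials praise the good plan. Nothing else happened.",
    "The deal is unfair. Markets opened.",
    "Critics blame the corrupt council. It rained.",
]
labels = ["pos", "pos", "neg", "neg"]
model = train_hierarchical(train, labels, k=20)
print(model.predict(filter_polar_sentences("A great and good outcome.")))  # pos
```

Each stage discards text that the next stage never sees, which is what makes the scheme hierarchical in this sketch. Replacing `NaiveBayes` with another classifier, or reordering and adding filters, yields the kind of algorithm and filter combinations the abstract says were compared.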
Pages: 3411-3420
Number of pages: 10