Sentiment analysis and spam detection in short informal text using learning classifier systems

Cited by: 55
Authors
Arif, Muhammad Hassan [1]
Li, Jianxin [1]
Iqbal, Muhammad [2]
Liu, Kaixu [3]
Affiliations
[1] Beihang Univ BUAA, Sch Comp Sci & Engn, Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
[2] Xtracta Ltd, Auckland 1061, New Zealand
[3] Univ Pavia, Dept Elect Comp & Biomed Engn, Pavia, Italy
Keywords
Sentiment analysis; Spam detection; Learning classifier systems; High-dimensional; Sparseness; REAL
DOI
10.1007/s00500-017-2729-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis of public opinion and spam detection in social media messages are two challenging data analysis tasks because the text is short and informal. This paper investigates the performance of learning classifier systems (LCS), which are rule-based machine learning techniques, in sentiment analysis of Twitter messages and movie reviews and in spam detection on SMS and email data sets. In this study, an existing LCS technique is extended with a novel encoding scheme for classifier rules that handles the sparseness of the feature vectors, which are generated using term frequency-inverse document frequency (TF-IDF) weights of word n-grams and sentiment lexicons. The results show that the proposed encoding scheme smoothed the learning process and produced consistently good results in all experiments conducted in this study.
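To illustrate why TF-IDF features over word n-grams are sparse, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: the function names (`word_ngrams`, `tfidf_vectors`, `sparsity`), the whitespace tokenizer, the bigram limit, and the plain `tf * log(N / df)` weighting are all assumptions made for illustration, and sentiment lexicon features are left out.

```python
import math
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Build sparse TF-IDF vectors (dicts keyed by n-gram) for each document."""
    grams = [Counter(word_ngrams(d, n_max)) for d in docs]
    # Document frequency: number of documents in which each n-gram occurs.
    df = Counter(g for c in grams for g in c)
    n_docs = len(docs)
    vocab = sorted(df)
    vectors = []
    for c in grams:
        total = sum(c.values())
        # Store only non-zero weights; n-grams found in every document
        # have idf = log(1) = 0 and are skipped.
        vectors.append({g: (cnt / total) * math.log(n_docs / df[g])
                        for g, cnt in c.items() if df[g] < n_docs})
    return vocab, vectors


def sparsity(vocab, vectors):
    """Fraction of zero entries in the implied documents-by-vocabulary matrix."""
    nonzero = sum(len(v) for v in vectors)
    return 1.0 - nonzero / (len(vocab) * len(vectors))
```

Even on a handful of short messages, most vocabulary entries are zero for any given message, and the zero fraction grows quickly as the vocabulary expands. This is the kind of sparseness the abstract says the proposed rule encoding is meant to handle.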
Pages: 7281-7291
Number of pages: 11