The Effects of Features Selection Methods on Spam Review Detection Performance

Cited by: 11
Authors
Etaiwi, Wael [1]
Awajan, Arafat [1]
Affiliations
[1] Princess Sumaya Univ Technol, Amman, Jordan
Source
2017 INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES (ICTCS) | 2017
Keywords
spam reviews; feature selection; machine learning; spam detection;
DOI
10.1109/ICTCS.2017.50
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Online reviews have become a valuable source of information about the overall opinion of products and services, and they can affect decision-making processes such as purchasing a product or service. Fake reviews are considered spam reviews and may have a great impact on online marketplace behavior. Extracting useful features from a review's text using Natural Language Processing (NLP) is not straightforward, and the choice of features affects the overall performance and results. Many types of features can be used for this task, such as Bag-of-Words, linguistic features, word counts, and n-gram features. In this paper, we investigate the effects of two different feature selection methods on spam review detection: Bag-of-Words and word counts. Several machine learning algorithms were applied, including Support Vector Machine, Decision Tree, Naive Bayes, and Random Forest. Experiments were conducted on a labeled, balanced dataset of hotel reviews. Performance is evaluated using several measures, including precision, recall, and accuracy.
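As a rough illustration of the two feature types and the evaluation measures named in the abstract, here is a minimal pure-Python sketch. It is not the authors' implementation: the tokenizer, the function names, and the interpretation of "word counts" as total and distinct token counts are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text, vocabulary):
    # One count per vocabulary term, in vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def word_count_features(text):
    # Assumed reading of "word counts": total tokens and distinct tokens.
    tokens = tokenize(text)
    return [len(tokens), len(set(tokens))]


def evaluate(y_true, y_pred, positive=1):
    # Precision, recall and accuracy for a binary spam / non-spam labeling.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }
```

Either feature vector could then be passed to any of the classifiers the abstract lists, and `evaluate` applied to the predicted labels.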
Pages: 116-120
Page count: 5