The Impact of Applying Different Preprocessing Steps on Review Spam Detection

Cited by: 36

Authors:

Etaiwi, Wael [1]

Naymat, Ghazi [1]

Affiliations:

[1] Princess Sumaya Univ Technol, Amman, Jordan

Source:

8TH INTERNATIONAL CONFERENCE ON EMERGING UBIQUITOUS SYSTEMS AND PERVASIVE NETWORKS (EUSPN 2017) / 7TH INTERNATIONAL CONFERENCE ON CURRENT AND FUTURE TRENDS OF INFORMATION AND COMMUNICATION TECHNOLOGIES IN HEALTHCARE (ICTH-2017) / AFFILIATED WORKSHOPS | 2017 / Vol. 113

Keywords:

spam reviews; preprocessing; Bag-of-Words; feature selection; machine learning;

DOI:

10.1016/j.procs.2017.08.368

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Online reviews have become a valuable source of information that indicates the overall opinion about products and services, which affects customers' decisions to purchase a product or service. Since not all online reviews and comments are truthful, it is important to detect fake and malicious reviews. Many machine learning techniques can be applied to detect spam reviews by extracting useful features from review text using Natural Language Processing (NLP). Many types of features can be used in this manner, such as linguistic features, word count, n-gram feature sets, and number of pronouns. To extract such features, several preprocessing steps can be performed before applying the classification method; these steps may include POS tagging, n-gram term frequencies, stemming, and stop word and punctuation mark filtering. These preprocessing steps may affect the overall accuracy of the review spam detection task. In this research, we will investigate the effects of preprocessing steps on the accuracy of review spam detection. Different machine learning algorithms will be applied, such as Support Vector Machine (SVM) and Naive Bayes (NB), and a labeled dataset of hotel reviews will be analyzed and processed. The effectiveness will be evaluated according to several evaluation measures, including precision, recall, and accuracy. (c) 2017 The Authors. Published by Elsevier B.V.
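The preprocessing steps and evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the function names `preprocess`, `bag_of_words`, and `evaluate` are all assumptions made for illustration. Each step is switchable so that its effect on the resulting Bag-of-Words features can be compared.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "in", "of", "to", "we"}


def crude_stem(token):
    """Strip a few common suffixes; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, remove_punct=True, remove_stop=True, stem=True, n=1):
    """Return the n-gram terms of one review; every step can be toggled."""
    text = text.lower()
    if remove_punct:
        text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    if remove_stop:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    # Join each window of n consecutive tokens into one n-gram term.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(text, **options):
    """Map each n-gram term of a review to its frequency."""
    return Counter(preprocess(text, **options))


def evaluate(y_true, y_pred, positive="spam"):
    """Return (precision, recall, accuracy) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    review = "We loved the rooms, and the staff was amazing!"
    print(preprocess(review))                      # all steps enabled
    print(preprocess(review, stem=False, n=2))     # bigrams without stemming
    print(bag_of_words(review, remove_stop=False))
```

Calling `preprocess` with different flag combinations yields different feature vocabularies for the same review, which is the kind of variation whose effect on classifier accuracy the abstract proposes to measure. The feature counts would then be passed to a classifier such as SVM or Naive Bayes, which this sketch leaves out.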


Pages: 273-279

Number of pages: 7
