Detecting Singleton Review Spammers Using Semantic Similarity

Cited by: 50
Authors
Sandulescu, Vlad [1, 3]
Ester, Martin [2]
Affiliations
[1] Adform, Copenhagen, Denmark
[2] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC, Canada
[3] Trustpilot, Copenhagen, Denmark
Source
WWW'15 COMPANION: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB | 2015
Keywords
opinion spam; fake review detection; semantic similarity; aspect-based opinion mining; latent Dirichlet allocation
DOI
10.1145/2740908.2742570
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Online reviews have become an important resource for consumers making purchases. It is, however, becoming increasingly difficult for people to make well-informed buying decisions without being deceived by fake reviews. Prior work on the opinion spam problem mostly classified fake reviews using behavioral user patterns. It focused on prolific users who write more than a couple of reviews and discarded one-time reviewers. The number of singleton reviewers, however, is expected to be high for many review websites. While behavioral patterns are effective when dealing with elite users, for one-time reviewers the review text needs to be exploited. In this paper we tackle the problem of detecting fake reviews written by the same person under multiple names, with each review posted under a different name. We propose two methods to detect similar reviews and show that the results generally outperform the vectorial similarity measures used in prior work. The first method extends the semantic similarity between words to the review level. The second method is based on topic modeling and exploits the similarity of the reviews' topic distributions using two models: bag-of-words and bag-of-opinion-phrases. The experiments were conducted on reviews from three datasets: Yelp (57K reviews), Trustpilot (9K reviews) and the Ott dataset (800 reviews).
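The three kinds of similarity named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the word-level measure, the aggregation scheme, or the distance between topic distributions. The toy lookup table `TOY_WORD_SIM`, the best-match averaging in `review_similarity`, and the use of Jensen-Shannon divergence in `topic_similarity` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Vectorial baseline: cosine over bag-of-words term counts."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical word-to-word scores; a real system might derive these
# from a lexical resource. The values here are made up.
TOY_WORD_SIM = {
    frozenset(("great", "excellent")): 0.9,
    frozenset(("meal", "food")): 0.8,
}


def word_similarity(w1, w2):
    """Look up a word-level similarity in [0, 1] (assumed measure)."""
    if w1 == w2:
        return 1.0
    return TOY_WORD_SIM.get(frozenset((w1, w2)), 0.0)


def directed_similarity(src, dst):
    """Average, over words in src, of the best match found in dst."""
    if not src or not dst:
        return 0.0
    return sum(max(word_similarity(w, v) for v in dst) for w in src) / len(src)


def review_similarity(tokens_a, tokens_b):
    """Lift word similarity to the review level (assumed aggregation)."""
    return 0.5 * (directed_similarity(tokens_a, tokens_b)
                  + directed_similarity(tokens_b, tokens_a))


def topic_similarity(p, q):
    """1 - Jensen-Shannon divergence (base 2) of two topic distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]

    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 1.0 - 0.5 * (kl(p, m) + kl(q, m))


review_a = ["great", "food"]
review_b = ["excellent", "meal"]
print(cosine_similarity(review_a, review_b))  # no shared tokens
print(review_similarity(review_a, review_b))  # paraphrase is still matched
print(topic_similarity([0.7, 0.3], [0.6, 0.4]))
```

In this toy pair the two reviews share no tokens, so the bag-of-words cosine is zero while the word-level measure still scores them as close, which is the kind of paraphrased duplicate that a purely vectorial measure can miss.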
Pages: 971-976
Page count: 6