Detecting fake-review buyers using network structure: Direct evidence from Amazon

Cited by: 18
Authors
He, Sherry [1]
Hollenbeck, Brett [1]
Overgoor, Gijs [2]
Proserpio, Davide [3]
Tosyali, Ali [2]
Affiliations
[1] Univ Calif Los Angeles, Anderson Sch Management, Los Angeles, CA 90095 USA
[2] Rochester Inst Technol, Saunders Coll Business, Rochester, NY 14623 USA
[3] Univ Southern Calif, Marshall Sch Business, Los Angeles, CA 90089 USA
Keywords
online reviews; networks; machine learning; text analysis; reputation
DOI
10.1073/pnas.2211932119
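Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract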
Online reviews significantly impact consumers' decision-making process and firms' economic outcomes and are widely seen as crucial to the success of online markets. Firms, therefore, have a strong incentive to manipulate ratings using fake reviews. This presents a problem that academic researchers have tried to solve for over two decades and on which platforms expend a large amount of resources. Nevertheless, the prevalence of fake reviews is arguably higher than ever. To combat this, we collect a dataset of reviews for thousands of Amazon products and develop a general and highly accurate method for detecting fake reviews. A unique difference between previous datasets and ours is that we directly observe which sellers buy fake reviews. Thus, while prior research has trained models using laboratory-generated reviews or proxies for fake reviews, we are able to train a model using actual fake reviews. We show that products that buy fake reviews are highly clustered in the product-reviewer network. Therefore, features constructed from this network are highly predictive of which products buy fake reviews. We show that our network-based approach is also successful at detecting fake-review buyers even without ground truth data, as unsupervised clustering methods can accurately identify fake-review buyers by identifying clusters of products that are closely connected in the network. While text or metadata can be manipulated to evade detection, network-based features are more costly to manipulate because these features result directly from the inherent limitations of buying reviews from online review marketplaces, making our detection approach more robust to manipulation.
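Illustrative sketch (not the authors' implementation): the abstract describes building features from a product-reviewer network and grouping closely connected products without labels. The Python below shows one way this idea could look on toy data. It assumes reviews arrive as (reviewer, product) pairs, links two products whenever they share a reviewer, and uses degree, local clustering coefficient, and connected components as stand-ins for the paper's features and clustering method. All function names, the `min_shared` threshold, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_product_network(reviews):
    """Project (reviewer, product) pairs onto a product-product graph.

    Returns {(product_a, product_b): number of shared reviewers},
    with each pair stored in sorted order.
    """
    products_by_reviewer = defaultdict(set)
    for reviewer, product in reviews:
        products_by_reviewer[reviewer].add(product)
    weights = defaultdict(int)
    for products in products_by_reviewer.values():
        for a, b in combinations(sorted(products), 2):
            weights[(a, b)] += 1
    return dict(weights)


def adjacency(weights, min_shared=1):
    """Keep only edges backed by at least `min_shared` common reviewers."""
    adj = defaultdict(set)
    for (a, b), shared in weights.items():
        if shared >= min_shared:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves linked."""
    neighbors = adj.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(neighbors), 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def network_features(reviews, min_shared=1):
    """Per-product features that could feed a supervised classifier."""
    adj = adjacency(build_product_network(reviews), min_shared)
    products = {product for _, product in reviews}
    return {
        p: {"degree": len(adj.get(p, ())), "clustering": local_clustering(adj, p)}
        for p in products
    }


def connected_groups(reviews, min_shared=2):
    """Crude unsupervised grouping: connected components of the thresholded graph."""
    adj = adjacency(build_product_network(reviews), min_shared)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adj[node] - seen)
        groups.append(group)
    return groups


# Hypothetical toy data: products A, B, C share reviewers heavily; D and E barely do.
toy_reviews = [
    ("r1", "A"), ("r1", "B"), ("r1", "C"),
    ("r2", "A"), ("r2", "B"), ("r2", "C"),
    ("r3", "A"), ("r3", "B"),
    ("r4", "D"),
    ("r5", "D"), ("r5", "E"),
]

if __name__ == "__main__":
    print(network_features(toy_reviews))
    print(connected_groups(toy_reviews, min_shared=2))
```

On the toy data, products A, B, and C form a single tightly linked group once edges require at least two shared reviewers, while D and E drop out. A real pipeline would likely use richer features and a proper clustering algorithm; this sketch only shows why heavy reviewer overlap is visible in the network structure.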
Pages: 5