Learning to detect and measure fake ecommerce websites in search-engine results

Cited by: 18
Authors
Carpineto, Claudio [1]
Romano, Giovanni [1]
Affiliations
[1] Fdn Ugo Bordoni, Rome, Italy
Source
2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017) | 2017
Keywords
online counterfeit goods; trustworthiness assessment of eshops; cybercrime measurement; website classification; spam detection in web search results;
DOI
10.1145/3106426.3106441
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
When searching for a brand name in search engines, shoppers are very likely to come across websites that sell fake versions of the brand's products. In this paper, we study how to tackle and measure this problem automatically. Our solution consists of a pipeline with two learning stages. We first detect the ecommerce websites (including shopbots) present in the list of search results and then discriminate between legitimate and fake ecommerce websites. We identify suitable learning features for each stage and show through a prototype system termed RI.SI.CO. that this approach is feasible, fast, and highly effective. Experimenting with one goods sector, we found that RI.SI.CO. achieved better classification accuracy than non-expert humans. We next show that the information extracted by our method can be used to generate sector-level 'counterfeiting charts' that allow us to analyze and compare the counterfeit risk associated with different brands in the same sector. We also show that the risk of coming across counterfeit websites is affected by the particular web search engine and type of search query used by shoppers. Our research offers new insights and practical means for analyzing and measuring counterfeit ecommerce websites in search-engine results, thus enabling targeted anti-counterfeiting actions.
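The two-stage pipeline and the per-brand 'counterfeiting chart' described in the abstract can be illustrated with a minimal Python sketch. This record does not state the paper's learning features, models, or risk definition, so everything below is an assumption made for illustration: the keyword rules in `is_ecommerce` and `is_fake` are toy stand-ins for the two learned classifiers, and `counterfeit_chart` assumes that a brand's risk is the share of fake sites among the ecommerce sites found in its search results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def is_ecommerce(page_text: str) -> bool:
    """Stage 1 (hypothetical toy rule): does the page look like an online shop?"""
    text = page_text.lower()
    return "add to cart" in text or "checkout" in text


def is_fake(page_text: str) -> bool:
    """Stage 2 (hypothetical toy rule): does the shop look like a counterfeit seller?"""
    text = page_text.lower()
    return "replica" in text or "90% off" in text


def classify(page_text: str) -> str:
    """Run stage 2 only on pages that stage 1 accepted as ecommerce."""
    if not is_ecommerce(page_text):
        return "not_ecommerce"
    return "fake" if is_fake(page_text) else "legitimate"


def counterfeit_chart(results: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each brand to the share of fake sites among its ecommerce results.

    `results` holds (brand, page_text) pairs, one per search result.
    The share-based risk measure is an assumption, not the paper's definition.
    """
    shops: Dict[str, int] = defaultdict(int)
    fakes: Dict[str, int] = defaultdict(int)
    for brand, page_text in results:
        label = classify(page_text)
        if label == "not_ecommerce":
            continue  # non-shop results do not enter the chart
        shops[brand] += 1
        if label == "fake":
            fakes[brand] += 1
    return {brand: fakes[brand] / shops[brand] for brand in shops}


# Made-up search results for two placeholder brands.
sample_results = [
    ("BrandA", "Official store. Add to cart."),
    ("BrandA", "Replica bags 90% off! Checkout now"),
    ("BrandA", "News article about fashion"),
    ("BrandB", "Checkout securely"),
]
chart = counterfeit_chart(sample_results)
```

In a real system the two rule functions would be replaced by trained classifiers; keeping the stages separate means the second classifier only ever sees pages already judged to be shops, which mirrors the ordering given in the abstract.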
Pages: 403-410
Number of pages: 8