A fuzzy Dempster-Shafer classifier for detecting Web spams

Cited by: 8
Authors
Chatterjee, Moitrayee [1]
Namin, Akbar Siami [2]
Affiliations
[1] New Jersey City Univ, Comp Sci Dept, 2039 John F Kennedy Blvd, Jersey City, NJ 07305 USA
[2] Texas Tech Univ, Comp Sci Dept, 1012 Boston Ave, Lubbock, TX 79409 USA
Funding
National Science Foundation (USA);
Keywords
Dempster-Shafer Theory; Dempster-Shafer Combination; Basic probability assignment; Mass function; Belief; Plausibility; Fuzzy reasoning; Classification; Web spam;
DOI
10.1016/j.jisa.2021.102793
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The Web spam identification problem can be modeled as an instance of the conventional classification problem. Web spam aims to deceive web crawlers by promoting certain Web pages, artificially elevating their page rankings above what their actual weights warrant. It is intended to produce fraudulent results for web search queries and to degrade the user's experience by directing users to fake Web pages. We present a fuzzy evidence-based methodology for identifying Web spam, in which the spamicity of web hosts is formulated as a reasoning problem in the presence of uncertainty. Any classification task, however, intrinsically suffers from incomplete or vague evidence and from ambiguity in the class assignment based on that evidence. In this work, we use fuzzy reasoning as the decision maker that selects the most suitable evidence in a multi-source Dempster-Shafer (DS) based classification algorithm. The introduced approach has the benefit of providing a more reliable solution for detecting spam without any prior information. Evidence theory offers flexible support that takes into account the multi-dimensional nature of implementation decisions. The experimental results show that fuzzy reasoning in combination with DS theory reduces the conflicts among evidence, leading to enhanced classification results. The aim of this paper is to describe the potential of fuzzy reasoning and the Dempster-Shafer Theory (DST) as a decision model for the Web spam classification problem.
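For readers unfamiliar with the Dempster-Shafer combination named in the keywords and abstract, the following is a minimal Python sketch of Dempster's rule on a two-class frame {spam, normal}. It is not the authors' classifier: the two evidence sources (`m_content`, `m_link`), their mass values, and the function names `combine`, `belief`, and `plausibility` are assumptions made for illustration, and the fuzzy evidence-selection step described in the abstract is omitted.

```python
from itertools import product

# Frame of discernment for the illustration (hypothetical labels).
SPAM = frozenset({"spam"})
NORMAL = frozenset({"normal"})
THETA = SPAM | NORMAL  # the whole frame, i.e. mass assigned to "unknown"


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Returns the normalized combined mass function and the conflict K,
    the total mass that fell on empty intersections.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    norm = 1.0 - conflict
    return {s: w / norm for s, w in combined.items()}, conflict


def belief(m, hypothesis):
    """Sum of masses of all focal sets contained in the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Sum of masses of all focal sets that intersect the hypothesis."""
    return sum(w for s, w in m.items() if s & hypothesis)


# Made-up basic probability assignments from two hypothetical sources.
m_content = {SPAM: 0.6, NORMAL: 0.1, THETA: 0.3}
m_link = {SPAM: 0.5, NORMAL: 0.2, THETA: 0.3}

m, k = combine(m_content, m_link)
print("conflict K:", round(k, 3))
print("Bel(spam):", round(belief(m, SPAM), 3))
print("Pl(spam):", round(plausibility(m, SPAM), 3))
```

The interval between belief and plausibility for a class reflects the mass still assigned to the whole frame. A larger conflict K means the sources disagree more, which is the quantity the abstract reports as reduced when fuzzy reasoning selects the evidence.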
Pages: 9