FRS-SIFS: fuzzy rough set session identification and feature selection in web robot detection

Authors
Javad Hamidzadeh
Samaneh Rahimi
Mohammad Ali Zarif
Affiliation
[1] Sadjad University, Faculty of Computer Engineering and Information Technology
Source
International Journal of Machine Learning and Cybernetics | 2024 / Vol. 15
Keywords
Web robot detection; Rough set theory; Fuzzy rough set theory; Session identification; Data classification
Abstract
Web robots make up a large part of the web and are useful in many cases, but malicious web robots need to be detected. Web robots often conceal their navigation by sending requests with incorrect or missing information, and such incomplete data, including missing values, is difficult to classify correctly and precisely. Previous studies have relied on IP addresses and user-agent names to overcome this challenge, but these methods are unreliable. To address this challenge, this paper presents a robust algorithm named FRS-SIFS (Fuzzy Rough Set Session Identification and Feature Selection). FRS-SIFS first identifies user sessions using fuzzy rough set clustering based on string similarity measures. It then determines the features that are important for recognizing web users' behavioral patterns using fuzzy rough set classification. FRS-SIFS labels the sessions using a novel, precise four-phase heuristic method. Two feature selection methods are used: the fuzzy rough set quick reduction algorithm and a novel wrapper feature selection method. Finally, the multi-objective optimization algorithm NSGA-II (Non-dominated Sorting Genetic Algorithm II) is used to select the optimal feature set. The performance of the proposed method has been evaluated on a real-world dataset using tenfold cross-validation. The experimental results, compared with those of state-of-the-art methods, show that the proposed method is superior in terms of recall, precision, and F1 measure.
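The fuzzy rough set quick reduction step mentioned in the abstract can be illustrated as a greedy search that repeatedly adds the feature giving the largest increase in the fuzzy-rough dependency degree, stopping once the selected subset is as informative as the full feature set. The sketch below is a minimal Python/NumPy illustration under assumptions the abstract does not state: the per-feature similarity relation R_a(x, y) = max(0, 1 - |a(x) - a(y)| / range(a)), the min t-norm for combining features, and the Kleene-Dienes implicator for the lower approximation are common choices in fuzzy-rough feature selection, not necessarily the ones used in FRS-SIFS. The function names `similarity`, `dependency`, and `quick_reduct` and the toy data are hypothetical.

```python
import numpy as np


def similarity(col):
    """Fuzzy similarity of one feature: R_a(x, y) = max(0, 1 - |a(x) - a(y)| / range(a))."""
    span = col.max() - col.min()
    if span == 0:
        # A constant feature cannot distinguish any pair of objects.
        return np.ones((len(col), len(col)))
    diff = np.abs(col[:, None] - col[None, :])
    return np.maximum(0.0, 1.0 - diff / span)


def dependency(X, y, subset):
    """Fuzzy-rough dependency degree gamma_P of the decision y on feature subset P."""
    n = len(y)
    # Combine per-feature relations with the min t-norm; the empty subset relates everything fully.
    rel = np.ones((n, n))
    for a in subset:
        rel = np.minimum(rel, similarity(X[:, a]))
    # With crisp classes and the Kleene-Dienes implicator max(1 - r, d), the positive-region
    # membership of x reduces to the minimum of 1 - R_P(x, y) over objects y in other classes.
    other_class = y[:, None] != y[None, :]
    pos = np.where(other_class, 1.0 - rel, 1.0).min(axis=1)
    return pos.sum() / n


def quick_reduct(X, y, tol=1e-9):
    """Greedy QuickReduct: add the feature that most raises gamma until it matches the full set."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    features = list(range(X.shape[1]))
    gamma_full = dependency(X, y, features)
    reduct = []
    gamma = dependency(X, y, reduct)
    while gamma < gamma_full - tol:
        best_feature, best_gamma = None, gamma
        for a in features:
            if a in reduct:
                continue
            g = dependency(X, y, reduct + [a])
            if g > best_gamma + tol:
                best_feature, best_gamma = a, g
        if best_feature is None:
            break  # no remaining feature improves the dependency degree
        reduct.append(best_feature)
        gamma = best_gamma
    return reduct, gamma


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 separates the classes, feature 1 is noisy.
    X = [[0.0, 0.3], [0.0, 0.9], [1.0, 0.2], [1.0, 0.8]]
    y = [0, 0, 1, 1]
    reduct, gamma = quick_reduct(X, y)
    print(reduct)  # [0]
```

On this toy data, feature 0 gives zero similarity between every pair of objects from different classes, so its dependency degree already equals that of the full feature set (1.0) and the search stops after one step; feature 1 on its own reaches only 1/7.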
Pages: 237–252
Number of pages: 15