FRS-SIFS: fuzzy rough set session identification and feature selection in web robot detection

Cited by: 1
Authors
Hamidzadeh, Javad [1]
Rahimi, Samaneh [1]
Zarif, Mohammad Ali [1]
Affiliations
[1] Sadjad Univ, Fac Comp Engn & Informat Technol, Mashhad, Iran
Keywords
Web robot detection; Rough set theory; Fuzzy rough set theory; Session identification; Data classification; FEATURE-EXTRACTION; CLASSIFICATION; DISCOVERY; BEHAVIOR
DOI
10.1007/s13042-023-01905-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web robots are now a large part of the web and are useful in many cases, but malicious web robots need to be detected. Web robots often conceal their navigation by sending requests with incorrect or missing information. Such incomplete data, including missing values, can be quite difficult to classify correctly and precisely. Previous studies have used IP addresses and user agent names to overcome this challenge, but these methods are unreliable. To address this challenge, this paper presents a robust algorithm named FRS-SIFS (Fuzzy Rough Set Session Identification and Feature Selection). FRS-SIFS first identifies user sessions using fuzzy rough set clustering based on string similarity measures. It then determines the features that are important for recognizing web users' behavioral patterns using fuzzy rough set classification. FRS-SIFS labels the sessions using a novel, precise four-phase heuristic method. Two feature selection methods are used: the fuzzy rough set quick reduction algorithm and a novel wrapper feature selection method. Finally, the multi-objective optimization algorithm NSGA-II (non-dominated sorting genetic algorithm II) is used to select the optimal set of features. The performance of the proposed method has been evaluated on a real-world dataset using tenfold cross-validation. Comparison with state-of-the-art methods shows the superiority of the proposed method in terms of recall, precision, and F1 measure.
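As a rough illustration of the session identification idea only, the sketch below groups requests whose user agent strings are similar. It is not the paper's fuzzy rough set clustering: it substitutes a simple greedy grouping, uses Python's difflib.SequenceMatcher as a stand-in string similarity measure, and the names group_sessions, the "user_agent" field, and the 0.8 threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def group_sessions(requests, threshold=0.8):
    """Greedy grouping (an assumption, not the paper's method).

    Each request joins the first group whose representative user agent
    string is at least `threshold` similar; otherwise it starts a new group.
    """
    sessions = []  # list of (representative_user_agent, member_requests)
    for req in requests:
        ua = req["user_agent"]  # hypothetical field name
        for representative, members in sessions:
            if similarity(representative, ua) >= threshold:
                members.append(req)
                break
        else:
            # No existing group was similar enough: open a new one.
            sessions.append((ua, [req]))
    return [members for _, members in sessions]


if __name__ == "__main__":
    log = [
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"},
        {"user_agent": "Googlebot/2.1"},
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/121.0"},
    ]
    for i, group in enumerate(group_sessions(log)):
        print(i, [r["user_agent"] for r in group])
```

A string similarity measure tolerates small differences in request fields, which is one reason it can be preferable to exact matching on IP address or user agent name when some values are incorrect or missing.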
Pages: 237-252
Number of pages: 16