FRS-SIFS: fuzzy rough set session identification and feature selection in web robot detection

Cited by: 3
Authors
Hamidzadeh, Javad [1]
Rahimi, Samaneh [1]
Zarif, Mohammad Ali [1]
Affiliation
[1] Sadjad Univ, Fac Comp Engn & Informat Technol, Mashhad, Iran
Keywords
Web robot detection; Rough set theory; Fuzzy rough set theory; Session identification; Data classification; FEATURE-EXTRACTION; CLASSIFICATION; DISCOVERY; BEHAVIOR;
DOI
10.1007/s13042-023-01905-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web robots now make up a large part of the web and are useful in many cases, but some are malicious and need to be detected. Web robots often conceal their navigation by sending requests with incorrect or missing information. Classifying this kind of incomplete data, which includes missing values, correctly and precisely can be quite difficult. Previous studies have used IP addresses and user agent names to overcome this challenge, but these methods are unreliable. To address this challenge, this paper presents a robust algorithm named FRS-SIFS (Fuzzy Rough Set Session Identification and Feature Selection). FRS-SIFS first identifies user sessions using fuzzy rough set clustering based on string similarity measures. It then determines the features that are important for recognizing web users' behavioral patterns using fuzzy rough set classification. FRS-SIFS labels the sessions using a novel, precise heuristic method with four phases. Two feature selection methods are used: the fuzzy rough set quick reduction algorithm and a novel wrapper feature selection method. Finally, the multi-objective optimization algorithm NSGA-II (non-dominated sorting genetic algorithm II) is used to select the optimal set of features. The proposed method was evaluated on a real-world dataset using tenfold cross-validation. Compared with state-of-the-art methods, the experimental results show the superiority of the proposed method in terms of recall, precision, and F1 measures.
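The idea of grouping requests into sessions by string similarity can be illustrated with a minimal sketch. This is not the FRS-SIFS algorithm: it swaps fuzzy rough set clustering for a simple greedy grouping, and it uses the ratio from Python's standard `difflib.SequenceMatcher` rather than any similarity measure named in the paper. The request fields (`user_agent`, `path`), the function names, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1] using difflib's ratio.

    Assumption: this stands in for whichever string similarity
    measures the paper actually uses.
    """
    return SequenceMatcher(None, a, b).ratio()


def group_into_sessions(requests, threshold=0.8):
    """Greedily group requests whose user-agent strings are similar.

    Each request is a dict with a hypothetical "user_agent" field.
    A request joins the first session whose representative string is
    at least `threshold` similar; otherwise it starts a new session.
    """
    sessions = []  # each session: {"key": representative string, "requests": [...]}
    for req in requests:
        # Missing user-agent values are treated as the empty string.
        key = req.get("user_agent") or ""
        for session in sessions:
            if similarity(key, session["key"]) >= threshold:
                session["requests"].append(req)
                break
        else:
            # No existing session was similar enough.
            sessions.append({"key": key, "requests": [req]})
    return sessions


if __name__ == "__main__":
    sample = [
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0)", "path": "/"},
        {"user_agent": "curl/7.68.0", "path": "/robots.txt"},
        {"user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64)", "path": "/a"},
        {"user_agent": None, "path": "/b"},
    ]
    for s in group_into_sessions(sample):
        print(repr(s["key"]), [r["path"] for r in s["requests"]])
```

A greedy pass like this depends on the order of the requests and on the threshold, which is one reason a clustering method that tolerates uncertain or missing values, such as the one the abstract describes, may be preferable in practice.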
Pages: 237-252
Page count: 16