FRS-SIFS: fuzzy rough set session identification and feature selection in web robot detection

Cited by: 3
Authors
Hamidzadeh, Javad [1]
Rahimi, Samaneh [1]
Zarif, Mohammad Ali [1]
Affiliations
[1] Sadjad Univ, Fac Comp Engn & Informat Technol, Mashhad, Iran
Keywords
Web robot detection; Rough set theory; Fuzzy rough set theory; Session identification; Data classification; FEATURE-EXTRACTION; CLASSIFICATION; DISCOVERY; BEHAVIOR;
DOI
10.1007/s13042-023-01905-7
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Web robots are now a large part of the web and are useful in many cases, but some are malicious and need to be detected. Web robots often conceal their navigation by sending requests with incorrect or missing information, and it can be difficult to classify such incomplete data, which includes missing values, correctly and precisely. Previous studies have relied on IP addresses and user agent names to overcome this challenge, but these methods are unreliable. To address this challenge, the paper presents a robust algorithm named FRS-SIFS (Fuzzy Rough Set Session Identification and Feature Selection). FRS-SIFS first identifies user sessions using fuzzy rough set clustering based on string similarity measures. It then determines the features that are important for recognizing web users' behavioral patterns using fuzzy rough set classification. FRS-SIFS labels the sessions using a novel, precise heuristic method consisting of four phases. Two feature selection methods are used: the fuzzy rough set quick reduction algorithm and a novel wrapper feature selection method. Finally, the multi-objective optimization algorithm NSGA-II (non-dominated sorting genetic algorithm II) is used to select the optimal set of features. The proposed method was evaluated on a real-world dataset using tenfold cross-validation. A comparison with state-of-the-art methods shows that the proposed method is superior in terms of recall, precision, and F1 measures.
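The abstract reports results in terms of recall, precision, and F1. As a reminder of how these three measures relate, the following is a minimal sketch that computes them from true and predicted session labels. The function name `precision_recall_f1`, the label encoding (1 = robot, 0 = human), and the toy labels are assumptions made for illustration; they are not taken from the paper, and this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the class `positive`.

    Assumption: sessions are labeled 1 for robot and 0 for human.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: robot sessions predicted as robot.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: human sessions predicted as robot.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: robot sessions predicted as human.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical toy labels for eight sessions (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In a tenfold cross-validation setting, these measures would typically be computed on each held-out fold and then averaged; how the paper aggregates them is not stated in the abstract.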
Pages: 237-252
Number of pages: 16