A soft computing approach for benign and malicious web robot detection

Cited by: 23
Authors
Zabihimayvan, Mahdieh [1]
Sadeghi, Reza [1]
Rude, H. Nathan [1]
Doran, Derek [1]
Affiliation
[1] Wright State Univ, Dept Comp Sci & Engn, Kno.e.sis Res Ctr, Dayton, OH 45435 USA
Funding
National Science Foundation (NSF)
Keywords
Markov clustering algorithm; Web Robot Detection; Web crawler; Malicious web agents; Fuzzy Rough Set Theory; CLUSTERING APPROACH; CLASSIFICATION;
DOI
10.1016/j.eswa.2017.06.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Accurately detecting web robot sessions in a web server log is essential for taking accurate traffic measurements and for protecting the performance of a web server and the privacy of the information it hosts. Moreover, the irrecoverable risks posed by malicious robots, which intentionally try to evade web server intrusion detection systems by covering up their visits with fabricated fields in their HTTP request packets, cannot be ignored. To separate both types of robots from humans in practice, analysts turn either to heuristic methods or to state-of-the-art soft computing approaches that have been tuned only to the specifications of one kind of web server. Because the landscape of web robot agents is ever changing, and behavioral patterns and characteristics vary across web servers, both options fall short. To overcome this challenge, this paper presents SMART, a soft computing system that simultaneously detects benign and malicious robot agents from web server logs and automatically adapts to the session characteristics of a given web server. Experiments on access log files from several web servers, each serving a different domain of the web, show that the proposed method outperforms state-of-the-art methods for both benign and malicious robot detection. (C) 2017 Elsevier Ltd. All rights reserved.
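The keywords name the Markov clustering algorithm, but the abstract does not say how SMART applies it. As a generic illustration only, the sketch below implements textbook Markov clustering (MCL): alternate expansion (matrix powers, i.e. random-walk flow) and inflation (element-wise powers plus renormalization) on a column-stochastic matrix until it stabilizes, then read clusters off the attractor rows. It is not the authors' implementation; the function names, the parameters `expansion`, `inflation`, `tol` and `threshold`, their defaults, and the reading of nodes as web sessions are all assumptions made for this example.

```python
"""Minimal Markov clustering (MCL) sketch on a small undirected graph.

Illustrative only: not the SMART authors' implementation. Function names,
parameters and defaults are hypothetical choices for this example.
"""
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic); all-zero columns stay zero."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain square matrix product a @ b."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9,
        threshold: float = 1e-6) -> List[List[int]]:
    """Cluster the nodes of a symmetric weighted adjacency matrix with MCL."""
    n = len(adj)
    # Add self-loops, then turn weights into column-stochastic transition probabilities.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        prev = m
        # Expansion: raise the matrix to the `expansion`-th power (random-walk flow).
        e = m
        for _ in range(expansion - 1):
            e = matmul(e, m)
        # Inflation: raise entries element-wise to `inflation`, then renormalize,
        # which strengthens strong flows and weakens weak ones.
        m = normalize_columns([[v ** inflation for v in row] for row in e])
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Attractors keep mass on their own diagonal; each attractor's row lists the
    # nodes that flow to it. Identical member sets are kept only once.
    clusters = {frozenset(j for j in range(n) if m[i][j] > threshold)
                for i in range(n) if m[i][i] > threshold}
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy graph: two separate triangles, e.g. two groups of mutually similar sessions.
    adj = [[0, 1, 1, 0, 0, 0],
           [1, 0, 1, 0, 0, 0],
           [1, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 1],
           [0, 0, 0, 1, 0, 1],
           [0, 0, 0, 1, 1, 0]]
    print(mcl(adj))  # [[0, 1, 2], [3, 4, 5]]
```

Inflation above 1 is what pushes flow toward separate attractors; larger inflation values generally yield more, smaller clusters, so in practice the parameter controls clustering granularity.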
Pages: 129-140
Page count: 12