Web robot detection - Preprocessing web logfiles for robot detection

Cited by: 28
Authors
Bomhardt, C [1]
Gaul, W [1]
Schmidt-Thieme, L [1]
Affiliation
[1] Univ Karlsruhe, Inst Entscheidungstheorie & Unternehmensforsch, Karlsruhe, Germany
Source
NEW DEVELOPMENTS IN CLASSIFICATION AND DATA ANALYSIS | 2005
DOI
10.1007/3-540-27373-5_14
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Web usage mining has to face the problem that parts of the underlying logfiles are created by robots. While cooperative robots identify themselves and obey the instructions of server owners not to access parts or all of the pages on the server, malignant robots may camouflage themselves and have to be detected by web robot scanning devices. We describe the methodology of robot detection and show that highly accurate tools can be applied to decide whether session data was generated by a robot or a human user.
Pages: 113-124
Page count: 12