Knowledge Discovery from Log Data Analysis in a Multi-source Search System based on Deep Cleaning

Cited by: 2
Authors
Lebib, Fatma [1, 2]
Mellah, Hakima [2]
Meziane, Abdelkrim [2]
Affiliations
[1] Univ Sci & Technol Houari Boumediene, USTHB, Algiers, Algeria
[2] CERIST, Res Ctr Sci & Tech Informat, Algiers, Algeria
Source
WEBIST: PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON WEB INFORMATION SYSTEMS AND TECHNOLOGIES | 2019
Keywords
Log Files Analysis; Web Usage Mining; Multi-source Search System; Knowledge Extraction; Information Source; User Profile;
DOI
10.5220/0008121102570264
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In a multi-source search system, understanding users' interests and behaviour is essential for improving the search and adapting the results to each user profile. The information that characterizes users is often hidden in large log files, and it must be discovered, extracted and analyzed to build an accurate user profile. This paper presents an approach that analyzes the log data of a multi-source search system using web usage mining techniques. The aim is to capture, model and analyze the behavioural patterns and profiles of users interacting with this system. The proposed approach consists of two major steps: the first step, "pre-processing", eliminates unwanted data from the log files based on predefined cleaning rules, and the second step, "processing", extracts useful data about each user's previous queries. In addition to the conventional cleaning process, which removes irrelevant data from the log file such as accesses to multimedia files, error codes and accesses by Web robots, a deep cleaning step is proposed, which analyzes the query structure of the different sources to eliminate further unwanted data. This accelerates the processing phase. The generated data can be used for personalizing user-system interaction, for information filtering, and for recommending sources appropriate to the needs of each user.
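The conventional cleaning rules named in the abstract (multimedia accesses, error codes, Web robots) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the log layout (Apache-style Combined Log Format), the extension list, the robot markers, and the names `LOG_PATTERN`, `is_relevant` and `clean_log` are all assumptions made for illustration. The deep cleaning step is not sketched, since it depends on the query structure of each source, which the abstract does not describe.

```python
import re

# Assumption: entries follow the Combined Log Format; the abstract does not
# state the actual log layout of the multi-source search system.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical cleaning rules; a real deployment would tune both lists.
MEDIA_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_relevant(line):
    """Return True if a log line survives the conventional cleaning rules."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return False  # malformed entry
    path = match["url"].split("?", 1)[0].lower()
    if path.endswith(MEDIA_EXTENSIONS):
        return False  # access to a multimedia file
    if int(match["status"]) >= 400:
        return False  # client or server error code
    agent = match["agent"].lower()
    if path == "/robots.txt" or any(m in agent for m in ROBOT_MARKERS):
        return False  # access by a Web robot
    return True


def clean_log(lines):
    """Keep only the entries that pass every cleaning rule."""
    return [line for line in lines if is_relevant(line)]


# Made-up sample entries, one per rule.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2019:13:55:36 +0000] "GET /search?q=log+mining HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2019:13:55:37 +0000] "GET /img/logo.png HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2019:13:56:01 +0000] "GET /search?q=missing HTTP/1.1" 404 128 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [10/Oct/2019:13:57:12 +0000] "GET /search?q=web HTTP/1.1" 200 640 "-" "Googlebot/2.1"',
]

cleaned = clean_log(SAMPLE)
```

Applied to `SAMPLE`, only the first entry, an ordinary search request, is kept; the image request, the 404 response and the robot request are each dropped by one rule.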
Pages: 257-264
Page count: 8