Analysis of Data Extraction and Data Cleaning in Web Usage Mining

Cited by: 5
Authors
Srivastava, Mitali [1]
Garg, Rakhi [2]
Mishra, P. K. [1]
Affiliations
[1] Banaras Hindu Univ, Fac Sci, Dept Comp Sci, Varanasi, Uttar Pradesh, India
[2] Banaras Hindu Univ, Mahila Maha Vidyalaya, Comp Sci Sect, Varanasi, Uttar Pradesh, India
Source
ICARCSET'15: PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ADVANCED RESEARCH IN COMPUTER SCIENCE ENGINEERING & TECHNOLOGY (ICARCSET - 2015) | 2015
Keywords
Web usage mining; Data preprocessing; Data extraction; Data cleaning;
DOI
10.1145/2743065.2743078
Chinese Library Classification number
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
Data preprocessing is considered an important phase of Web usage mining because log data is unstructured, heterogeneous, and noisy. Complete and effective data preprocessing ensures the efficiency and scalability of the algorithms used in the pattern discovery phase of Web usage mining. Data preprocessing generally includes the steps of data fusion, data cleaning, user identification, session identification, path completion, etc. Data cleaning is the initial and important step in preprocessing, producing cleaned data for further processing. When the analysis covers a specific time duration (e.g., one day, one week, or one month), it is important to apply data extraction to the raw log data before data cleaning. In this paper we focus mainly on the data fusion, data extraction, and data cleaning steps of preprocessing and propose an algorithm for data extraction that extracts log data according to the time duration of the analysis. This algorithm also sorts log entries by date and time, which will be further used in predicting the browsing sequence of a user. We then apply a data cleaning algorithm to the extracted real Web server log. In data cleaning, almost all irrelevant files, irrelevant HTTP methods, and wrong HTTP status codes are considered, and the experiment shows that the raw log data reduces to almost 80%, which shows the importance of the initial phases of data preprocessing.
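The extraction and cleaning steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the log format, the list of irrelevant files, or the status codes treated as wrong, so the sketch assumes Common Log Format input, a hypothetical list of file suffixes, GET and POST as the relevant methods, and 2xx as the only acceptable status codes. The names `parse_line`, `extract`, and `clean` are likewise illustrative.

```python
import re
from datetime import datetime

# Assumption: Common Log Format lines (the abstract does not name a format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Assumption: hypothetical cleaning rules, not the paper's actual lists.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
KEPT_METHODS = {"GET", "POST"}


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], TIME_FORMAT)
    entry["status"] = int(entry["status"])
    return entry


def extract(lines, start, end):
    """Keep entries with start <= time < end, sorted by date and time."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    selected = [e for e in entries if start <= e["time"] < end]
    return sorted(selected, key=lambda e: e["time"])


def clean(entries):
    """Drop entries for irrelevant files, methods, or non-2xx status codes."""
    cleaned = []
    for e in entries:
        path = e["url"].split("?", 1)[0].lower()  # ignore any query string
        if path.endswith(IRRELEVANT_SUFFIXES):
            continue
        if e["method"] not in KEPT_METHODS:
            continue
        if not 200 <= e["status"] < 300:
            continue
        cleaned.append(e)
    return cleaned
```

Calling `clean(extract(lines, start, end))` with timezone-aware `start` and `end` values mirrors the order given in the abstract: extraction by time duration first, then cleaning.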
Pages: 6
Related papers
50 in total
  • [31] Knowledge Extraction Using Web Usage Mining
    Waqas, Muhammad
    Iram, Maria
    Shahzad, Sara
    Arshad, Sidra
    Nawaz, Tahir
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2018, 4 (16) : 1 - 5
  • [32] Analysis of web usage mining
    Hui Yu
    Zhongmin Lu
    PROCEEDINGS OF THE 2006 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING, 2006, : 1291 - 1296
  • [33] Performance Evaluation of Frequent Pattern Mining Algorithms using Web Log Data for Web Usage Mining
    Gashaw, Yonas
    Liu, Fang
    2017 10TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI), 2017,
  • [34] Data cleaning of medical data for knowledge mining
    Hongxing, P. (phxlee@zzu.edu.cn), 1600, Academy Publisher, P.O. Box 40, Oulu, 90571, Finland (08):
  • [35] An Inclusive Survey on Data Preprocessing Methods Used in Web Usage Mining
    Bakariya, Brijesh
    Mohbey, Krishna K.
    Thakur, G. S.
    PROCEEDINGS OF SEVENTH INTERNATIONAL CONFERENCE ON BIO-INSPIRED COMPUTING: THEORIES AND APPLICATIONS (BIC-TA 2012), VOL 2, 2013, 202 : 407 - 416
  • [36] Construction and analysis of evolving data summaries: an application on Web usage data
    da Silva, Alzennyr
    Lechevallier, Yves
    Rossi, Fabrice
    de Carvalho, Francisco
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, 2007, : 377 - +
  • [37] Review on Modern Data Preprocessing Techniques in Web Usage Mining (WUM)
    Sukumar, P.
    Robert, L.
    Yuvaraj, S.
    2016 INTERNATIONAL CONFERENCE ON COMPUTATION SYSTEM AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTIONS (CSITSS), 2016, : 64 - 69
  • [38] New Frontier of Informetric and Webometric Research: Mining Web Usage Data
    Vaughan, Liwen
    COLLNET JOURNAL OF SCIENTOMETRICS AND INFORMATION MANAGEMENT, 2008, 2 (02) : 29 - 35
  • [39] Data Preprocessing for Web Data Mining
    Zhang, Wei
    Chen, Tinggui
    ADVANCES IN ELECTRONIC COMMERCE, WEB APPLICATION AND COMMUNICATION, VOL 2, 2012, 149 : 303 - +
  • [40] Web data mining
    Wibonele, KJ
    Zhang, YQ
    DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS AND TECHNOLOGY IV, 2002, 4730 : 241 - 244