Analysis of Data Extraction and Data Cleaning in Web Usage Mining

Cited by: 5
Authors
Srivastava, Mitali [1]
Garg, Rakhi [2]
Mishra, P. K. [1]
Affiliations
[1] Banaras Hindu Univ, Fac Sci, Dept Comp Sci, Varanasi, Uttar Pradesh, India
[2] Banaras Hindu Univ, Mahila Maha Vidyalaya, Comp Sci Sect, Varanasi, Uttar Pradesh, India
Source
ICARCSET'15: PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON ADVANCED RESEARCH IN COMPUTER SCIENCE ENGINEERING & TECHNOLOGY (ICARCSET - 2015) | 2015
Keywords
Web usage mining; Data preprocessing; Data extraction; Data cleaning;
DOI
10.1145/2743065.2743078
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
Data preprocessing is considered an important phase of Web usage mining because log data is unstructured, heterogeneous, and noisy. Complete and effective data preprocessing ensures the efficiency and scalability of the algorithms used in the pattern discovery phase of Web usage mining. Data preprocessing generally includes the following steps: data fusion, data cleaning, user identification, session identification, path completion, etc. Data cleaning is the initial and important step in preprocessing, producing cleaned data for further processing. When the analysis covers a specific time duration (e.g., one day, one week, or one month), it is important to apply data extraction to the raw log data before data cleaning. In this paper we focus mainly on the data fusion, data extraction, and data cleaning steps of preprocessing, and propose an algorithm for data extraction that extracts log data according to the time duration of the analysis. This algorithm also sorts log entries by date and time, which will later be used in predicting the browsing sequence of a user. We then apply a data cleaning algorithm to the extracted real Web server log. In data cleaning, almost all irrelevant files, irrelevant HTTP methods, and wrong HTTP status codes are considered, and the experiment shows that the raw log data reduces to almost 80%, which shows the importance of the initial phases of data preprocessing.
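The extraction and cleaning steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes the server writes Common Log Format lines, and the names `parse_line`, `extract`, `clean`, `IRRELEVANT_EXTENSIONS`, `RELEVANT_METHODS`, and the sample data are all hypothetical choices made for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: logs are in Common Log Format; the abstract does not state the format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

# Assumptions: which files, methods, and status codes count as irrelevant.
IRRELEVANT_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
RELEVANT_METHODS = {"GET"}


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], TIME_FORMAT)
    entry["status"] = int(entry["status"])
    return entry


def extract(lines, start, end):
    """Data extraction: keep entries with start <= time < end, sorted by time."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    selected = [e for e in entries if start <= e["time"] < end]
    return sorted(selected, key=lambda e: e["time"])


def clean(entries):
    """Data cleaning: drop irrelevant files, methods, and unsuccessful requests."""
    kept = []
    for e in entries:
        path = e["url"].split("?", 1)[0].lower()  # ignore any query string
        if path.endswith(IRRELEVANT_EXTENSIONS):
            continue
        if e["method"] not in RELEVANT_METHODS:
            continue
        if not 200 <= e["status"] < 300:
            continue
        kept.append(e)
    return kept


# Hypothetical sample log lines, deliberately out of chronological order.
IST = timezone(timedelta(hours=5, minutes=30))
SAMPLE = [
    '10.0.0.2 - - [02/Mar/2015:09:15:00 +0530] "GET /about.html HTTP/1.1" 200 900',
    '10.0.0.1 - - [01/Mar/2015:10:05:00 +0530] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.1 - - [01/Mar/2015:10:05:01 +0530] "GET /logo.png HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Mar/2015:08:00:00 +0530] "POST /login HTTP/1.1" 200 512',
    '10.0.0.4 - - [01/Mar/2015:11:00:00 +0530] "GET /missing.html HTTP/1.1" 404 0',
]

if __name__ == "__main__":
    day_start = datetime(2015, 3, 1, tzinfo=IST)
    one_day = extract(SAMPLE, day_start, day_start + timedelta(days=1))
    for entry in clean(one_day):
        print(entry["time"].isoformat(), entry["url"])
```

Running extraction before cleaning, as the abstract recommends, means the cleaning filters only touch entries inside the analysis window. Which extensions, methods, and status codes to discard depends on the site and the analysis goal, so the constants above would need adjusting for a real log.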
Pages: 6
Related papers
50 in total
  • [21] A practical extension of web usage mining with intentional browsing data toward usage
    Tao, Yu-Hui
    Hong, Tzung-Pei
    Lin, Wen-Yang
    Chiu, Wen-Yuan
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (02) : 3937 - 3945
  • [22] Data mining in cleaning shop data
    Yang, Bowen
    Matsumura, Yoshiyuki
    RESEARCHES AND PROGRESSES OF MODERN TECHNOLOGY ON SILK, TEXTILE AND MECHANICALS II, 2007, : 69 - 69
  • [23] DataSpace: A data Web for the exploratory analysis and mining of data
    Grossman, R
    Mazzucco, M
    COMPUTING IN SCIENCE & ENGINEERING, 2002, 4 (04) : 44 - 51
  • [24] Personalised online sales using web usage data mining
    Zhang, Xuejun
    Edwards, John
    Harding, Jenny
    COMPUTERS IN INDUSTRY, 2007, 58 (8-9) : 772 - 782
  • [25] Association Rule Mining for Web Usage Data to Improve Websites
    Singh, Avadh Kishor
    Kumar, Ajeet
    Maurya, Ashish K.
    2014 INTERNATIONAL CONFERENCE ON ADVANCES IN ENGINEERING AND TECHNOLOGY RESEARCH (ICAETR), 2014,
  • [26] A framework for web usage mining on anonymous logfile data
    Säuberlich, F
    Huber, KP
    EXPLORATORY DATA ANALYSIS IN EMPIRICAL RESEARCH, PROCEEDINGS, 2003, : 309 - 318
  • [27] Web Log Data Analysis and Mining
    Grace, L. K. Joshila
    Maheswari, V.
    Nagamalai, Dhinaharan
    ADVANCED COMPUTING, PT III, 2011, 133 : 459 - 469
  • [28] Web log data mining analysis
    Lu Ansheng
    2012 INTERNATIONAL CONFERENCE ON INTELLIGENCE SCIENCE AND INFORMATION ENGINEERING, 2012, 20 : 213 - 215
  • [29] A survey report on current research and development of data processing in web usage data mining
    Agrawal, Nandita
    Jawdekar, Anand
    International Journal of Database Theory and Application, 2016, 9 (05) : 101 - 110
  • [30] Web + Data Mining = Web Mining
    Kilian Stoffel
    HMD Praxis der Wirtschaftsinformatik, 2009, 46 (4) : 6 - 20