An Enhanced Pre-Processing Technique for Web Log Mining by Removing Web Robots

Cited by: 0
Authors
Nithya, P. [1 ]
Sumathi, P. [2 ]
Affiliations
[1] Manonmaniam Sundaranar Univ, Tirunelveli, Tamil Nadu, India
[2] Chikkanna Govt Arts Coll, Tirupur, Tamil Nadu, India
Source
2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC) | 2012
Keywords
Preprocessing; Data Cleaning; Path Completion; Travel Path Set; Content Path Set
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The internet has become a useful source of information in day-to-day life, which has driven substantial growth of the World Wide Web in its volume of traffic and in the size and complexity of websites. Web Usage Mining (WUM) is one of the main applications of data mining, artificial intelligence, and related techniques to web data; it forecasts users' visiting behavior and infers their interests by investigating samples of that data. WUM is directly involved in a wide range of applications, such as e-commerce, e-learning, Web analytics, and information retrieval. Web log data is one of the major sources: it contains information about the links users visited, their browsing patterns, and the time spent on a particular page or link. This information can be used in applications such as adaptive websites, modified services, customer summaries, pre-fetching, and the generation of attractive websites. Existing web usage mining approaches have several problems; in particular, existing algorithms are difficult to apply in practice. New research is therefore needed to predict the future performance of web users accurately and with short execution time. WUM consists of preprocessing, pattern discovery, and pattern analysis. Log data is characteristically noisy and unclear, so preprocessing is essential for effective mining. In this paper, a novel preprocessing technique is proposed that removes local and global noise and web robots. The Anonymous Microsoft Web Dataset and the MSNBC.com Anonymous Web Dataset are used to evaluate the proposed preprocessing technique.
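The abstract does not describe how noise and web robots are identified, so the following is only a minimal sketch of generic log-cleaning heuristics that are common in web log preprocessing, not the authors' technique. It assumes log lines in the Apache Combined Log Format; the sample lines, the `ROBOT_AGENT_HINTS` and `NOISE_SUFFIXES` lists, and the function names are hypothetical and chosen for illustration only. A record is dropped if it requests an embedded resource, if the request failed, if its user agent looks like a crawler, or if its host also requested `/robots.txt`.

```python
import re

# Apache Combined Log Format (an assumption; the datasets named in the
# abstract are not distributed in this raw form).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical heuristic lists, for illustration only.
ROBOT_AGENT_HINTS = ("bot", "crawler", "spider", "slurp")
NOISE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse(line):
    """Return the fields of one log line as a dict, or None if malformed."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def clean_log(lines):
    """Drop malformed lines, embedded resources, failed requests, and robots."""
    records = [r for r in map(parse, lines) if r is not None]
    # Any host that asked for robots.txt is treated as a robot.
    robot_hosts = {r["host"] for r in records if r["path"] == "/robots.txt"}
    kept = []
    for r in records:
        agent = r["agent"].lower()
        page = r["path"].lower().split("?")[0]
        if r["host"] in robot_hosts:
            continue  # robot identified by its robots.txt request
        if any(hint in agent for hint in ROBOT_AGENT_HINTS):
            continue  # robot identified by its user-agent string
        if page.endswith(NOISE_SUFFIXES):
            continue  # embedded resource, not a page view
        if not r["status"].startswith("2"):
            continue  # failed or redirected request
        kept.append(r)
    return kept


# Made-up sample lines, not taken from any dataset.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2012:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512 "/index.html" "Mozilla/5.0"',
    '10.0.0.2 - - [10/Oct/2012:13:56:01 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "ExampleAgent/1.0"',
    '10.0.0.2 - - [10/Oct/2012:13:56:02 +0000] "GET /products.html HTTP/1.1" 200 900 "-" "ExampleAgent/1.0"',
    '10.0.0.3 - - [10/Oct/2012:13:57:10 +0000] "GET /about.html HTTP/1.1" 200 700 "-" "Googlebot/2.1"',
    '10.0.0.4 - - [10/Oct/2012:13:58:00 +0000] "GET /missing.html HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for record in clean_log(SAMPLE):
        print(record["host"], record["path"])
```

With the sample above, only the `/index.html` request from `10.0.0.1` survives. Substring matching on the user agent is deliberately crude: it misses robots that spoof a browser agent, which is why the `/robots.txt` check is applied per host as a second signal.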
Pages: 662 - 665
Number of pages: 4