Efficient algorithms for incremental Web log mining with dynamic thresholds

Cited by: 6
Authors
Ou, Jian-Chih [1]
Lee, Chang-Hung [1]
Chen, Ming-Syan [1]
Affiliation
[1] Natl Taiwan Univ, Dept Elect Engn, Taipei 10764, Taiwan
Source
VLDB JOURNAL | 2008 / Vol. 17 / No. 4
Keywords
Web mining; path traversal pattern; dynamic support threshold
DOI
10.1007/s00778-006-0043-9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the fast increase in Web activities, Web data mining has recently become an important research topic and is receiving a significant amount of interest from both academia and industry. While existing methods are efficient for mining frequent path traversal patterns from the access information contained in a log file, these approaches are likely to overestimate associations. In particular, most previous studies of mining path traversal patterns are based on the model of a uniform support threshold, where a single support threshold is used to determine frequent traversal patterns without taking into consideration such important factors as the length of a pattern, the positions of Web pages, and the importance of a particular pattern. As a result, a low support threshold leads to many uninteresting patterns being derived, whereas a high support threshold may cause some interesting patterns with lower supports to be ignored. In view of this, this paper broadens the horizon of frequent path traversal pattern mining by introducing a flexible model of mining Web traversal patterns with dynamic thresholds. Specifically, we study and apply the Markov chain model to determine the support thresholds of Web documents; further, by properly employing some effective techniques devised for joining reference sequences, the proposed algorithm, dynamic threshold miner (DTM), not only mines with dynamic thresholds but also significantly improves execution efficiency and contributes to the incremental mining of Web traversal patterns. The performance of algorithm DTM and of extensions of existing methods is comparatively analyzed with synthetic and real Web logs. The results show that using algorithm DTM is very advantageous in reducing the number of unnecessary rules produced and leads to a prominent performance improvement.
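To make the contrast between a uniform support threshold and Markov-chain-derived per-document thresholds concrete, here is a minimal sketch under stated assumptions. It is not the DTM algorithm and does not cover sequence joining or incremental mining. Everything below is hypothetical and chosen for illustration: the function names (`markov_model`, `stationary`, `supports`, `mine`), the parameters `min_sup`, `max_len`, and `damping`, the PageRank-style restart device (swapped in so the stationary distribution is well defined), and the threshold rule, which scales `min_sup` by each page's stationary probability and lets a pattern use the lowest threshold among its pages.

```python
from collections import Counter, defaultdict


def markov_model(sessions):
    """Estimate first-order transition probabilities P(next | current) and a
    restart distribution (share of sessions that start on each page)."""
    counts = defaultdict(Counter)
    for s in sessions:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    pages = sorted({p for s in sessions for p in s})
    P = {}
    for a in pages:
        total = sum(counts[a].values())
        # Pages that only ever end a session get an empty row ("dangling").
        P[a] = {b: c / total for b, c in counts[a].items()}
    starts = Counter(s[0] for s in sessions if s)
    n_starts = sum(starts.values())
    restart = {p: starts[p] / n_starts for p in pages}
    return pages, P, restart


def stationary(pages, P, restart, damping=0.85, iters=100):
    """Power iteration for the stationary distribution. With probability
    1 - damping (or from a dangling page) the user restarts a session."""
    pi = {p: restart[p] for p in pages}
    for _ in range(iters):
        dangling = sum(pi[a] for a in pages if not P[a])
        jump = 1 - damping + damping * dangling
        nxt = {p: jump * restart[p] for p in pages}
        for a in pages:
            for b, prob in P[a].items():
                nxt[b] += damping * pi[a] * prob
        pi = nxt
    return pi


def supports(sessions, max_len=3):
    """Fraction of sessions containing each contiguous path of length 2..max_len."""
    count = Counter()
    for s in sessions:
        # A set, so each path counts at most once per session.
        count.update({tuple(s[i:i + k])
                      for k in range(2, max_len + 1)
                      for i in range(len(s) - k + 1)})
    return {pat: c / len(sessions) for pat, c in count.items()}


def mine(sessions, min_sup=0.3, max_len=3):
    pages, P, restart = markov_model(sessions)
    pi = stationary(pages, P, restart)
    sup = supports(sessions, max_len)
    # Uniform model: one threshold for every pattern.
    uniform = {pat for pat, s in sup.items() if s >= min_sup}
    # Hypothetical dynamic model: scale min_sup by each page's stationary
    # probability relative to the average page (1 / len(pages)); a pattern
    # must clear the lowest threshold among the pages it contains.
    page_thr = {p: min_sup * len(pages) * pi[p] for p in pages}
    dynamic = {pat for pat, s in sup.items()
               if s >= min(page_thr[p] for p in pat)}
    return uniform, dynamic


# Toy sessions for illustration only (not data from the paper).
sessions = [list("ABC")] * 5 + [list("AB")] * 3 + [list("AEF")] * 2
uniform, dynamic = mine(sessions)
print(sorted("".join(p) for p in uniform))             # ['AB', 'ABC', 'BC']
print(sorted("".join(p) for p in dynamic - uniform))   # ['AE', 'AEF', 'EF']
```

With the uniform threshold of 0.3, only the A-B-C branch survives. The hypothetical per-page thresholds also admit the rarer A-E-F branch, because E and F have low stationary probability and therefore low thresholds. This mirrors the abstract's point that a single threshold can ignore interesting low-support patterns. Whether it matches DTM's actual threshold function is an open assumption.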
Pages: 827-845
Number of pages: 19
Related papers (50 in total)
  • [1] Efficient algorithms for incremental Web log mining with dynamic thresholds
    Ou, Jian-Chih
    Lee, Chang-Hung
    Chen, Ming-Syan
    The VLDB Journal, 2008, 17(4): 827-845
  • [2] An efficient incremental algorithm for mining web navigation patterns with dynamic thresholds
    Ying, Jia-Ching
    Tseng, Vincent S.
    ICIC Express Letters, 2010, 4(5): 1625-1630
  • [3] Efficient web log mining for product development
    Woon, YK
    Ng, WK
    Li, X
    Lu, WF
    2003 International Conference on Cyberworlds, Proceedings, 2003: 294-301
  • [4] Evaluating web access log mining algorithms: A cognitive approach
    Woon, YK
    Ng, WK
    Lim, EP
    WISE 2002: Proceedings of the Third International Conference on Web Information Systems Engineering (Workshops), 2002: 217-222
  • [5] An Efficient Incremental Mining Algorithm for Dynamic Databases
    Driff, Lydia Nahla
    Drias, Habiba
    Mining Intelligence and Knowledge Exploration (MIKE 2016), 2017, 10089: 1-12
  • [6] An efficient incremental algorithm for mining web traversal patterns
    Yen, SJ
    Lee, YS
    Hsieh, MC
    ICEBE 2005: IEEE International Conference on e-Business Engineering, Proceedings, 2005: 274-281
  • [7] Performance Evaluation of Frequent Pattern Mining Algorithms using Web Log Data for Web Usage Mining
    Gashaw, Yonas
    Liu, Fang
    2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), 2017
  • [8] Analysis of Efficient Classification Algorithms in Web Mining
    Chander, K. Prem
    Sharma, S. S. V. N.
    Nagaprasad, S.
    Anjaneyulu, M.
    Devi, V. Ajantha
    Data Engineering and Communication Technology, ICDECT-2K19, 2020, 1079: 319-332
  • [9] Efficient mining indirect associations from web log data
    Yin, Ying
    Zhao, Yuhai
    Zhang, Bin
    Ning, Bo
    Journal of Computational Information Systems, 2007, 3(3): 1285-1292
  • [10] A Review Study of Server Log Formats for Efficient Web Mining
    Sharma, Pratibha
    Yadav, Surendra
    Bohra, Brahmdutt
    2015 International Conference on Green Computing and Internet of Things (ICGCIoT), 2015: 1373-1377