A Framework for Mining High Utility Web Access Sequences

Cited by: 30
Authors
Ahmed, Chowdhury Farhan [1]
Tanbeer, Syed Khairuzzaman [1]
Jeong, Byeong-Soo [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Database Lab, Youngin Si 446701, Kyunggi Do, South Korea
Keywords
Data mining; High utility patterns; Incremental mining; Interactive mining; Web access sequences; Web mining; ITEMSET UTILITIES; PATTERNS
DOI
10.4103/0256-4602.74506
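Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract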
Mining web access sequences (WASs) can discover useful knowledge from web logs and has broad applications. By treating non-binary occurrences of web pages as internal utilities in WASs, e.g., the time each user spends on a web page, more realistic information can be extracted. However, the existing utility-based approach has several limitations: it considers only forward references of web access sequences, is not applicable to incremental mining, suffers from the level-wise candidate generation-and-test methodology, needs several database scans, and does not show how to mine web access sequences when different web pages have different impacts/significances. In this paper, we propose a novel framework to solve these problems. Moreover, we propose two new tree structures, called the utility-based WAS tree (UWAS-tree) and the incremental UWAS-tree (IUWAS-tree), for mining WASs in static and incremental databases, respectively. Our approach handles both forward and backward references and both static and incremental data, avoids the level-wise candidate generation-and-test methodology, does not scan databases several times, and considers both the internal and external utilities of a web page. The IUWAS-tree is also applicable to interactive mining. Extensive performance analyses show that our approach is very efficient for both static and incremental mining of high utility WASs.
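To make the notions of internal and external utility concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the convention common in high utility mining that the utility of a page visit is its internal utility (here, minutes spent) multiplied by the page's external utility (its significance), and that a sequence's utility is the sum over its visits. The names `WEB_LOG`, `EXTERNAL_UTILITY`, `sequence_utility`, and `database_utility`, and all the numbers, are hypothetical.

```python
from typing import Dict, List, Tuple

# A web access sequence: ordered (page, internal utility) pairs.
# Internal utility is assumed here to be minutes spent on the page.
WAS = List[Tuple[str, int]]

# Hypothetical external utilities (per-page significance), invented for illustration.
EXTERNAL_UTILITY: Dict[str, int] = {"a": 2, "b": 5, "c": 1}

# Hypothetical web log. The first sequence revisits page "a",
# i.e. it contains a backward reference.
WEB_LOG: List[WAS] = [
    [("a", 2), ("b", 3), ("a", 1)],
    [("b", 1), ("c", 4)],
]


def sequence_utility(was: WAS, external: Dict[str, int]) -> int:
    """Sum of internal * external utility over every visit in one sequence."""
    return sum(internal * external[page] for page, internal in was)


def database_utility(log: List[WAS], external: Dict[str, int]) -> int:
    """Total utility of all sequences in the log."""
    return sum(sequence_utility(was, external) for was in log)


if __name__ == "__main__":
    for was in WEB_LOG:
        print(was, sequence_utility(was, EXTERNAL_UTILITY))
    print("total:", database_utility(WEB_LOG, EXTERNAL_UTILITY))
```

A minimum utility threshold is often expressed as a fraction of the total database utility; whether the paper defines its threshold this way is not stated in the abstract.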
Pages: 3-16
Number of pages: 14