A Framework for Mining High Utility Web Access Sequences

Cited by: 30
Authors
Ahmed, Chowdhury Farhan [1]
Tanbeer, Syed Khairuzzaman [1]
Jeong, Byeong-Soo [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Database Lab, Youngin Si 446701, Kyunggi Do, South Korea
Keywords
Data mining; High utility patterns; Incremental mining; Interactive mining; Web access sequences; Web mining; ITEMSET UTILITIES; PATTERNS
DOI
10.4103/0256-4602.74506
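Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract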
Mining web access sequences (WASs) can discover very useful knowledge from web logs, with broad applications. By treating non-binary occurrences of web pages as internal utilities in WASs, e.g., the time each user spends on a web page, more realistic information can be extracted. However, the existing utility-based approach has many limitations: it considers only forward references of web access sequences, is not applicable to incremental mining, suffers from the level-wise candidate generation-and-test methodology, needs several database scans, and does not show how to mine web access sequences when different web pages have different impacts/significances. In this paper, we propose a novel framework to solve these problems. Moreover, we propose two new tree structures, called the utility-based WAS tree (UWAS-tree) and the incremental UWAS-tree (IUWAS-tree), for mining WASs in static and incremental databases, respectively. Our approach handles both forward and backward references and both static and incremental data, avoids the level-wise candidate generation-and-test methodology, does not scan databases several times, and considers both internal and external utilities of a web page. The IUWAS-tree is also applicable to interactive mining. Extensive performance analyses show that our approach is very efficient for both static and incremental mining of high utility WASs.
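The notion of internal and external utilities in the abstract can be illustrated with a minimal Python sketch. It is not the paper's UWAS-tree or IUWAS-tree algorithm. It assumes, as is common in utility mining, that the utility of one page visit is the product of its internal utility (here, time spent) and the page's external utility (a per-page weight); the abstract itself does not state this formula. All names (`PAGE_WEIGHT`, `visit_utility`, `sequence_utility`, `database_utility`) and all values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical external utilities (per-page significance); values are illustrative only.
PAGE_WEIGHT: Dict[str, float] = {"a": 1.0, "b": 2.0, "c": 0.5}

# One visit is (page, internal utility); internal utility is assumed to be
# the time spent on the page, in minutes.
Visit = Tuple[str, float]


def visit_utility(visit: Visit, weights: Dict[str, float]) -> float:
    """Assumed definition: internal utility times external utility."""
    page, time_spent = visit
    return time_spent * weights[page]


def sequence_utility(seq: List[Visit], weights: Dict[str, float]) -> float:
    """Sum of visit utilities over one web access sequence."""
    return sum(visit_utility(v, weights) for v in seq)


def database_utility(db: List[List[Visit]], weights: Dict[str, float]) -> float:
    """Sum of sequence utilities over the whole database."""
    return sum(sequence_utility(s, weights) for s in db)


# Toy database of two web access sequences. The first one revisits page "a",
# i.e. it contains a backward reference.
was_db: List[List[Visit]] = [
    [("a", 2.0), ("b", 3.0), ("a", 1.0)],
    [("b", 1.0), ("c", 4.0)],
]

if __name__ == "__main__":
    for seq in was_db:
        print(seq, sequence_utility(seq, PAGE_WEIGHT))
    print("database utility:", database_utility(was_db, PAGE_WEIGHT))
```

In a utility-based setting, a minimum utility threshold is usually expressed as a fraction of the database utility, so a helper like `database_utility` would be the natural starting point for such a threshold.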
Pages: 3-16
Number of pages: 14