A Framework for Mining High Utility Web Access Sequences

Cited by: 30

Authors:

Ahmed, Chowdhury Farhan ^{[1]}

Tanbeer, Syed Khairuzzaman ^{[1]}

Jeong, Byeong-Soo ^{[1]}

Affiliations:

[1] Kyung Hee Univ, Dept Comp Engn, Database Lab, Youngin Si 446701, Kyunggi Do, South Korea

Source:

IETE TECHNICAL REVIEW | 2011 / Vol. 28 / Issue 01

Keywords:

Data mining; High utility patterns; Incremental mining; Interactive mining; Web access sequences; Web mining; ITEMSET UTILITIES; PATTERNS;

DOI:

10.4103/0256-4602.74506

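Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code: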

0808 ; 0809 ;

Abstract:

Mining web access sequences (WASs) can discover very useful knowledge from web logs, with broad applications. By treating non-binary occurrences of web pages as internal utilities in WASs, e.g., the time each user spends on a web page, more realistic information can be extracted. However, the existing utility-based approach has several limitations: it considers only forward references of web access sequences, is not applicable to incremental mining, suffers from the level-wise candidate generation-and-test methodology, needs several database scans, and does not show how to mine web access sequences when different web pages have different impacts or significance. In this paper, we propose a novel framework to solve these problems. Moreover, we propose two new tree structures, called the utility-based WAS tree (UWAS-tree) and the incremental UWAS-tree (IUWAS-tree), for mining WASs in static and incremental databases, respectively. Our approach handles both forward and backward references and both static and incremental data, avoids the level-wise candidate generation-and-test methodology, does not scan the database several times, and considers both the internal and external utilities of a web page. The IUWAS-tree is also applicable to interactive mining. Extensive performance analyses show that our approach is very efficient for both static and incremental mining of high utility WASs.
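The distinction between internal and external utilities in the abstract can be illustrated with a small sketch. It assumes the convention common in high utility pattern mining, where the utility of one page visit is its internal utility (here, minutes spent) multiplied by the page's external utility (a significance weight). The page names, weights, function names, and threshold fraction are all hypothetical, and the sketch does not reproduce the UWAS-tree or IUWAS-tree structures themselves.

```python
# Illustrative sketch only: every name and number here is hypothetical
# and is not taken from the paper.

# External utility: an assumed per-page significance weight.
EXTERNAL_UTILITY = {"a": 2.0, "b": 5.0, "c": 1.0}

# Each web access sequence is an ordered list of (page, internal utility)
# pairs, where the internal utility is assumed to be minutes spent on the page.
# A page may repeat, e.g. when the user navigates back to it.
WAS_DATABASE = [
    [("a", 3), ("b", 1), ("a", 2)],
    [("b", 4), ("c", 6)],
    [("a", 1), ("c", 2), ("b", 2)],
]


def visit_utility(page, internal):
    """Utility of one visit: internal utility times the page's external utility."""
    return internal * EXTERNAL_UTILITY[page]


def sequence_utility(sequence):
    """Total utility of one access sequence, summed over all of its visits."""
    return sum(visit_utility(page, internal) for page, internal in sequence)


def database_utility(database):
    """Total utility of the whole database of access sequences."""
    return sum(sequence_utility(sequence) for sequence in database)


def min_utility(database, fraction):
    """Absolute minimum-utility threshold as a fraction of the database utility."""
    return fraction * database_utility(database)


if __name__ == "__main__":
    for index, sequence in enumerate(WAS_DATABASE, start=1):
        print(f"sequence {index}: utility = {sequence_utility(sequence)}")
    print("database utility =", database_utility(WAS_DATABASE))
    print("threshold at 20% =", min_utility(WAS_DATABASE, 0.2))
```

Under these assumptions, a pattern would count as "high utility" when its utility reaches the threshold, which is why a short visit to a heavily weighted page can outrank a long visit to a lightly weighted one.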

Citation

Pages: 3 - 16

Page count: 14

