On the use of constrained associations for web log mining

Cited by: 0
Authors
Yang, H [1]
Parthasarathy, S [1]
Affiliation
[1] Ohio State Univ, Columbus, OH 43235 USA
Source
WEBKDD 2002 - MINING WEB DATA FOR DISCOVERING USAGE PATTERNS AND PROFILES | 2003 / Vol. 2703
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years there has been increasing interest, and a growing body of work, in web usage mining as an approach to capturing and modeling the behavior of web users for business intelligence and browser performance enhancements. Web usage mining strategies range from clustering and collaborative filtering to accurate modeling of sequential navigation patterns. However, many of these approaches suffer from scalability and performance problems (especially online performance), both because the data involved are large and sparse and because many of the methods generate complex models that are poorly suited to an online decision-making environment. In this paper, we present a new approach for mining web logs. Our approach discovers association rules that are constrained (and ordered) temporally. It relies on the simple premise that recently accessed pages have a greater influence on the pages that will be accessed in the near future. The approach not only yields better predictions but also prunes the rule space significantly, enabling faster online prediction. Further refinements based on sequential dominance are also evaluated and prove to be quite effective. Detailed experimental evaluation shows that the approach is effective in capturing a web user's access patterns; consequently, our prediction model not only has good prediction accuracy but is also more efficient in terms of space and time complexity. The approach is also likely to generalize to e-commerce recommendation systems.
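The premise stated in the abstract, that recently accessed pages carry the most weight for predicting the next access, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give its details, and the sketch omits the sequential-dominance refinement entirely. It assumes that the temporal constraint is a fixed-size window over the most recent pages, that the antecedent is an ordered run of pages, and that rules are filtered by plain support and confidence thresholds. The names `mine_windowed_rules`, `predict_next`, `window`, `min_support`, and `min_confidence` are hypothetical, as is the toy session data.

```python
from collections import Counter, defaultdict


def mine_windowed_rules(sessions, window=2, min_support=2, min_confidence=0.5):
    """Mine rules of the form (last k pages, in order) -> next page, k <= window.

    Assumption: the temporal constraint is a fixed window of recent pages.
    Confidence is measured over positions where a next page exists.
    """
    antecedent_counts = Counter()
    rule_counts = Counter()
    for session in sessions:
        for i in range(1, len(session)):
            # Only the `window` most recent pages may appear in the antecedent.
            for k in range(1, min(window, i) + 1):
                antecedent = tuple(session[i - k:i])
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, session[i])] += 1

    rules = defaultdict(dict)
    for (antecedent, next_page), count in rule_counts.items():
        confidence = count / antecedent_counts[antecedent]
        if count >= min_support and confidence >= min_confidence:
            rules[antecedent][next_page] = confidence
    return dict(rules)


def predict_next(rules, history, window=2):
    """Predict the next page, preferring the longest matching recent context."""
    for k in range(min(window, len(history)), 0, -1):
        candidates = rules.get(tuple(history[-k:]))
        if candidates:
            return max(candidates, key=candidates.get)
    return None  # no rule matches the recent history


# Hypothetical toy sessions, for illustration only.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "cart"],
    ["home", "about"],
    ["search", "products", "cart"],
]
rules = mine_windowed_rules(sessions)
print(predict_next(rules, ["home"]))
print(predict_next(rules, ["search", "products"]))
```

Limiting antecedents to a short recent window is what keeps the rule set small in this sketch: pages that fall outside the window never generate candidate rules, so lookup at prediction time is a handful of dictionary probes.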
Pages: 100-118
Number of pages: 19