Model-Based Clustering and Visualization of Navigation Patterns on a Web Site

Cited by: 1
Authors
Igor Cadez
David Heckerman
Christopher Meek
Padhraic Smyth
Steven White
Affiliations
[1] Sparta Inc.; School of Information and Computer Science
[2] Microsoft Research
[3] University of California
Source
Data Mining and Knowledge Discovery, 2003, Vol. 7
Keywords
model-based clustering; sequence clustering; data visualization; Internet; web
DOI
Not available
Abstract
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display the paths of the users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization (EM) algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data, and our implementation easily handles hundreds of thousands of user sessions in memory. In this paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
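The clustering step described in the abstract is a mixture of first-order Markov models fit with EM. What follows is a minimal sketch of that technique under stated assumptions, not the authors' WebCANVAS code. The assumptions are:

- Each session is already encoded as a list of integer URL-category indices.
- Clusters start from random soft assignments.
- A small add-alpha (Dirichlet) smoothing constant `alpha` keeps unseen transitions from receiving zero probability. This constant is our assumption, not a detail from the paper.

The names `fit_markov_mixture`, `hard_assign`, and `normalize` are hypothetical. Each iteration visits every transition of every session once per cluster, plus a per-cluster cost for normalizing the transition table. The work therefore grows linearly in the number of clusters and in the total length of the data.

```python
import math
import random


def normalize(values):
    """Rescale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_markov_mixture(sessions, n_states, n_clusters, n_iter=50, alpha=0.1, seed=0):
    """Fit a mixture of first-order Markov chains to sessions with EM (sketch).

    sessions: non-empty lists of integer URL-category indices in range(n_states).
    alpha: assumed add-alpha smoothing constant (hypothetical, not from the paper).
    Returns (weights, init, trans, resp); resp[i][k] is the posterior
    probability that session i belongs to cluster k.
    """
    rng = random.Random(seed)
    # Random soft assignments break the symmetry between clusters.
    resp = [normalize([rng.random() + 1e-3 for _ in range(n_clusters)])
            for _ in sessions]

    for _ in range(n_iter):
        # M-step: re-estimate each cluster from responsibility-weighted counts.
        weights = [sum(r[k] for r in resp) / len(sessions) for k in range(n_clusters)]
        init, trans = [], []
        for k in range(n_clusters):
            first = [alpha] * n_states
            counts = [[alpha] * n_states for _ in range(n_states)]
            for seq, r in zip(sessions, resp):
                first[seq[0]] += r[k]
                for a, b in zip(seq, seq[1:]):
                    counts[a][b] += r[k]
            init.append(normalize(first))
            trans.append([normalize(row) for row in counts])

        # E-step: posterior cluster membership of every session, in log space.
        for i, seq in enumerate(sessions):
            scores = []
            for k in range(n_clusters):
                ll = math.log(max(weights[k], 1e-300)) + math.log(init[k][seq[0]])
                for a, b in zip(seq, seq[1:]):
                    ll += math.log(trans[k][a][b])
                scores.append(ll)
            norm = log_sum_exp(scores)
            resp[i] = [math.exp(s - norm) for s in scores]

    return weights, init, trans, resp


def hard_assign(resp):
    """Assign each session to its most probable cluster."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Hypothetical toy data: indices 0 and 1 stand in for two URL categories.
alternating = [[0, 1, 0, 1, 0, 1] for _ in range(3)]  # users who switch categories
repeating = [[0, 0, 0, 0, 0, 0] for _ in range(3)]    # users who stay in one category
weights, init, trans, resp = fit_markov_mixture(alternating + repeating,
                                                n_states=2, n_clusters=2)
labels = hard_assign(resp)
```

On this toy data the two groups need conflicting transition probabilities out of category 0. EM should therefore place the alternating and repeating sessions in different clusters. Because the model scores the order of page requests, not a pairwise distance between sessions, this is a model-based rather than distance-based clustering.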
Pages: 399-424