Model-Based Clustering and Visualization of Navigation Patterns on a Web Site
Cited by: 1
Authors:
Igor Cadez
Affiliation: Sparta Inc., School of Information and Computer Science
David Heckerman
Affiliation: Sparta Inc., School of Information and Computer Science
Christopher Meek
Affiliation: Sparta Inc., School of Information and Computer Science
Padhraic Smyth
Affiliation: Sparta Inc., School of Information and Computer Science
Steven White
Affiliation: Sparta Inc., School of Information and Computer Science
Affiliations:
[1] Sparta Inc., School of Information and Computer Science
[2] Microsoft Research
[3] University of California
Source:
Data Mining and Knowledge Discovery, 2003, Vol. 7
Keywords:
model-based clustering;
sequence clustering;
data visualization;
Internet;
web;
DOI:
Not available
Abstract:
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our approach, we first partition site users into clusters such that users with similar navigation paths through the site are placed into the same cluster. Then, for each cluster, we display these paths for users within that cluster. The clustering approach we employ is model-based (as opposed to distance-based) and partitions users according to the order in which they request web pages. In particular, we cluster users by learning a mixture of first-order Markov models using the Expectation-Maximization algorithm. The runtime of our algorithm scales linearly with the number of clusters and with the size of the data; and our implementation easily handles hundreds of thousands of user sessions in memory. In the paper, we describe the details of our method and a visualization tool based on it called WebCANVAS. We illustrate the use of our approach on user-traffic data from msnbc.com.
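The clustering step named in the abstract, a mixture of first-order Markov models fitted with Expectation-Maximization, can be illustrated with a minimal sketch. This is not the authors' implementation and not WebCANVAS: the function names (`fit_markov_mixture`, `_m_step`, `_e_step`), the random soft initialization, the smoothing pseudo-count `alpha`, and the toy sessions are all assumptions made for illustration. Each session is assumed to be a non-empty list of integer URL-category indices.

```python
import math
import random


def _normalize(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def _m_step(sessions, resp, n_states, n_clusters, alpha):
    """Re-estimate mixture weights, initial-state and transition probabilities."""
    weights, init, trans = [], [], []
    for k in range(n_clusters):
        # alpha is a small pseudo-count (an assumption) that keeps every
        # probability strictly positive so the logs in the E-step are defined.
        mass = alpha
        init_counts = [alpha] * n_states
        trans_counts = [[alpha] * n_states for _ in range(n_states)]
        for seq, r in zip(sessions, resp):
            mass += r[k]
            init_counts[seq[0]] += r[k]
            for a, b in zip(seq, seq[1:]):
                trans_counts[a][b] += r[k]
        weights.append(mass)
        init.append(_normalize(init_counts))
        trans.append([_normalize(row) for row in trans_counts])
    return _normalize(weights), init, trans


def _e_step(sessions, weights, init, trans):
    """Compute each session's posterior membership in every cluster."""
    resp = []
    for seq in sessions:
        log_post = []
        for k in range(len(weights)):
            lp = math.log(weights[k]) + math.log(init[k][seq[0]])
            for a, b in zip(seq, seq[1:]):
                lp += math.log(trans[k][a][b])
            log_post.append(lp)
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_post)
        resp.append(_normalize([math.exp(lp - top) for lp in log_post]))
    return resp


def fit_markov_mixture(sessions, n_states, n_clusters,
                       n_iter=50, alpha=0.01, seed=0):
    """Fit a mixture of first-order Markov chains with EM (illustrative only)."""
    rng = random.Random(seed)
    # Start from random soft assignments of sessions to clusters.
    resp = [_normalize([rng.random() + 1e-3 for _ in range(n_clusters)])
            for _ in sessions]
    for _ in range(n_iter):
        weights, init, trans = _m_step(sessions, resp, n_states, n_clusters, alpha)
        resp = _e_step(sessions, weights, init, trans)
    return weights, init, trans, resp


# Toy data (hypothetical): ten sessions that alternate between categories 0
# and 1, and ten sessions that stay on category 0.
sessions = [[0, 1, 0, 1, 0, 1]] * 10 + [[0, 0, 0, 0, 0, 0]] * 10
weights, init, trans, resp = fit_markov_mixture(sessions, n_states=2, n_clusters=2)
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print(labels)
```

In this sketch each EM iteration visits every transition of every session once per cluster, so its cost grows linearly with the number of clusters and with the total length of the data, in line with the scaling stated in the abstract. Which cluster index ends up holding which group depends on the random initialization.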