Mining and tracking evolving web user trends from large web server logs

Cited by: 0
Authors
Hawwash B. [1]
Nasraoui O. [1]
Affiliation
[1] Knowledge Discovery and Web Mining Laboratory, Department of Computer Engineering and Computer Science, University of Louisville, Louisville
Source
Statistical Analysis and Data Mining | 2010 / Vol. 3 / No. 2
Keywords
Clustering; Concept drifts; Data streams; Evolution; Personalization; User profiles; Web analytics; Web usage mining
DOI
10.1002/sam.10069
Abstract
Recently, online organizations have become interested in tracking users' behavior on their websites to better understand and satisfy their needs. In response, web usage mining tools were developed to help them use web logs to discover usage patterns or profiles. However, because website usage logs are generated continuously, in some cases amounting to a dynamic data stream, most existing tools still cannot handle their changing nature or growing size. This paper proposes a scalable framework that can track the changing nature of user behavior on a website and represent it as a set of evolving usage profiles. These profiles can offer the best representation of user activity at any given time, and they can serve as input to higher-level applications such as a web recommendation system. Our specific aim is to make the hierarchical unsupervised niche clustering (HUNC) algorithm more scalable and to add integrated profile tracking and cluster-based validation to it. Our experiments on real web log data confirm the validity of our approach for large data sets that previously could not be handled in one shot. © 2010 Wiley Periodicals, Inc.
Pages: 106-125
Page count: 19