Integrating Web content clustering into Web log association rule mining

Cited by: 0
Authors
Guo, J [1]
Keselj, V [1]
Gao, Q [1]
Affiliation
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5, Canada
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
One effect of the general growth of the Internet is an immense number of user accesses to WWW resources. These accesses are recorded in Web server log files, which are a rich data source for finding useful patterns and rules of user browsing behavior, and they have given rise to Web usage mining technologies. Current Web usage mining applications rely exclusively on Web server log files. The main hypothesis of this paper is that Web content analysis can improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as indicators of Web page content. We demonstrate that novel and interesting association rules can be mined from the combined data source. These rules can then be used in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, based on keywords and on character n-grams, with different distance measures and parameter settings. The evaluation shows that character n-gram based clustering outperforms word-based clustering on an internal quality measure, by a factor of about 3. On the other hand, word-based cluster profiles are easier to summarize manually. Furthermore, we show that high-quality rules are extracted from the combined dataset.
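The record does not specify the clustering algorithm, distance measures, or rule-mining thresholds, so the following Python is only a minimal sketch of the combined idea under stated assumptions: character trigram profiles, a CNG-style relative-difference distance (assumed here, not taken from this record), nearest-seed assignment standing in for a real clustering algorithm, and single-item rules in place of full association mining. All function names, page texts, seeds, and sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (hypothetical helper)."""
    text = " ".join(text.lower().split())  # lowercase, collapse whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cng_distance(p1, p2):
    """Assumed CNG-style dissimilarity: squared relative difference over the n-gram union."""
    dist = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        dist += (2 * (f1 - f2) / (f1 + f2)) ** 2  # f1 + f2 > 0 for every g in the union
    return dist


def assign_clusters(pages, seeds, n=3):
    """Label each page with its nearest seed profile (a stand-in for real clustering)."""
    seed_profiles = {label: ngram_profile(text, n) for label, text in seeds.items()}
    labels = {}
    for url, text in pages.items():
        prof = ngram_profile(text, n)
        labels[url] = min(seed_profiles, key=lambda lab: cng_distance(prof, seed_profiles[lab]))
    return labels


def augment_sessions(sessions, labels):
    """Turn each session into an item set: visited pages plus 'cluster=<label>' items."""
    return [set(s) | {f"cluster={labels[p]}" for p in s if p in labels} for s in sessions]


def pair_rules(transactions, min_support=0.5, min_conf=0.6):
    """Rules A -> B with one item on each side (not a full Apriori implementation)."""
    n = len(transactions)
    item_count = Counter(i for t in transactions for i in t)
    pair_count = Counter(p for t in transactions for p in combinations(sorted(t), 2))
    rules = []
    for (a, b), c in pair_count.items():
        if c / n < min_support:
            continue
        for x, y in ((a, b), (b, a)):  # try both rule directions
            conf = c / item_count[x]
            if conf >= min_conf:
                rules.append((x, y, c / n, conf))
    return sorted(rules)


# Hypothetical toy data: page texts, seed descriptions, and log sessions.
pages = {
    "/admissions": "apply now application deadline tuition fees undergraduate admission",
    "/fees": "tuition fees payment deadline scholarships admission",
    "/research": "research labs publications grants faculty projects",
    "/labs": "research labs projects publications seminar",
}
seeds = {
    "admissions": "admission application tuition fees deadline",
    "research": "research publications labs projects grants",
}
sessions = [["/admissions", "/fees"], ["/fees"], ["/research", "/labs"],
            ["/admissions", "/fees", "/labs"]]

labels = assign_clusters(pages, seeds)
rules = pair_rules(augment_sessions(sessions, labels))
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}  support={support:.2f} confidence={confidence:.2f}")
```

With these toy inputs, a mixed content/usage rule such as cluster=admissions -> /fees (support 0.75, confidence 1.00) appears alongside page-only rules. Rules like /fees -> cluster=admissions merely restate a page's own label, so a fuller sketch would likely filter them out.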
Pages: 182-193
Number of pages: 12