Integrating Web content clustering into Web log association rule mining

Cited by: 0
Authors
Guo, J [1 ]
Keselj, V [1 ]
Gao, Q [1 ]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which are a rich data resource for finding useful patterns and rules of user browsing behavior, and they caused the rise of technologies for Web usage mining. Current Web usage mining applications rely exclusively on the web server log files. The main hypothesis discussed in this paper is that Web content analysis can be used to improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as Web page content indicators. It is demonstrated that novel and interesting association rules can be mined from the combined data source. The rules can be used further in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, relying on keyword and character n-gram based clustering with different distance measures and parameter settings. Evaluation shows that character n-gram based clustering performs better than word-based clustering in terms of an internal quality measure (about 3 times better). On the other hand, word-based cluster profiles are easier to manually summarize. Furthermore, it is demonstrated that high-quality rules are extracted from the combined dataset.
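The combination described in the abstract, cluster labels used as content indicators alongside log data, can be illustrated with a minimal Python sketch. It is not the authors' system: the session data, the `page_cluster` mapping, the label names, and the helpers `augment_sessions` and `pair_rules` are all hypothetical. The cluster labels are assumed to come from a separate clustering step, and only single-item rules are mined.

```python
from itertools import permutations

def augment_sessions(sessions, page_cluster):
    """Add the cluster label of every visited page to the session's item set."""
    augmented = []
    for pages in sessions:
        items = set(pages)
        items.update(page_cluster[p] for p in pages if p in page_cluster)
        augmented.append(items)
    return augmented

def pair_rules(transactions, min_support, min_confidence):
    """Mine rules a -> b between single items, filtered by support and confidence."""
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for items in transactions:
        for a in items:
            item_count[a] = item_count.get(a, 0) + 1
        for a, b in permutations(sorted(items), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                # share of sessions containing both items
        confidence = count / item_count[a]  # share of sessions with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)

# Hypothetical sessions (pages visited together) and hypothetical cluster labels.
sessions = [
    ["/courses.html", "/admissions.html"],
    ["/courses.html", "/faculty.html"],
    ["/research.html", "/faculty.html"],
    ["/courses.html", "/admissions.html"],
]
page_cluster = {
    "/courses.html": "cluster:teaching",
    "/admissions.html": "cluster:teaching",
    "/faculty.html": "cluster:people",
    "/research.html": "cluster:people",
}

transactions = augment_sessions(sessions, page_cluster)
rules = pair_rules(transactions, min_support=0.5, min_confidence=0.9)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent} (support={support:.2f}, confidence={confidence:.2f})")
```

A rule from a page to its own cluster label always has confidence 1 in this sketch, so a practical version would presumably filter such trivial rules out before inspecting the rest.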
Pages: 182 - 193
Page count: 12
Related papers
50 in total
  • [21] Preprocessing and mining web log data for web personalization
    Baglioni, M
    Ferrara, U
    Romei, A
    Ruggieri, S
    Turini, F
    AI*IA 2003: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2003, 2829 : 237 - 249
  • [22] Exploiting Web log mining for Web cache enhancement
    Nanopoulos, A
    Katsaros, D
    Manolopoulos, Y
    WEBKDD 2001 - MINING WEB LOG DATA ACROSS ALL CUSTOMERS TOUCH POINTS, 2002, 2356 : 68 - 87
  • [23] Web usage log markup language for web mining
    Zhang, Hui
    Song, Hantao
    Punine, John R.
    Journal of Computational Information Systems, 2007, 3 (03) : 971 - 980
  • [24] Web-log mining for predictive Web caching
    Yang, Q
    Zhang, HH
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2003, 15 (04) : 1050 - 1053
  • [25] Integrating web usage and content mining for more effective personalization
    Mobasher, B
    Dai, HH
    Luo, T
    Sun, YQ
    Zhu, J
    ELECTRONIC COMMERCE AND WEB TECHNOLOGIES, PROCEEDINGS, 2000, 1875 : 165 - 176
  • [26] Mining Evolving Web Sessions and Clustering Dynamic Web Documents for Similarity-Aware Web Content Management
    Xiao, Jitian
    ADVANCED DATA MINING AND APPLICATIONS, PROCEEDINGS, 2008, 5139 : 99 - 110
  • [27] Personalized web page recommendation using case-based clustering and weighted association rule mining
    Bhavithra, J.
    Saradha, A.
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 3) : S6991 - S7002
  • [28] Personalized web page recommendation using case-based clustering and weighted association rule mining
    J. Bhavithra
    A. Saradha
    Cluster Computing, 2019, 22 : 6991 - 7002
  • [29] Mining sequential association rule for improving web document prediction
    Department of Computer Science and software, Northwestern Polytechnical University, Xi'an 710072, China
    Jisuanji Gongcheng, 2006, 12 (39-41):
  • [30] Bitwise Parallel Association Rule Mining for Web Page Recommendation
    Leung, Carson K.
    Jiang, Fan
    Pazdor, Adam G. M.
    2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), 2017 : 662 - 669