Integrating Web content clustering into Web log association rule mining

Cited by: 0
Authors
Guo, J [1 ]
Keselj, V [1 ]
Gao, Q [1 ]
Affiliation
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5, Canada
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which are a rich data resource for finding useful patterns and rules of user browsing behavior, and they caused the rise of technologies for Web usage mining. Current Web usage mining applications rely exclusively on the web server log files. The main hypothesis discussed in this paper is that Web content analysis can be used to improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as Web page content indicators. It is demonstrated that novel and interesting association rules can be mined from the combined data source. The rules can be used further in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, relying on keyword and character n-gram based clustering with different distance measures and parameter settings. Evaluation shows that character n-gram based clustering performs better than word-based clustering in terms of an internal quality measure (about 3 times better). On the other hand, word-based cluster profiles are easier to manually summarize. Furthermore, it is demonstrated that high-quality rules are extracted from the combined dataset.
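The core idea of the abstract, using cluster labels as content indicators inside log association mining, can be illustrated with a minimal sketch. This is not the authors' system: the session data, the page-to-cluster mapping, the function names, and the thresholds are all hypothetical, and the rule miner is a simple pairwise support/confidence count rather than any specific algorithm from the paper.

```python
from collections import Counter
from itertools import permutations


def augment_sessions(sessions, page_cluster):
    """Turn each session into an item set of visited pages plus their cluster labels."""
    augmented = []
    for pages in sessions:
        items = set(pages)
        items.update(page_cluster[p] for p in pages if p in page_cluster)
        augmented.append(items)
    return augmented


def mine_pair_rules(transactions, min_support, min_confidence, page_cluster=None):
    """Return rules (antecedent, consequent, support, confidence) over single items."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(permutations(sorted(t), 2))  # ordered pairs a -> b
    rules = []
    for (a, b), count in pair_count.items():
        # Skip the trivially true rule "page -> its own cluster label".
        if page_cluster and page_cluster.get(a) == b:
            continue
        support = count / n
        confidence = count / item_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return sorted(rules)


# Hypothetical sessions reconstructed from a server log (assumption).
sessions = [
    ["/courses.html", "/cs101.html"],
    ["/courses.html", "/cs102.html"],
    ["/faculty.html", "/cs101.html"],
    ["/courses.html", "/cs103.html"],
]
# Hypothetical output of a content clustering step (assumption).
page_cluster = {
    "/cs101.html": "cluster:course",
    "/cs102.html": "cluster:course",
    "/cs103.html": "cluster:course",
}

log_only = mine_pair_rules([set(s) for s in sessions], 0.5, 0.8)
combined = mine_pair_rules(
    augment_sessions(sessions, page_cluster), 0.5, 0.8, page_cluster
)
print(log_only)
print(combined)
```

In this toy data no page-to-page rule reaches the support threshold, because visitors spread across three different course pages. Once the shared cluster label is added to each session, the rule from `/courses.html` to `cluster:course` becomes visible, which is the kind of rule the combined data source is meant to expose.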
Pages: 182 - 193
Page count: 12