Integrating Web content clustering into Web log association rule mining

Cited by: 0
Authors
Guo, J [1]
Keselj, V [1]
Gao, Q [1]
Affiliations
[1] Dalhousie Univ, Fac Comp Sci, Halifax, NS B3H 1W5, Canada
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the effects of the general Internet growth is an immense number of user accesses to WWW resources. These accesses are recorded in the web server log files, which are a rich data resource for finding useful patterns and rules of user browsing behavior, and they caused the rise of technologies for Web usage mining. Current Web usage mining applications rely exclusively on the web server log files. The main hypothesis discussed in this paper is that Web content analysis can be used to improve Web usage mining results. We propose a system that integrates Web page clustering into log file association mining and uses the cluster labels as Web page content indicators. It is demonstrated that novel and interesting association rules can be mined from the combined data source. The rules can be used further in various applications, including Web user profiling and Web site construction. We experiment with several approaches to content clustering, relying on keyword and character n-gram based clustering with different distance measures and parameter settings. Evaluation shows that character n-gram based clustering performs better than word-based clustering in terms of an internal quality measure (about 3 times better). On the other hand, word-based cluster profiles are easier to manually summarize. Furthermore, it is demonstrated that high-quality rules are extracted from the combined dataset.
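The integration described in the abstract can be illustrated with a minimal sketch. It assumes the sessions have already been extracted from a server log and that some content clustering step has already assigned each page a cluster label; the page names, the labels (`C_teaching`, `C_people`), the helper functions, and the thresholds are all hypothetical and are not taken from the paper. The rule miner is a brute-force pair counter standing in for whatever association mining algorithm the authors used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical inputs: sessions taken from a server log, and a
# page -> cluster label mapping produced by a content clustering step.
sessions = [
    ["/courses.html", "/cs101.html"],
    ["/courses.html", "/cs102.html"],
    ["/faculty.html", "/cs101.html"],
    ["/courses.html", "/cs101.html", "/faculty.html"],
]
page_cluster = {
    "/courses.html": "C_teaching",
    "/cs101.html": "C_teaching",
    "/cs102.html": "C_teaching",
    "/faculty.html": "C_people",
}


def augment(session, page_cluster):
    """Return the session's pages plus the cluster labels of those pages."""
    items = set(session)
    items.update(page_cluster[p] for p in session if p in page_cluster)
    return frozenset(items)


def mine_pair_rules(transactions, min_support, min_confidence):
    """Mine one-to-one rules lhs -> rhs by brute-force counting."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for t in transactions:
        item_count.update(t)
        pair_count.update(combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


transactions = [augment(s, page_cluster) for s in sessions]
mined = mine_pair_rules(transactions, min_support=0.5, min_confidence=0.9)
# Drop rules that only restate a page's own cluster label.
rules = [r for r in mined if page_cluster.get(r[0]) != r[1]]

for lhs, rhs, support, confidence in sorted(rules):
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

Rules whose right-hand side is a cluster label relate a page to a kind of content rather than to one specific page, which is the sort of rule a log-only dataset cannot express.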
Pages: 182-193
Page count: 12
Related papers
50 in total
  • [42] Web mining with relational clustering
    Runkler, TA
    Bezdek, JC
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2003, 32 (2-3) : 217 - 236
  • [43] Clustering for Knowledgeable Web Mining
    Charulatha, B. S.
    Rodrigues, Paul
    Chitralekha, T.
    Rajaraman, Arun
    ARTIFICIAL INTELLIGENCE AND EVOLUTIONARY ALGORITHMS IN ENGINEERING SYSTEMS, VOL 1, 2015, 324 : 491 - 498
  • [44] Improving web sites with web usage mining, web content mining, and semantic analysis
    Norguet, JP
    Zimányi, E
    Steinberger, R
    SOFSEM 2006: THEORY AND PRACTICE OF COMPUTER SCIENCE, PROCEEDINGS, 2006, 3831 : 430 - 439
  • [45] WEB MINING USING K-MEANS CLUSTERING AND LATEST SUBSTRING ASSOCIATION RULE FOR E-COMMERCE
    Chatterjee, Rudra Prasad
    Deb, Kaustuv
    Banerjee, Sonali
    Das, Atanu
    Bag, Rajib
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (06) : 28 - 44
  • [46] Association rule retrieved from web log based on rough set theory
    Guo, Sen
    Liang, Yongsheng
    Zhang, Zhili
    Liu, Wei
    FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 3, PROCEEDINGS, 2007, : 129 - +
  • [47] Mining search engine query log for evaluating content and structure of a web site
    Hosseini, Mehdi
    Abolhassani, Hassan
    PROCEEDINGS OF THE IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE: WI 2007, 2007, : 235 - 241
  • [48] Characterizing web user accesses: A transactional approach to web log clustering
    Giannotti, F
    Gozzi, C
    Manco, G
    INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, PROCEEDINGS, 2002, : 312 - 317
  • [49] Web navigation patterns mining based on clustering of paths and pages content
    Gang, F
    Ma, GS
    Jing, H
    ADVANCED WEB AND NETWORK TECHNOLOGIES, AND APPLICATIONS, PROCEEDINGS, 2006, 3842 : 857 - 860
  • [50] WISE: Hierarchical soft clustering of web page search results based on web content mining techniques
    Campos, Ricardo
    Dias, Gaël
    Nunes, Celia
    2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, (WI 2006 MAIN CONFERENCE PROCEEDINGS), 2006, : 301 - +