Two-level clustering of web sites using self-organizing maps

Cited by: 7
Authors
Petrilis, Dimitris [1]
Halatsis, Constantin [1]
Affiliations
[1] Univ Athens, Dept Informat & Telecommun, GR-15784 Athens, Greece
Keywords
access-logs; clustering; content mining; context mining; data mining; neural networks; self-organizing map (SOM); text mining; web-logs; web mining;
DOI
10.1007/s11063-007-9061-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Web sites contain an ever-increasing amount of information within their pages. As the amount of information grows, so does the complexity of the site's structure, and it becomes difficult for visitors to find the information relevant to their needs. To overcome this problem, various clustering methods have been proposed to help visitors find relevant information. These methods have typically focused on either the content or the context of the web pages. In this paper we propose a method based on Kohonen's self-organizing map (SOM) that uses both content mining and context mining clustering techniques to help visitors identify relevant information more quickly. The input to the content mining is the set of web pages of the web site, whereas the source for the context mining is the site's access-logs. A SOM can be used to identify clusters of web sessions with similar context and clusters of web pages with similar content, and it also provides a means of visualizing the outcome of this processing. We show how this two-level clustering can help visitors identify relevant information faster. The procedure has been tested on the access-logs and web pages of the Department of Informatics and Telecommunications of the University of Athens.
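The two clustering levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sessions and pages are encoded or how the two levels are combined, so the toy matrices (a session-by-page visit matrix for context, a page-by-term count matrix for content), the function names `train_som` and `best_matching_units`, and all parameter values below are assumptions made for illustration only.

```python
import numpy as np


def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    # Grid coordinates of every map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen input vectors.
    weights = data[rng.integers(0, n, size=rows * cols)].astype(float)
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.3)    # shrinking neighbourhood
            x = data[i]
            # Best-matching unit: the unit whose weight vector is closest to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Gaussian neighbourhood around the BMU on the map grid.
            d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def best_matching_units(data, weights):
    """Return the index of the closest map unit for each row of data."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


# Context level (assumed encoding): one row per session, one column per page,
# 1.0 if the session visited that page according to the access-log.
sessions = np.array([
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1],
], dtype=float)

# Content level (assumed encoding): one row per page, one column per term count.
pages = np.array([
    [3, 1, 0, 0, 0],
    [2, 2, 0, 0, 1],
    [0, 0, 4, 1, 0],
    [0, 1, 3, 2, 0],
], dtype=float)

session_units = best_matching_units(sessions, train_som(sessions))
page_units = best_matching_units(pages, train_som(pages))
print("session -> map unit:", session_units)
print("page    -> map unit:", page_units)
```

Sessions (or pages) that land on the same or neighbouring map units are treated as one cluster. A real pipeline would first need to extract sessions from the access-logs and term vectors from the pages, which is outside the scope of this sketch.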
Pages: 85-95
Number of pages: 11