Multiobjective Clustering with Automatic k-determination for Large-scale Data

Times cited: 0
Authors
Matake, Nobukazu [1]
Hiroyasu, Tomoyuki [2]
Miki, Mitsunori [2]
Senda, Tomoharu [2]
Affiliations
[1] Doshisha Univ, Grad Sch, Kyoto, Japan
[2] Doshisha Univ, Kyoto, Japan
Source
GECCO 2007: GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, VOL 1 AND 2 | 2007
Keywords
Multi-objective optimization; Pattern recognition and classification; Data mining; Speedup technique;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Web mining, that is, data mining for web data, is a key factor in web technologies. Web behavior mining in particular has attracted a great deal of attention recently. Behavior mining involves analyzing the behavior of users, finding patterns in that behavior, and predicting users' subsequent behaviors or interests. Web behavior mining is used in web advertising systems and content recommendation systems. To analyze large amounts of data, such as web data, data-clustering techniques are usually used. Data clustering is a technique that separates data into groups according to similarity, and it is usually used as the first step of data mining. In the present study, we developed a scalable data-clustering algorithm for web mining based on an existing evolutionary multiobjective clustering algorithm. To derive clusters, we applied multiobjective clustering with automatic k-determination (MOCK). MOCK has been reported to show better performance than k-means, agglutination methods, and other evolutionary clustering algorithms. MOCK can also find the appropriate number of clusters using the information in the trade-off curve. The k-determination scheme of MOCK is powerful and strict. However, its computational cost is too high when it is applied to clustering huge data sets. In this paper, we propose a scalable automatic k-determination scheme. The proposed scheme reduces the size of the Pareto set, and the appropriate number of clusters can usually be determined.
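The trade-off curve mentioned in the abstract is the set of non-dominated solutions of a multiobjective problem. As a minimal sketch of that idea only, and not the scheme proposed in the paper, the following Python code extracts the non-dominated clusterings from a list of candidates scored on two objectives. It assumes both objectives are minimized; the names `Solution`, `dominates`, `pareto_front`, and all score values are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

# (k, objective_a, objective_b); both objectives are assumed to be minimized.
Solution = Tuple[int, float, float]


def dominates(p: Solution, q: Solution) -> bool:
    """True if p is no worse than q on both objectives and strictly better on one."""
    return p[1] <= q[1] and p[2] <= q[2] and (p[1] < q[1] or p[2] < q[2])


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Return the non-dominated solutions, sorted by the first objective."""
    front = [s for s in solutions if not any(dominates(t, s) for t in solutions)]
    return sorted(front, key=lambda s: s[1])


# Hypothetical scores for clusterings with different k (illustrative only).
candidates = [
    (2, 9.0, 1.0),
    (3, 5.0, 2.0),
    (4, 4.5, 6.0),
    (5, 6.0, 7.0),  # worse than k=3 on both objectives, so it is dominated
]
front = pareto_front(candidates)
print([k for k, _, _ in front])
```

This pairwise filter compares every candidate with every other one, so its cost grows quadratically with the number of candidates. That growth is one reason a smaller Pareto set is cheaper to analyze.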
Pages: 861 / +
Page count: 2